Description

Key Responsibilities

  • Design, build, and maintain scalable data pipelines using Hadoop, Hive, PySpark, and Python
  • Integrate and manage data in Amazon S3, including object storage security and connectivity to downstream data services
  • Perform data modeling and database design (e.g., MySQL or equivalent)
  • Develop and manage job scheduling with Autosys
  • Implement data visualization and analysis using Power BI and Dremio
  • Automate tasks and workflows using Unix/Shell scripting and CI/CD pipelines
  • Apply business logic to transform, cleanse, and validate data for downstream use
  • Troubleshoot performance issues and optimize data processing
  • Collaborate with cross-functional teams and drive problem-solving initiatives

Required Skills

  • Minimum 4 years of hands-on experience with:
    • Hadoop, Hive, PySpark, Python
    • Amazon S3 (AWS)
    • Autosys job scheduler
    • Data modeling and database design
    • Power BI, Dremio
    • Unix/shell scripting
    • CI/CD tools and pipeline integration

Preferred/Bonus Skills

  • Exposure to Google Cloud Platform (GCP) and cloud data engineering concepts
  • Prior experience in the financial services domain
  • Strong communication skills and ability to clearly articulate past technical work
  • Solid analytical and problem-solving abilities with a solution-driven mindset

Soft Skills

  • Proactive and self-directed: Ability to take ownership without waiting for direction
  • Accountable: Strong sense of responsibility and follow-through
  • Clear communicator: Ability to effectively convey technical work and thought processes

Education

Any Graduate