Description

We are seeking an experienced Ab Initio ETL Developer with strong expertise in data integration, distributed computing, and cloud technologies.

The ideal candidate will have hands-on experience with Ab Initio, Apache Spark, Python, Java, and ANSI SQL, and will be responsible for building scalable data pipelines and managing complex workflows across hybrid environments.

Key Responsibilities

  • Design and develop data integration solutions using Ab Initio ETL.
  • Build efficient and scalable data processing pipelines using Apache Spark.
  • Create, optimize, and maintain Directed Acyclic Graphs (DAGs) in Python (Apache Airflow) for orchestrating data workflows; a brief illustrative sketch follows this list.
  • Implement, schedule, and monitor complex data workflows to ensure timely and accurate data processing.
  • Diagnose and resolve issues within Airflow workflows, optimizing DAGs for performance and scalability.
  • Collaborate with data engineering teams to continuously improve data pipelines and adapt to evolving business needs.
  • Leverage AWS services such as S3, Athena, Glue, and EMR for data lifecycle management and purging.
  • Integrate with internal archival platforms for efficient data retention and compliance.
  • Support migration efforts from PySpark to AWS-based solutions.
  • Implement agents for monitoring, logging, and automation within AWS environments.
  • Maintain technical documentation including design specifications (HLD/LLD) and mapping documents.
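For illustration only, below is a minimal sketch of the kind of Python DAG referenced above, assuming Apache Airflow 2.x; the DAG id, task names, and callables are hypothetical placeholders, not part of this role's actual codebase.

    # Minimal illustrative Airflow DAG; dag_id, task names, and callables are hypothetical.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def extract_source_data(**context):
        # Placeholder: pull data from an upstream source (e.g. an S3 landing zone).
        print("extracting data")


    def load_to_warehouse(**context):
        # Placeholder: load transformed data into the target store.
        print("loading data")


    with DAG(
        dag_id="example_daily_pipeline",   # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract = PythonOperator(task_id="extract", python_callable=extract_source_data)
        load = PythonOperator(task_id="load", python_callable=load_to_warehouse)

        extract >> load                    # simple two-step dependency

In practice, DAGs in this role would be larger and tuned for performance and scalability as described above; this sketch only shows the basic structure of tasks and dependencies.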

Required Qualifications

  • Strong hands-on experience with Ab Initio ETL development.
  • Proficiency in Python, Java, and ANSI SQL.
  • Experience with Apache Spark and distributed computing principles.
  • Solid understanding of cloud technologies and AWS services (S3, Athena, Glue, EMR).
  • Experience with Airflow for workflow orchestration and DAG optimization.
  • Strong technical documentation skills (HLD, LLD, mapping specifications).
  • Excellent communication skills and ability to collaborate with both technical and business teams.

Preferred Qualifications

  • Experience with data migration from PySpark to AWS.
  • Familiarity with monitoring and automation tools in cloud environments.
  • Exposure to data governance and lifecycle management practices.

Education

Any Graduate