We are seeking an experienced Ab Initio ETL Developer with strong expertise in data integration, distributed computing, and cloud technologies.
The ideal candidate will have hands-on experience with Ab Initio, Apache Spark, Python, Java, and ANSI SQL, and will be responsible for building scalable data pipelines and managing complex workflows across hybrid environments.
Key Responsibilities
- Design and develop data integration solutions using the Ab Initio ETL suite.
- Build efficient and scalable data processing pipelines using Apache Spark.
- Create, optimize, and maintain Airflow Directed Acyclic Graphs (DAGs) in Python for orchestrating data workflows (a minimal example follows this list).
- Implement, schedule, and monitor complex data workflows to ensure timely and accurate data processing.
- Diagnose and resolve issues within Airflow workflows, optimizing DAGs for performance and scalability.
- Collaborate with data engineering teams to continuously improve data pipelines and adapt to evolving business needs.
- Leverage AWS services such as S3, Athena, Glue, and EMR for data lifecycle management and purging.
- Integrate with internal archival platforms for efficient data retention and compliance.
- Support migration of existing PySpark workloads to AWS-based solutions.
- Implement agents for monitoring, logging, and automation within AWS environments.
- Maintain technical documentation including design specifications (HLD/LLD) and mapping documents.
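To illustrate the orchestration work described above, here is a minimal sketch of an Airflow DAG in Python. The DAG id, task names, and logic are hypothetical placeholders for illustration only, not a reference to any actual pipeline in this role.

```python
# Minimal Airflow DAG sketch; dag_id and task names are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders(**context):
    # Placeholder extract step; a real task might pick up data landed
    # in S3 by an upstream Ab Initio graph.
    print("extracting orders for", context["ds"])


def load_orders(**context):
    # Placeholder load step; a real task would write to the target store.
    print("loading orders for", context["ds"])


with DAG(
    dag_id="daily_ingest",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_orders", python_callable=load_orders)

    extract >> load  # simple linear dependency: extract, then load
```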
Required Qualifications
- Strong hands-on experience with Ab Initio ETL development.
- Proficiency in Python, Java, and ANSI SQL.
- Experience with Apache Spark and distributed computing principles (a PySpark sketch follows this list).
- Solid understanding of cloud technologies and AWS services (S3, Athena, Glue, EMR).
- Experience with Airflow for workflow orchestration and DAG optimization.
- Strong technical documentation skills (HLD, LLD, mapping specifications).
- Excellent communication skills and ability to collaborate with both technical and business teams.
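As a rough illustration of the Spark and AWS skills listed above, the following minimal PySpark sketch reads raw data from S3, applies a simple transformation, and writes a partitioned result back for downstream querying with Athena or Glue. The bucket paths and column names are assumptions made for the example only.

```python
# Minimal PySpark sketch; bucket paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_pipeline").getOrCreate()

# Read raw data landed in S3 (e.g. by an upstream ETL process).
raw = spark.read.parquet("s3://example-raw-bucket/orders/")

# Simple transformation: keep valid rows and derive a partition column.
cleaned = (
    raw.filter(F.col("order_id").isNotNull())
       .withColumn("order_date", F.to_date("order_ts"))
)

# Write partitioned output for downstream querying with Athena/Glue.
(cleaned.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-curated-bucket/orders/"))

spark.stop()
```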
Preferred Qualifications
- Experience migrating PySpark workloads to AWS-based services.
- Familiarity with monitoring and automation tools in cloud environments.
- Exposure to data governance and lifecycle management practices (a minimal lifecycle-rule sketch follows this list).
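To give a concrete sense of the lifecycle-management work mentioned above, here is a minimal boto3 sketch of an S3 lifecycle rule of the kind used for retention and purging. The bucket name, prefix, and day counts are assumptions for illustration, not values from this role's environment.

```python
# Minimal boto3 sketch of an S3 lifecycle rule for retention/purging.
# Bucket name, prefix, and day counts are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-staging-data",
                "Filter": {"Prefix": "staging/"},
                "Status": "Enabled",
                # Move aging objects to cheaper storage, then purge them.
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```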