Design, build, and maintain scalable, secure, and efficient data pipelines using AWS services such as Glue, Lambda, Step Functions, S3, Redshift, EMR, and AWS Data Pipeline.
Develop robust Python scripts for data ingestion, transformation, and automation.
Write and optimize complex SQL queries for ETL and analytics workflows.
Operate in Unix/Linux environments for scripting, automation, and system-level data operations.
Participate in Agile ceremonies (daily stand-ups, sprint planning, retrospectives) and contribute to iterative delivery of data solutions.
Collaborate with cross-functional teams to gather requirements and translate them into high-level architecture and design documents.
Communicate technical concepts clearly through documentation, presentations, and stakeholder meetings.
Implement monitoring, logging, and alerting for data pipelines to ensure reliability and performance.
Apply DevOps best practices using GitHub, Terraform, and AWS CloudFormation for infrastructure automation and CI/CD.
Required Qualifications:
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
5+ years of experience in data engineering or a similar role.
Strong hands-on experience with AWS data services (e.g., EMR, Glue, Lambda, Step Functions, S3, Redshift).
Advanced proficiency in Python for scripting and automation.
Solid experience with Unix/Linux shell scripting.
Strong command of SQL and experience with relational databases.
Proficiency with GitHub for version control and collaboration.
Experience with Terraform and/or AWS CloudFormation for infrastructure-as-code.
Experience working in Agile/Scrum environments.
Excellent verbal and written communication skills.
Proven ability to contribute to high-level solution design and architecture discussions.
AWS Certification (e.g., AWS Certified Data Analytics – Specialty, AWS Certified Solutions Architect, or equivalent).