Responsibilities:
∙Develop and implement data pipelines that ingest data from various sources into a centralized data
platform.
∙Develop and maintain ETL jobs in AWS Glue to process and transform data at scale (a minimal job sketch follows this list).
∙Optimize and troubleshoot AWS Glue jobs for performance and reliability.
∙Utilize Python and PySpark to efficiently handle large volumes of data during the ingestion process.
∙Collaborate with data architects to design and implement data models that support business
requirements.
∙Create and maintain ETL processes using Airflow, Python, and PySpark to move and transform data
between systems (a second sketch, an Airflow DAG, follows this list).
∙Implement monitoring solutions to track data pipeline performance and proactively identify and address
issues.
∙Manage and optimize databases, both SQL and NoSQL, to support data storage and retrieval needs.
∙Provision and manage infrastructure using Infrastructure as Code (IaC) tools such as Terraform and AWS CDK.
∙Build event-driven, batch-based, and API-led data integrations.
∙Build and maintain CI/CD pipelines using tools such as Azure DevOps, AWS CodePipeline, or GitHub Actions.
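A minimal sketch of the kind of AWS Glue PySpark ETL job described above, assuming a standard Glue job environment; the catalog database, table, column, and S3 path are hypothetical placeholders, not names from this role.

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job bootstrap: resolve the job name and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Ingest: read a source table registered in the Glue Data Catalog
# ("raw_db" and "orders" are hypothetical names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="orders"
)

# Transform with PySpark: deduplicate and drop obviously invalid rows
# ("order_total" is a placeholder column).
cleaned = source.toDF().dropDuplicates().filter("order_total > 0")

# Load: write the curated output to S3 as Parquet (bucket path is a placeholder).
glue_context.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(cleaned, glue_context, "cleaned"),
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/orders/"},
    format="parquet",
)

job.commit()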
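A comparable sketch of an Airflow DAG orchestrating a Python/PySpark ETL process, assuming Airflow 2.x; the DAG id, schedule, and task bodies are illustrative only.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull data from the source system.
    pass


def transform():
    # Placeholder: clean and reshape the extracted data (e.g. with PySpark).
    pass


def load():
    # Placeholder: write the transformed data to the target store.
    pass


# "daily_orders_etl" and the daily schedule are hypothetical choices.
with DAG(
    dag_id="daily_orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the tasks in ETL order.
    extract_task >> transform_task >> load_task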
Technical and Industry Experience:
∙Independent Integration Developer with 5+ years of experience developing and delivering
integration projects in agile and waterfall-based project environments.
∙Proficiency in Python, PySpark, and SQL for data manipulation and pipeline development.
∙Hands-on experience with AWS Glue, Airflow, DynamoDB, Redshift, S3, EventBridge, and other
AWS services.
∙Experience implementing CI/CD pipelines, including data testing practices.
∙Proficiency in Swagger, JSON, XML, and SOAP- and REST-based web service development.
∙Bachelor's degree