Design, develop, and optimize data pipelines using Python and AWS services such as Glue, Lambda, S3, EMR, Redshift, Athena, and Kinesis.
Implement ETL/ELT processes to extract, transform, and load data from various sources into centralized repositories (e.g., data lakes or data warehouses); a minimal sketch of this pattern appears after this list.
Collaborate with cross-functional teams to understand business requirements and translate them into scalable data solutions.
Monitor, troubleshoot, and enhance data workflows for performance and cost optimization.
Ensure data quality and consistency by implementing validation and governance practices.
Apply data security best practices in compliance with organizational policies and regulations.
Automate repetitive data engineering tasks using Python scripts and frameworks.
Leverage CI/CD pipelines for deployment of data workflows on AWS.
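For illustration only, the following is a minimal sketch of the kind of Python ETL task described above, using boto3 and pandas. The bucket names, object keys, and transformation steps are hypothetical placeholders rather than part of the role description.

import io

import boto3
import pandas as pd

# Hypothetical bucket names and keys, used only for illustration.
SOURCE_BUCKET = "example-raw-zone"
SOURCE_KEY = "sales/2024/01/orders.csv"
TARGET_BUCKET = "example-curated-zone"
TARGET_KEY = "sales/2024/01/orders.parquet"

s3 = boto3.client("s3")

def run_etl() -> None:
    # Extract: read the raw CSV object from S3 into a DataFrame.
    raw = s3.get_object(Bucket=SOURCE_BUCKET, Key=SOURCE_KEY)
    df = pd.read_csv(io.BytesIO(raw["Body"].read()))

    # Transform: drop duplicate rows and normalize column names.
    df = df.drop_duplicates()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

    # Load: write the result to the curated zone as Parquet.
    buffer = io.BytesIO()
    df.to_parquet(buffer, index=False)
    s3.put_object(Bucket=TARGET_BUCKET, Key=TARGET_KEY, Body=buffer.getvalue())

if __name__ == "__main__":
    run_etl()

In practice the same pattern would typically run inside an AWS Glue job or a scheduled Lambda function rather than as a standalone script.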
Required Skills:
Professional Experience: 5+ years of experience in data engineering or a related field.
Programming: Strong proficiency in Python, with experience in libraries such as pandas, PySpark, and boto3.
AWS Expertise: Hands-on experience with core AWS services for data engineering, such as:
- AWS Glue for ETL/ELT.
- S3 for object storage.
- Redshift for data warehousing and Athena for ad hoc querying.
- Lambda for serverless compute (see the sketch at the end of this section).
- Kinesis for real-time data streaming, or SNS/SQS for messaging.
- IAM roles and policies for access control and security.
Databases: Proficiency in SQL and experience with relational (e.g., PostgreSQL, MySQL) and NoSQL (e.g., DynamoDB) databases.
Data Processing: Knowledge of big data frameworks (e.g., Hadoop, Spark) is a plus.
DevOps: Familiarity with CI/CD pipelines and tools such as Jenkins and AWS CodePipeline.
Version Control: Proficient with Git-based workflows.
Problem Solving: Excellent analytical and debugging skills.
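As a hedged illustration of the Lambda and S3 skills listed above, the sketch below shows a minimal Lambda handler reacting to a standard S3 "object created" event notification; the logging step is a placeholder for real validation or transformation logic.

import json
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    # Iterate over the S3 event records delivered to the function.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])

        # Read basic metadata about the newly created object.
        head = s3.head_object(Bucket=bucket, Key=key)

        # Placeholder for real validation or transformation logic.
        print(json.dumps({"bucket": bucket, "key": key,
                          "size_bytes": head["ContentLength"]}))

    return {"statusCode": 200}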