Description

Key Responsibilities: 

  • Design, develop, and optimize data pipelines using Python and AWS services such as Glue, Lambda, S3, EMR, Redshift, Athena, and Kinesis (an illustrative sketch follows this list).
  • Implement ETL/ELT processes to extract, transform, and load data from various sources into centralized repositories (e.g., data lakes or data warehouses).
  • Collaborate with cross-functional teams to understand business requirements and translate them into scalable data solutions.
  • Monitor, troubleshoot, and enhance data workflows for performance and cost optimization.
  • Ensure data quality and consistency by implementing validation and governance practices.
  • Work on data security best practices in compliance with organizational policies and regulations.
  • Automate repetitive data engineering tasks using Python scripts and frameworks.
  • Leverage CI/CD pipelines to deploy data workflows on AWS.
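
A minimal, illustrative sketch of the kind of pipeline step described above (for example, a Lambda or Glue Python shell job that reads raw CSV data from S3, cleans it with pandas, and writes curated Parquet back to S3). All bucket, key, and column names are hypothetical placeholders, and the sketch assumes boto3, pandas, and pyarrow are available in the runtime:

    # Illustrative sketch only -- bucket names, keys, and columns are hypothetical.
    import io

    import boto3
    import pandas as pd

    s3 = boto3.client("s3")

    def handler(event, context):
        # Read a raw CSV object from the (hypothetical) landing bucket.
        obj = s3.get_object(Bucket="raw-landing-bucket", Key="orders/orders.csv")
        df = pd.read_csv(io.BytesIO(obj["Body"].read()))

        # Basic validation/cleanup: drop rows without an order id, normalize the date column.
        df = df.dropna(subset=["order_id"])
        df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

        # Write the curated result back as Parquet for downstream querying (e.g., via Athena).
        buf = io.BytesIO()
        df.to_parquet(buf, index=False)  # requires pyarrow (or fastparquet)
        s3.put_object(Bucket="curated-data-bucket", Key="orders/orders.parquet", Body=buf.getvalue())

        return {"rows_written": len(df)}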

Required Skills:

  • Professional Experience: 5+ years of experience in data engineering or a related field.
  • Programming: Strong proficiency in Python, with experience in libraries like pandas, pyspark, or boto3.
  • AWS Expertise: Hands-on experience with core AWS services for data engineering, such as:
      • AWS Glue for ETL/ELT.
      • S3 for storage.
      • Redshift or Athena for data warehousing and querying (see the illustrative sketch after this list).
      • Lambda for serverless compute.
      • Kinesis for data streaming, and SNS/SQS for messaging.
      • IAM roles and policies for security and access control.
  • Databases: Proficiency in SQL and experience with relational (e.g., PostgreSQL, MySQL) and NoSQL (e.g., DynamoDB) databases.
  • Data Processing: Knowledge of big data frameworks (e.g., Hadoop, Spark) is a plus.
  • DevOps: Familiarity with CI/CD pipelines and tools such as Jenkins, Git, and AWS CodePipeline.
  • Version Control: Proficient with Git-based workflows.
  • Problem Solving: Excellent analytical and debugging skills.
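
As a brief illustration of the SQL and Athena skills listed above, the following boto3 sketch submits a query to Athena, waits for it to finish, and returns the result rows. The database, table, and output location are hypothetical placeholders:

    # Illustrative sketch only -- database, table, and output location are hypothetical.
    import time

    import boto3

    athena = boto3.client("athena")

    def run_query(sql):
        qid = athena.start_query_execution(
            QueryString=sql,
            QueryExecutionContext={"Database": "analytics_db"},                           # hypothetical database
            ResultConfiguration={"OutputLocation": "s3://athena-query-results-bucket/"},  # hypothetical bucket
        )["QueryExecutionId"]

        # Poll until the query completes (simplified; real code would add timeouts/backoff).
        while True:
            state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
            if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
                break
            time.sleep(1)

        if state != "SUCCEEDED":
            raise RuntimeError("Athena query ended in state " + state)
        return athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]

    rows = run_query("SELECT order_id, order_date FROM orders LIMIT 10")  # hypothetical table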

Education

Any Graduate