Description

Key Responsibilities:

  • Design and implement scalable ETL/ELT pipelines using AWS Glue, Spark (PySpark), and Step Functions.
  • Work with structured and semi-structured data using Athena, S3, and Lake Formation to enable efficient querying and access control.
  • Develop and deploy serverless data processing solutions using AWS Lambda and integrate them into pipeline orchestration.
  • Perform advanced SQL and PL/SQL development for data transformation, analysis, and performance tuning.
  • Build data lakes and data warehouses using S3, Aurora, and Athena.
  • Implement data governance, security, and access control strategies using AWS tools including Lake Formation, CloudFront, EBS/EFS, and IAM.
  • Develop and maintain metadata, lineage, and data cataloging capabilities.
  • Participate in data modeling exercises for both OLTP and OLAP environments.
  • Work closely with data scientists, analysts, and business stakeholders to understand data requirements and deliver actionable insights.
  • Monitor, debug, and optimize data pipelines for reliability and performance.

Required Skills & Experience:

  • Strong experience with AWS data services: Glue, Athena, Step Functions, Lambda, Lake Formation, S3, EC2, Aurora, EBS/EFS, CloudFront.
  • Proficient in PySpark, Python, SQL (basic through advanced), and PL/SQL.
  • Solid understanding of ETL/ELT processes and data warehousing concepts.
  • Familiarity with modern data platform fundamentals and distributed data processing.
  • Experience in data modeling (conceptual, logical, physical) for analytical and operational use cases.
  • Experience with orchestration and workflow management tools within AWS.
  • Strong debugging and performance tuning skills across the data stack.

Education

Any Graduate