Description

- SQL skills to understand transformations and perform analysis

- PySpark and MWAA DAG skills to understand the data pipelines

- AWS Glue Data Catalog / LF skills to understand access control

- Knowledge of lakehouse concepts and databases to understand consumption patterns

- Architectural ownership

- Cross-functional collaboration

- Strategic vision

- Develop and maintain data pipelines using Python and PySpark.

- Design and implement efficient SQL queries for data extraction and manipulation.

- Work with Aurora Postgres and Redshift for data storage and management.

- Utilize Informatica or other ETL tools for data integration and transformation.

- Implement and manage AWS cloud services, including Athena, S3, ECS/Docker, EMR, EC2, Lambda, CloudWatch, and EventBridge.

- Use Airflow or other orchestration tools for workflow management.

- Ensure data quality and integrity through rigorous testing and validation.

- Collaborate with other teams to understand data requirements and deliver solutions.

- Proficiency in Python and PySpark

- Solid experience with AWS Glue

- Strong SQL skills

- Experience with Aurora Postgres and Redshift

- Knowledge of Informatica or other ETL tools

- Experience with AWS cloud services, including Athena, S3, Lambda, CloudWatch, and EventBridge

- ECS/Docker (or equivalent Docker experience and concepts)

- EMR, EC2 (or knowledge of other distributed computing frameworks)

- Familiarity with Airflow or other orchestration tools

Education

Any Graduate