- SQL skills to understand transformations and perform analysis
- PySpark and MWAA DAG skills to understand the data pipelines
- AWS Glue Data Catalog / Lake Formation skills to understand access control (see the catalog sketch after this list)
- Lakehouse concepts and databases to understand consumption patterns
- Architectural ownership
- Cross-functional collaboration
- Strategic vision
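
For context, here is a minimal boto3 sketch of reading table metadata from the Glue Data Catalog; the database and table names (`example_db`, `orders`) are hypothetical placeholders, and Lake Formation permission details are omitted:

```python
# Minimal Glue Data Catalog lookup sketch. The database and table
# names below are hypothetical placeholders for illustration only.
import boto3

glue = boto3.client("glue")

# Fetch the catalog entry for a (hypothetical) table and print where
# its data lives in S3 -- the kind of metadata consumers query against.
resp = glue.get_table(DatabaseName="example_db", Name="orders")
table = resp["Table"]
print(table["Name"], table["StorageDescriptor"]["Location"])
```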
- Develop and maintain data pipelines using Python and PySpark (a minimal job sketch follows this list).
- Design and implement efficient SQL queries for data extraction and manipulation.
- Work with Aurora Postgres and Redshift for data storage and management.
- Utilize Informatica or other ETL tools for data integration and transformation.
- Implement and manage AWS cloud services, including Athena, S3, ECS/Docker, EMR, EC2, Lambda, CloudWatch, and EventBridge.
- Use Airflow or other orchestration tools for workflow management.
- Ensure data quality and integrity through rigorous testing and validation.
- Collaborate with other teams to understand data requirements and deliver solutions.
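
As a rough illustration of the pipeline work described above, here is a minimal PySpark batch job sketch; the S3 paths, table names, and columns (`order_ts`, `amount`, `status`) are hypothetical, not details from this posting:

```python
# Minimal PySpark batch-pipeline sketch. All paths, column names, and
# filters below are hypothetical placeholders for illustration only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-daily-aggregate").getOrCreate()

# Extract: read raw JSON events from S3 (hypothetical bucket/prefix).
raw = spark.read.json("s3://example-bucket/raw/orders/")

# Transform: basic cleansing plus a daily order-count/revenue aggregate.
daily = (
    raw.filter(F.col("status") == "COMPLETED")
       .withColumn("order_date", F.to_date("order_ts"))
       .groupBy("order_date")
       .agg(
           F.count("*").alias("order_count"),
           F.sum("amount").alias("revenue"),
       )
)

# Load: write partitioned Parquet back to S3 for downstream consumers
# (e.g., Athena or Redshift Spectrum over the Glue Data Catalog).
daily.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/orders_daily/"
)

spark.stop()
```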
- Proficiency in Python and PySpark
- Solid experience in AWS Glue
- Strong SQL skills
- Experience with Aurora Postgres and Redshift
- Knowledge of Informatica or other ETL tools
- Experience with AWS cloud services, including Athena, S3, Lambda, CloudWatch, and EventBridge
- Experience with ECS/Docker (or equivalent container experience and concepts)
- Experience with EMR and EC2 (or knowledge of other distributed computing platforms)
- Familiarity with Airflow or other orchestration tools (a minimal DAG sketch follows this list)
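
For orientation, here is a minimal Airflow DAG sketch of the orchestration style implied above; the DAG id, schedule, and task bodies are hypothetical placeholders, and on MWAA the same file would simply be deployed to the environment's S3 DAGs folder:

```python
# Minimal Airflow DAG sketch (assumes Airflow 2.4+; on older versions,
# use schedule_interval instead of schedule). DAG id, schedule, and
# task bodies are hypothetical placeholders for illustration only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    # Placeholder: pull source data (e.g., from S3 or Aurora Postgres).
    print("extracting")


def transform():
    # Placeholder: trigger a PySpark job (e.g., on EMR or Glue).
    print("transforming")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)

    # Run extract before transform.
    extract_task >> transform_task
```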
Any Graduate