We are seeking a highly skilled Python Developer with strong PySpark expertise and hands-on experience with AWS cloud services to join our data engineering team. The ideal candidate will focus on designing and developing scalable data processing solutions using Apache Spark on the AWS platform. Java development is a secondary skill, used for legacy system integration and occasional support.
Key Responsibilities:
Design, build, and optimize scalable ETL pipelines using PySpark
Work with large datasets to perform data transformation, cleansing, and aggregation
Develop and deploy data processing applications on AWS (e.g., EMR, S3, Lambda, Glue)
Develop reusable and efficient Python code following best practices
Collaborate with data engineers, data scientists, and product teams
Integrate data processing workflows with other services and systems, potentially using Java where needed
Monitor, troubleshoot, and improve performance of data jobs on distributed environments
Participate in code reviews and contribute to a culture of continuous improvement
Required Skills & Qualifications:
Any Graduate