Job Description
Job Summary:
We are seeking a highly skilled Data Engineer with expertise in Python, PySpark, and AWS to design, build, and optimize scalable data pipelines and cloud-based data solutions. The ideal candidate will have strong experience in big data processing, ETL/ELT workflows, and cloud technologies.

Key Responsibilities:
- Develop and maintain scalable, high-performance data pipelines using Python, PySpark, and AWS services.
- Implement ETL/ELT processes for ingesting, transforming, and processing large datasets (see the sketch below this list).
- Optimize data workflows and tune Spark performance for efficiency and cost-effectiveness.
- Work with AWS services such as S3, Glue, Lambda, EMR, Redshift, Athena, Step Functions, and Kinesis.
- Ensure data quality, integrity, and governance across the data platform.
- Collaborate with data analysts, data scientists, and business teams to understand data needs and build solutions.
- Implement CI/CD pipelines and automation for data infrastructure.
- Monitor and troubleshoot data pipelines and cloud infrastructure for reliability and performance.
- Stay current with the latest trends in big data, cloud computing, and distributed processing.
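For illustration, here is a minimal sketch of the kind of PySpark ETL job the responsibilities above describe: extract raw data from S3, apply basic transformations, and load partitioned Parquet back to S3 for downstream querying. All bucket names, paths, and column names are hypothetical placeholders, not part of this posting.

```python
# Minimal PySpark ETL sketch; buckets, paths, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl-sketch").getOrCreate()

# Extract: read raw JSON events from S3 (placeholder bucket/path).
raw = spark.read.json("s3://example-raw-bucket/orders/2024/")

# Transform: basic cleansing and enrichment.
orders = (
    raw.dropDuplicates(["order_id"])                       # de-duplicate on a business key
       .filter(F.col("amount") > 0)                        # drop invalid rows
       .withColumn("order_date", F.to_date("created_at"))  # derive a partition column
)

# Load: write partitioned Parquet back to S3 for downstream
# querying (e.g. via Athena or Redshift Spectrum).
(orders.write
       .mode("overwrite")
       .partitionBy("order_date")
       .parquet("s3://example-curated-bucket/orders/"))

spark.stop()
```

In practice a job like this would typically run on EMR or as a Glue job, with orchestration (e.g. Step Functions) and monitoring layered on top, per the responsibilities listed above.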
Education: Any Graduate