Job Description
Must have: Python, PySpark, AWS, and GCP
We are seeking a skilled Data Engineer to join our team and help design, develop, and optimize data pipelines. The ideal candidate will have strong expertise in Python, PySpark, AWS, and GCP to build scalable and efficient data solutions.
Key Responsibilities:
- Design, build, and maintain scalable ETL/ELT pipelines using PySpark and Python.
- Develop and optimize big data processing workflows on AWS and GCP.
- Work with structured and unstructured data to ensure high data quality and availability.
- Implement data transformation, aggregation, and cleansing processes.
- Develop and manage data lakes, data warehouses, and cloud-based storage solutions.
- Optimize performance and troubleshoot issues in distributed computing environments.
- Collaborate with data scientists, analysts, and software engineers to support analytical and machine learning workloads.
- Implement best practices for data governance, security, and compliance.
Preferred Qualifications:
- Experience with CI/CD pipelines for data workflows.
- Knowledge of streaming technologies (Kafka, Spark Streaming, Pub/Sub).
- Familiarity with Terraform or CloudFormation for infrastructure automation.