Description

Key Responsibilities:

  • Design, develop, and optimize robust ETL/ELT data pipelines using GCP services such as Dataproc, BigQuery, Cloud Composer, and Cloud Storage (an illustrative PySpark sketch follows this list).
  • Write efficient PySpark and SQL code for processing large datasets.
  • Optimize complex SQL queries and data workflows to ensure high performance and scalability.
  • Collaborate with data scientists, analysts, and product teams to gather requirements and deliver data solutions.
  • Lead and participate in technical design discussions and architecture reviews.
  • Implement and maintain CI/CD pipelines and DevOps processes for data workflows.
  • Ensure best practices in data engineering, data quality, and data security.
  • (Preferred) Build real-time streaming pipelines using Cloud Pub/Sub and Cloud Functions.
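
For illustration only, not part of the role description: a minimal PySpark sketch of the kind of batch ETL job described above, reading raw files from a Cloud Storage bucket and loading an aggregate into BigQuery through the spark-bigquery connector available on Dataproc. All project, bucket, dataset, and table names are hypothetical placeholders.

    # Illustrative PySpark batch job (hypothetical names throughout):
    # read raw CSVs from Cloud Storage, aggregate, and load into BigQuery.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

    # Read raw order events landed in a Cloud Storage bucket (placeholder path).
    raw = (
        spark.read
        .option("header", "true")
        .csv("gs://example-raw-zone/orders/dt=2024-01-01/")
    )

    # Basic cleanup and aggregation: cast types, drop bad rows, total per customer.
    orders = (
        raw.withColumn("amount", F.col("amount").cast("double"))
           .filter(F.col("amount").isNotNull())
           .groupBy("customer_id")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("order_count"))
    )

    # Write to BigQuery via the spark-bigquery connector bundled with Dataproc.
    (
        orders.write
        .format("bigquery")
        .option("table", "example_project.analytics.daily_customer_orders")
        .option("temporaryGcsBucket", "example-temp-bucket")
        .mode("overwrite")
        .save()
    )

    spark.stop()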

Required Skills & Qualifications:

  • 7+ years of experience in Data Engineering roles.
  • 4+ years of hands-on experience with GCP services: BigQuery, Dataproc, Cloud Composer (Airflow), and Cloud Storage (an illustrative Composer DAG sketch follows this list).
  • Strong programming skills in Python/PySpark and experience with data transformation frameworks.
  • Expertise in writing and tuning complex SQL queries.
  • Solid understanding of data modeling, data warehousing, and distributed data processing.
  • Hands-on experience with DevOps tools and processes in a data engineering context.
  • Proven ability to work independently and lead technical initiatives.
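
For illustration only: a minimal Cloud Composer (Airflow) DAG sketch showing how a PySpark job like the one above might be orchestrated, submitting it to an existing Dataproc cluster on a daily schedule via the Google provider's DataprocSubmitJobOperator. Project, region, cluster, and file paths are hypothetical placeholders.

    # Illustrative Airflow DAG for Cloud Composer (hypothetical names throughout).
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocSubmitJobOperator,
    )

    PROJECT_ID = "example-project"      # placeholder project
    REGION = "us-central1"              # placeholder region
    CLUSTER_NAME = "etl-cluster"        # placeholder Dataproc cluster

    PYSPARK_JOB = {
        "reference": {"project_id": PROJECT_ID},
        "placement": {"cluster_name": CLUSTER_NAME},
        "pyspark_job": {
            "main_python_file_uri": "gs://example-code-bucket/jobs/orders_daily_etl.py"
        },
    }

    with DAG(
        dag_id="orders_daily_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_etl = DataprocSubmitJobOperator(
            task_id="run_orders_etl",
            project_id=PROJECT_ID,
            region=REGION,
            job=PYSPARK_JOB,
        )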

Education

Any Graduate