Description

Responsibilities:

Data Pipeline Development:

•  Design, implement, and optimize end-to-end data pipelines on Google Cloud Platform (GCP), with a focus on scalability and performance.

•  Develop and maintain ETL workflows for reliable, repeatable data processing (see the sketch below).
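
By way of illustration only (the posting does not describe a specific pipeline), the sketch below shows a minimal ETL-style Apache Beam job that could run on Dataflow. The project my-gcp-project, the bucket gs://example-bucket, and the three-field CSV layout are hypothetical placeholders, not details of this role.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_row(line):
        # Hypothetical parser: keep only rows with exactly three fields.
        fields = line.split(",")
        return fields if len(fields) == 3 else None

    # Runner, project, region, and bucket are illustrative placeholders.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-gcp-project",
        region="us-central1",
        temp_location="gs://example-bucket/tmp",
    )

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | "Read" >> beam.io.ReadFromText("gs://example-bucket/raw/events.csv")
         | "Parse" >> beam.Map(parse_row)
         | "DropInvalid" >> beam.Filter(lambda row: row is not None)
         | "Format" >> beam.Map(",".join)
         | "Write" >> beam.io.WriteToText("gs://example-bucket/clean/events"))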

GCP Cloud Expertise:

•  Utilize GCP services such as BigQuery, Cloud Storage, and Dataflow for effective data engineering.

•  Implement and manage data storage solutions on GCP, along the lines of the sketch below.
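
Which storage and loading patterns apply here is not specified; as one common example, the sketch below loads files from Cloud Storage into a BigQuery table with the google-cloud-bigquery client. The project, bucket, and the analytics.events table are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")  # placeholder project

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema from the files
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    # Load every matching file from Cloud Storage into one table.
    load_job = client.load_table_from_uri(
        "gs://example-bucket/clean/events-*",
        "my-gcp-project.analytics.events",  # hypothetical dataset.table
        job_config=job_config,
    )
    load_job.result()  # block until the load job completes

    table = client.get_table("my-gcp-project.analytics.events")
    print(f"Loaded {table.num_rows} rows")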

Data Transformation with PySpark:

•  Leverage PySpark for advanced data transformations, ensuring high-quality, well-structured output.

•  Implement data cleansing, enrichment, and validation processes using PySpark, as illustrated in the sketch below.
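
The posting does not spell out the transformations themselves, so the following is only a sketch of typical PySpark cleansing, enrichment, and validation steps; the column names (event_id, amount, event_ts) and the paths are invented for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cleanse-events").getOrCreate()

    # Hypothetical input file and columns.
    raw = spark.read.option("header", True).csv("gs://example-bucket/raw/events.csv")

    clean = (
        raw.dropDuplicates(["event_id"])                          # cleansing: drop duplicates
           .filter(F.col("event_id").isNotNull())                 # validation: require a key
           .withColumn("amount", F.col("amount").cast("double"))  # cleansing: enforce type
           .withColumn("event_date", F.to_date("event_ts"))       # enrichment: derived column
    )

    clean.write.mode("overwrite").parquet("gs://example-bucket/clean/events/")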


Requirements:

  • Proven experience as a Data Engineer, with a strong emphasis on GCP.
  • Proficiency in GCP services such as BigQuery, Cloud Storage, and Dataflow.
  • Expertise in PySpark for data processing and analytics is a must.
  • Experience with data modeling, ETL processes, and data warehousing.
  • Proficiency in languages such as Python, SQL, or Scala for data processing.
  • Relevant certifications in GCP or data engineering are a plus.

Skills:

GCP, PySpark

Education:

Any Graduate