Description

Responsibilities:

Data Pipeline Development:

•  Design, implement, and optimize end-to-end data pipelines on Google Cloud Platform (GCP), with a focus on scalability and performance.

•  Develop and maintain ETL workflows for reliable, repeatable data processing (see the sketch below).
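
By way of illustration only (the posting does not describe a specific pipeline), the sketch below shows a minimal ETL-style Apache Beam job that could run on Dataflow. The project my-gcp-project, the bucket gs://example-bucket, and the three-field CSV layout are hypothetical placeholders, not details of this role.

    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    def parse_row(line):
        # Hypothetical parser: keep only rows with exactly three fields.
        fields = line.split(",")
        return fields if len(fields) == 3 else None

    # Runner, project, region, and bucket are illustrative placeholders.
    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-gcp-project",
        region="us-central1",
        temp_location="gs://example-bucket/tmp",
    )

    with beam.Pipeline(options=options) as pipeline:
        (pipeline
         | "Read" >> beam.io.ReadFromText("gs://example-bucket/raw/events.csv")
         | "Parse" >> beam.Map(parse_row)
         | "DropInvalid" >> beam.Filter(lambda row: row is not None)
         | "Format" >> beam.Map(",".join)
         | "Write" >> beam.io.WriteToText("gs://example-bucket/clean/events"))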

GCP Cloud Expertise:

•  Utilize GCP services such as BigQuery, Cloud Storage, and Dataflow for effective data engineering.

•  Implement and manage data storage solutions on GCP, along the lines of the sketch below.
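
Which storage and loading patterns apply here is not specified; as one common example, the sketch below loads files from Cloud Storage into a BigQuery table with the google-cloud-bigquery client. The project, bucket, and the analytics.events table are hypothetical.

    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")  # placeholder project

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema from the files
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )

    # Load every matching file from Cloud Storage into one table.
    load_job = client.load_table_from_uri(
        "gs://example-bucket/clean/events-*",
        "my-gcp-project.analytics.events",  # hypothetical dataset.table
        job_config=job_config,
    )
    load_job.result()  # block until the load job completes

    table = client.get_table("my-gcp-project.analytics.events")
    print(f"Loaded {table.num_rows} rows")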

Data Transformation with PySpark:

•  Leverage PySpark for advanced data transformations, ensuring high-quality, well-structured output.

•  Implement data cleansing, enrichment, and validation processes using PySpark, as illustrated in the sketch below.
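
The posting does not spell out the transformations themselves, so the following is only a sketch of typical PySpark cleansing, enrichment, and validation steps; the column names (event_id, amount, event_ts) and the paths are invented for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cleanse-events").getOrCreate()

    # Hypothetical input file and columns.
    raw = spark.read.option("header", True).csv("gs://example-bucket/raw/events.csv")

    clean = (
        raw.dropDuplicates(["event_id"])                          # cleansing: drop duplicates
           .filter(F.col("event_id").isNotNull())                 # validation: require a key
           .withColumn("amount", F.col("amount").cast("double"))  # cleansing: enforce type
           .withColumn("event_date", F.to_date("event_ts"))       # enrichment: derived column
    )

    clean.write.mode("overwrite").parquet("gs://example-bucket/clean/events/")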


Requirements:

  • Proven experience as a Data Engineer, with a strong emphasis on GCP.
  • Proficiency in GCP services such as BigQuery, Cloud Storage, and Dataflow.
  • Expertise in PySpark for data processing and analytics is a must.
  • Experience with data modeling, ETL processes, and data warehousing.
  • Proficiency in languages such as Python, SQL, or Scala for data processing.
  • Relevant certifications in GCP or data engineering are a plus.

Skills:

GCP, PySpark

Education:

Any Graduate