Responsibilities:
Data Pipeline Development:
§ Design, implement, and optimize end-to-end data pipelines on GCP, focusing on scalability and performance.
§ Develop and maintain ETL workflows for reliable, automated data processing.
GCP Cloud Expertise:
§ Utilize GCP services such as BigQuery, Cloud Storage, and Dataflow for effective data engineering.
§ Implement and manage data storage solutions on GCP.
Data Transformation with PySpark:
§ Leverage PySpark for advanced data transformations, ensuring high-quality and well-structured output.
§ Implement data cleansing, enrichment, and validation processes using PySpark.
Requirements:
Skills:
GCP, PySpark
Education:
Any Graduate