Description

Overview

We are looking for an experienced ETL developer to design, build, and optimize PySpark-based data pipelines on Google Cloud Platform (GCP).

Key Responsibilities

  • Design, develop, and optimize PySpark-based ETL pipelines for large-scale data processing on Google Cloud Platform (GCP); a minimal pipeline sketch follows this list.
  • Work with BigQuery, Cloud Dataflow, Cloud Composer (Apache Airflow), and Cloud Storage for data transformation and orchestration.
  • Implement best practices for data governance, security, and monitoring in a cloud environment.
  • Collaborate with data engineers, analysts, and business stakeholders to understand data requirements.
  • Troubleshoot performance bottlenecks and optimize Spark jobs for efficient execution.
  • Automate data workflows using Apache Airflow or Cloud Composer.
  • Ensure data quality, validation, and consistency across pipelines.
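
The following is a minimal sketch of the kind of PySpark ETL pipeline described in the first bullet above: it reads raw files from Cloud Storage, applies basic cleansing and aggregation, and writes the result to BigQuery. All bucket, dataset, and table names are hypothetical placeholders, and the BigQuery write assumes the spark-bigquery-connector is available on the cluster (as it is on Dataproc).

    # Minimal PySpark ETL sketch (hypothetical names; assumes spark-bigquery-connector).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders_etl").getOrCreate()

    # Extract: read raw CSV files landed in a Cloud Storage bucket.
    raw = (spark.read
           .option("header", True)
           .csv("gs://example-raw-bucket/orders/*.csv"))

    # Transform: type casting, basic cleansing, and deduplication.
    orders = (raw
              .withColumn("order_ts", F.to_timestamp("order_ts"))
              .withColumn("amount", F.col("amount").cast("double"))
              .dropna(subset=["order_id", "amount"])
              .dropDuplicates(["order_id"]))

    # Aggregate to a daily revenue figure.
    daily_revenue = (orders
                     .groupBy(F.to_date("order_ts").alias("order_date"))
                     .agg(F.sum("amount").alias("revenue")))

    # Load: write the aggregate to BigQuery via the Spark-BigQuery connector.
    (daily_revenue.write
     .format("bigquery")
     .option("table", "example_dataset.daily_revenue")
     .option("temporaryGcsBucket", "example-temp-bucket")
     .mode("overwrite")
     .save())

    spark.stop()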

Required Skills & Experience

  • 5+ years of experience in ETL development with a focus on PySpark.
  • Strong hands-on experience with Google Cloud Platform (GCP) services, including:
      • BigQuery
      • Cloud Dataflow / Apache Beam
      • Cloud Composer (Apache Airflow); a sample DAG sketch follows this list
      • Cloud Storage
  • Proficiency in Python and PySpark for big data processing.
  • Experience with data lake architectures and data warehousing concepts.
  • Knowledge of SQL for data querying and transformation.
  • Experience building CI/CD pipelines to automate data pipeline deployments.
  • Strong debugging and problem-solving skills.
  • Experience with Kafka or Pub/Sub for real-time data processing.
  • Knowledge of Terraform for infrastructure automation on GCP.
  • Experience with containerization (Docker, Kubernetes).
  • Familiarity with DevOps and monitoring tools like Prometheus, Stackdriver, or Datadog.
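
As referenced under Cloud Composer above, the sketch below shows a minimal Airflow DAG that could schedule the PySpark job from the earlier sketch on a Dataproc cluster. Project, region, cluster, and file names are hypothetical placeholders, and it assumes an Airflow 2.x environment with the Google provider package installed (standard on Cloud Composer 2).

    # Minimal Cloud Composer / Airflow DAG sketch (hypothetical names).
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocSubmitJobOperator,
    )

    PYSPARK_JOB = {
        "reference": {"project_id": "example-project"},
        "placement": {"cluster_name": "example-cluster"},
        "pyspark_job": {"main_python_file_uri": "gs://example-code-bucket/orders_etl.py"},
    }

    with DAG(
        dag_id="orders_etl_daily",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",  # run the ETL job once per day
        catchup=False,
    ) as dag:
        # Submit the PySpark ETL job to an existing Dataproc cluster.
        DataprocSubmitJobOperator(
            task_id="run_orders_etl",
            job=PYSPARK_JOB,
            region="us-central1",
            project_id="example-project",
        )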

Skills:

GCP, PySpark, ETL

Education

Any Graduate