Description

Key Responsibilities:

  • Design, develop, and optimize robust ETL/ELT data pipelines using GCP services such as Dataproc, BigQuery, Cloud Composer, and Cloud Storage (an illustrative PySpark sketch follows this list).
  • Write efficient PySpark and SQL code for processing large datasets.
  • Optimize complex SQL queries and data workflows to ensure high performance and scalability.
  • Collaborate with data scientists, analysts, and product teams to gather requirements and deliver data solutions.
  • Lead and participate in technical design discussions and architecture reviews.
  • Implement and maintain CI/CD pipelines and DevOps processes for data workflows.
  • Ensure best practices in data engineering, data quality, and data security.
  • (Preferred) Build real-time streaming pipelines using Cloud Pub/Sub and Cloud Functions.
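
For illustration only, not part of the role description: a minimal PySpark sketch of the kind of batch ETL job described above, reading raw files from a Cloud Storage bucket and loading an aggregate into BigQuery through the spark-bigquery connector available on Dataproc. All project, bucket, dataset, and table names are hypothetical placeholders.

    # Illustrative PySpark batch job (hypothetical names throughout):
    # read raw CSVs from Cloud Storage, aggregate, and load into BigQuery.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("orders-daily-etl").getOrCreate()

    # Read raw order events landed in a Cloud Storage bucket (placeholder path).
    raw = (
        spark.read
        .option("header", "true")
        .csv("gs://example-raw-zone/orders/dt=2024-01-01/")
    )

    # Basic cleanup and aggregation: cast types, drop bad rows, total per customer.
    orders = (
        raw.withColumn("amount", F.col("amount").cast("double"))
           .filter(F.col("amount").isNotNull())
           .groupBy("customer_id")
           .agg(F.sum("amount").alias("total_amount"),
                F.count("*").alias("order_count"))
    )

    # Write to BigQuery via the spark-bigquery connector bundled with Dataproc.
    (
        orders.write
        .format("bigquery")
        .option("table", "example_project.analytics.daily_customer_orders")
        .option("temporaryGcsBucket", "example-temp-bucket")
        .mode("overwrite")
        .save()
    )

    spark.stop()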

Required Skills & Qualifications:

  • 7+ years of experience in Data Engineering roles.
  • 4+ years of hands-on experience with GCP services: BigQuery, Dataproc, Cloud Composer (Airflow), and Cloud Storage (an illustrative Composer DAG sketch follows this list).
  • Strong programming skills in Python/PySpark and experience with data transformation frameworks.
  • Expertise in writing and tuning complex SQL queries.
  • Solid understanding of data modeling, data warehousing, and distributed data processing.
  • Hands-on experience with DevOps tools and processes in a data engineering context.
  • Proven ability to work independently and lead technical initiatives.
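
For illustration only: a minimal Cloud Composer (Airflow) DAG sketch showing how a PySpark job like the one above might be orchestrated, submitting it to an existing Dataproc cluster on a daily schedule via the Google provider's DataprocSubmitJobOperator. Project, region, cluster, and file paths are hypothetical placeholders.

    # Illustrative Airflow DAG for Cloud Composer (hypothetical names throughout).
    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocSubmitJobOperator,
    )

    PROJECT_ID = "example-project"      # placeholder project
    REGION = "us-central1"              # placeholder region
    CLUSTER_NAME = "etl-cluster"        # placeholder Dataproc cluster

    PYSPARK_JOB = {
        "reference": {"project_id": PROJECT_ID},
        "placement": {"cluster_name": CLUSTER_NAME},
        "pyspark_job": {
            "main_python_file_uri": "gs://example-code-bucket/jobs/orders_daily_etl.py"
        },
    }

    with DAG(
        dag_id="orders_daily_etl",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_etl = DataprocSubmitJobOperator(
            task_id="run_orders_etl",
            project_id=PROJECT_ID,
            region=REGION,
            job=PYSPARK_JOB,
        )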

Education

Any Graduate