Description

  • The GCP Data Engineer will create, deliver, and support custom data products, and will enhance and expand team capabilities. The role involves analyzing and manipulating large datasets for the enterprise, activating data assets to support enabling platforms and analytics. The engineer will also be responsible for designing transformation and modernization efforts on Google Cloud Platform using GCP services.

 

Responsibilities:

 

  • Build data systems and pipelines on GCP using Dataproc, Dataflow, Data Fusion, BigQuery, and Pub/Sub.
  • Implement schedules, workflows, and tasks in Cloud Composer / Apache Airflow.
  • Create and manage data storage solutions using GCP services such as BigQuery, Cloud Storage, and Cloud SQL.
  • Monitor and troubleshoot data pipelines and storage solutions using Cloud Monitoring and Cloud Logging (formerly Stackdriver).
  • Develop efficient ETL/ELT pipelines and orchestration using Dataprep and Cloud Composer.
  • Develop and maintain data ingestion and transformation processes using PySpark (Apache Spark) and Dataflow.
  • Automate data processing tasks using scripting languages such as Python or Bash.
  • Ensure data security and compliance with industry standards by configuring IAM roles, service accounts, and access policies.
  • Automate cloud deployments and infrastructure management using Infrastructure as Code (IaC) tools such as Terraform or Google Cloud Deployment Manager.
  • Participate in code reviews, contribute to development best practices, and use Developer Assist tools to create robust, fail-safe data pipelines.
  • Collaborate with Product Owners, Scrum Masters, and Data Analysts to deliver user stories and tasks and to ensure pipelines are deployed.

 

Experience Required:

 

  • 5+ years of application development experience using one of the core cloud platforms (AWS, Azure, GCP).
  • 1+ years of GCP experience, especially in GCP-based Big Data deployments (Batch/Real-Time) leveraging BigQuery, Bigtable, Google Cloud Storage, Pub/Sub, Data Fusion, Dataflow, Dataproc, Airflow, and Cloud Composer.
  • 2+ years of coding experience in Java/Python/PySpark and strong proficiency in SQL.
  • Experience working with data teams to analyze data, build models, and integrate massive datasets from multiple sources for data modeling.
  • Experience extracting, loading, transforming, cleaning, and validating data, and designing pipelines and architectures for data processing.
  • Experience architecting and implementing next-generation data and analytics platforms on GCP.
  • Experience in working with Agile and Lean methodologies.
  • Experience working with either a MapReduce or an MPP system at any size or scale.
  • Experience working in a CI/CD model to ensure automated orchestration of pipelines.

Education

Any Graduate