Job Description
Key Responsibilities:
- Design and implement robust, scalable, and efficient data pipelines using GCP services such as BigQuery, Dataflow, Pub/Sub, and Cloud Storage.
- Build and maintain data models and ETL processes to support data analytics and reporting needs.
- Optimize data storage and retrieval for performance and cost-effectiveness.
- Collaborate with data scientists, analysts, and other engineers to understand data requirements and deliver solutions.
- Monitor and troubleshoot production pipelines to ensure data quality and system reliability.
- Implement security best practices to ensure data integrity and compliance with regulations.
- Create and maintain comprehensive documentation for data workflows, architecture, and systems.
Qualifications:
Must-Have Skills:
- 5+ years of experience in data engineering or a related field.
- Proficiency with GCP services, including but not limited to BigQuery, Dataflow, Cloud Storage, Cloud Composer (Airflow), and Pub/Sub.
- Strong programming skills in Python and Java.
- Hands-on experience with SQL for querying and transforming data.
- Knowledge of data modeling, data warehousing, and building scalable ETL/ELT pipelines.
- Familiarity with CI/CD pipelines for deploying and managing data workflows.
- Solid understanding of distributed computing and cloud-based architecture.
Preferred Skills:
- Experience with other cloud platforms such as AWS or Azure.
- Knowledge of Spark, Hadoop, or other big data technologies.
- Familiarity with Kubernetes or containerized applications.
- Background in machine learning or data science.
Soft Skills:
- Excellent problem-solving skills and a proactive approach to challenges.
- Strong communication skills to explain technical concepts to non-technical stakeholders.
- Ability to work in a fast-paced environment with changing priorities.
- Team-oriented mindset with a passion for knowledge sharing.