Position Summary:
We are seeking a skilled Mid-Level Data Engineer with a strong background in designing and managing large, data-heavy distributed systems. The ideal candidate will have experience with Google Cloud Platform (GCP), Python, SQL, and related data technologies, and will be adept at pipeline orchestration and cloud-based data solutions. If you are passionate about working with large datasets, optimizing performance, and deploying scalable solutions, we want to hear from you!
Key Responsibilities:
Design and Development:
Design, develop, and maintain large-scale data processing systems using GCP technologies.
Data Integration and Management:
Work with BigQuery for large-scale data querying and analysis.
Manage data integration pipelines using Dataflow for both real-time and batch processing.
Interface with APIs (REST/GraphQL) to integrate and synchronize data across systems.
Scripting and Automation:
Apply strong Python coding skills to develop data processing scripts and automation.
Performance Optimization:
Optimize data processing and querying performance to ensure efficiency and scalability.
Collaboration and Communication:
Collaborate with cross-functional teams to understand data needs and requirements.
Required Qualifications:
Bachelor’s Degree in Computer Science or a related field, or equivalent experience.
4+ years of experience as a software engineer or in a similar role, with a focus on designing and managing data-heavy distributed systems or high-traffic web applications.
Minimum of 2 years of strong Python coding experience, including developing data processing and automation scripts.
Proficiency in SQL, with a proven track record of writing efficient and performant queries.
Hands-on experience with GCP, including BigQuery and Dataflow.
Experience with pipeline orchestration tools, such as Apache Airflow.
Familiarity with Apache Kafka for data streaming and messaging.
Proven ability to work in a remote software development environment and effectively use version control systems (Git strongly preferred).
Strong analytical and problem-solving skills, with the ability to work independently in a fast-paced, dynamic environment.
Preferred Qualifications:
Experience with big data tools such as Apache Spark (including PySpark).
Familiarity with MLOps systems and tooling, such as MLflow, for managing machine learning workflows.
Experience with Infrastructure as Code (IaC) frameworks, particularly Terraform.