- Data Pipeline Development:
- Design, build, and maintain scalable data pipelines and ETL processes to support analytics, reporting, and data science initiatives.
- Productionization & Automation:
- Develop and automate robust workflows for data ingestion, transformation, and delivery using orchestration tools and CI/CD pipelines.
- Cloud Data Engineering:
- Leverage cloud platforms (Azure preferred) and technologies (Databricks, Azure Data Factory) to manage big data environments and enable advanced analytics.
- Code Quality & Collaboration:
- Participate in code reviews (PRs), enforce coding standards, and collaborate with cross-functional teams to ensure maintainable, high-quality solutions.
- Monitoring & Troubleshooting:
- Implement monitoring and observability for data pipelines, proactively identifying and resolving issues in production environments.
- Data Governance & Quality:
- Ensure data integrity, security, and compliance by applying best practices in data governance, quality management, and documentation.
- Stakeholder Engagement:
- Work closely with business partners to understand requirements, deliver custom data solutions, and provide training and support for end users.
What You'll Need
Education:
- Bachelor's degree in Computer Science, Engineering, Information Systems, or a related field (Master's preferred).
Experience:
- 2-3 years of relevant experience, or a Master's degree in a related field.
- Strong proficiency with SQL, Python, and PySpark for data engineering tasks.
- Hands-on experience with cloud computing technologies for big data (Azure preferred), including Databricks and Azure Data Factory.
- Experience building and maintaining production-grade data pipelines, including monitoring and troubleshooting in live environments.
- Proficiency in using Git for version control and collaboration; experience participating in code reviews (PRs) and enforcing code quality standards.
- Experience working in an Agile environment, with knowledge of Agile methodologies and practices.
- Hands-on experience with CI/CD pipelines for automating data pipeline deployment and testing.
Technical Skills:
- Experience with orchestration and transformation tools (e.g., Airflow, dbt) and containerization (Docker, Kubernetes) for scalable data operations.
- Familiarity with automation tools and processes for streamlining development, testing, and deployment workflows.
- Understanding of DataOps concepts and best practices for reliable, reproducible, and scalable data workflows.
- Ability to develop dashboards and data visualizations in Power BI or similar tools.
- Basic understanding of data science concepts, techniques, and tools, such as machine learning algorithms, statistical analysis, and data preprocessing.
Business & Communication Skills:
- Exceptional communication skills, with the ability to translate technical solutions into business value.
- Ability to take vague requirements and convert them into scalable, actionable data solutions.
- Strong problem-solving skills, business curiosity, and results-driven mindset.
- Commitment to continuous learning and staying at the forefront of technology trends.
- Takes initiative and builds relationships with key stakeholders to establish confidence, trust, and credibility.
- Customer-focused and motivated by results, with the ability to proactively identify business needs and drive stakeholder satisfaction.
Preferred Qualifications
- Deep understanding and experience with Azure's cloud architecture, services, and management tools.
- Knowledge of MLOps best practices, including deployment, monitoring, and maintenance of machine learning models.
- Experience participating in code reviews and collaborative development processes.
- Experience building automated pipelines for data workflow deployment and monitoring.
- Experience supporting data-driven business transformation and data governance initiatives.