Key Skills: Data Science, PySpark
Roles and Responsibilities:
- Design, develop, and maintain data pipelines using Databricks and PySpark
- Implement and manage data solutions on cloud platforms such as Azure, AWS, or GCP
- Optimize SQL queries and manage Azure SQL Database, Synapse Analytics, and Azure Data Factory
- Collaborate with cross-functional teams to understand data requirements and deliver high-quality solutions
- Develop CI/CD pipelines using Azure DevOps to streamline deployment processes
- Ensure data integrity and accuracy through rigorous testing and validation
- Stay updated with industry trends and best practices in data engineering
Skills Required:
- Strong expertise in Data Science concepts and practices
- Hands-on experience in designing and building data pipelines
- Proficiency with Databricks for big data processing
- Solid understanding of cloud platforms (Azure, AWS, or GCP)
- Experience in SQL query optimization
- Familiarity with Azure SQL Database, Synapse Analytics, and Azure Data Factory
- Ability to lead and collaborate with cross-functional teams
- Experience with CI/CD pipelines using Azure DevOps
- Strong focus on data accuracy, testing, and validation
Nice-to-Have:
- Advanced PySpark proficiency (e.g., performance tuning), beyond the core pipeline development listed above
- Awareness of current data engineering trends and best practices
Education: Bachelor's degree in a related field