Job Description
> 6+ years of experience with machine learning pipelines and MLOps tools.
> Knowledge of data governance frameworks and tools (e.g., Apache Atlas).
> Familiarity with large-scale stream processing systems such as Apache Flink or Apache Samza.
> Prior experience with real-time data architectures.
> Programming Languages: Proficiency in Python, Java, or Scala.
> Data Storage: Experience with SQL databases (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra).
> Cloud Platforms: Strong experience with AWS, Azure, or Google Cloud Platform (GCP), with a focus on cloud-native data solutions (e.g., S3, Redshift, BigQuery, or Azure Data Lake).
> Data Warehousing: Hands-on experience with modern data warehousing solutions like Snowflake, Amazon Redshift, Google BigQuery, or Azure Synapse.
> Big Data Technologies: Proficiency in Hadoop, Spark, and Kafka for processing large-scale datasets.
> ETL Tools: Experience with ETL tools such as Apache NiFi, Airflow, Talend, or Informatica.
> Containerization & Orchestration: Familiarity with Docker, Kubernetes, and cloud-based orchestration tools.
> Version Control: Experience with Git and with hosting platforms such as GitHub.
> DevOps & CI/CD: Experience with continuous integration and continuous deployment (CI/CD) practices and tools such as Jenkins or CircleCI.
> Data Modeling: Expertise in designing and building relational and dimensional data models.
Job Responsibilities
> Design, build, and maintain scalable data pipelines to support business analytics and machine learning workflows.
> Develop and manage robust ETL processes to collect, store, and process large datasets from multiple sources.
> Ensure the performance, quality, and security of data platforms.
> Collaborate with data scientists and business teams to understand data needs and provide data infrastructure solutions.
> Optimize existing data systems for scalability and efficiency.
> Implement data governance policies, ensuring data integrity and compliance with regulatory standards.
> Monitor and troubleshoot data pipelines, ensuring data quality and resolving any issues proactively.
> Stay updated on the latest technologies and best practices in data engineering.
Education
> Any Graduate