Description

Key Skills:

  • Must-Have: Azure, Data Modelling, Data Engineering, MS SQL, Python, Airflow.
  • Nice-to-Have: AWS.

Roles & Responsibilities:

  • Design, develop, and maintain scalable and efficient data pipelines using tools like Apache NiFi, Apache Airflow, or similar (a minimal pipeline sketch follows this list).
  • Develop robust Python scripts for data ingestion, transformation, and validation.
  • Manage and optimize object storage systems (e.g., Amazon S3, Azure Blob Storage, Google Cloud Storage).
  • Collaborate with Data Scientists and Analysts to understand data requirements and deliver high-quality, production-ready datasets.
  • Implement data quality checks, monitoring, and alerting mechanisms to ensure data accuracy and reliability.
  • Ensure data security, governance, and compliance with regulatory and industry standards such as GDPR and HIPAA.
  • Contribute to the architecture and design of data platforms and solutions, ensuring scalability and reliability.
  • Design and implement ETL processes that align with business needs and deliver insights efficiently.
  • Work closely with cross-functional teams to integrate data solutions into the broader ecosystem.
  • Perform data profiling and assess data quality using statistical methods.
  • Automate manual processes to improve efficiency and streamline data workflows.
  • Optimize performance of data pipelines to handle large volumes of data with minimal latency.
  • Mentor junior engineers and promote best practices in data engineering, including code reviews, version control, and testing.
  • Troubleshoot and resolve data-related issues, ensuring minimal downtime and high availability.
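
As an illustration of the pipeline and data-quality work described above, here is a minimal sketch assuming Airflow 2.4+ with the TaskFlow API and pandas; the DAG name, file paths, sample data, and checks are hypothetical placeholders, not part of any actual codebase for this role.

```python
# Illustrative sketch only: a minimal Airflow DAG of the ingest -> transform -> validate
# shape described above. Paths, column names, and checks are hypothetical placeholders.
from datetime import datetime

import pandas as pd
from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False, tags=["example"])
def daily_sales_pipeline():
    @task
    def ingest() -> str:
        # In practice this would pull from an API or object storage (e.g., Azure Blob Storage).
        raw_path = "/tmp/sales_raw.csv"
        pd.DataFrame({"order_id": [1, 2], "amount": [10.0, 20.0]}).to_csv(raw_path, index=False)
        return raw_path

    @task
    def transform(raw_path: str) -> str:
        df = pd.read_csv(raw_path)
        df["amount_usd"] = df["amount"].round(2)  # placeholder transformation
        clean_path = "/tmp/sales_clean.csv"
        df.to_csv(clean_path, index=False)
        return clean_path

    @task
    def validate(clean_path: str) -> None:
        df = pd.read_csv(clean_path)
        # Basic data-quality checks; a real pipeline would also emit metrics and alerts.
        assert not df.empty, "no rows ingested"
        assert df["order_id"].is_unique, "duplicate order_id values"
        assert df["amount_usd"].notna().all(), "null amounts after transform"

    validate(transform(ingest()))


daily_sales_pipeline()
```

Splitting the work into separate ingest, transform, and validate tasks maps directly onto the monitoring and alerting responsibilities above, since each task can fail, alert, and be retried independently.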

Experience Required:

  • 7-10 years of experience building data lakes, data warehouses, and real-time streaming data solutions using cloud technologies (Azure mandatory; AWS a plus).
  • Proven ability to work with large-scale structured and unstructured data using tools such as Apache NiFi, Apache Airflow, and Spark.
  • Practical experience designing and implementing data models optimized for analytics and reporting using tools like Azure Synapse, Snowflake, or Redshift.
  • Expertise in optimizing database queries and managing relational databases like MS SQL Server, PostgreSQL, or MySQL.
  • Experience integrating third-party APIs and systems into data pipelines.
  • Ability to perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
  • Experience in developing CI/CD pipelines for data workflows and familiarity with DevOps practices.
  • Knowledge of data lineage, cataloging, and metadata management tools such as Azure Purview, Collibra, or Alation.
  • Prior experience supporting data governance initiatives, including implementing row-level security, masking, and anonymization techniques (a small masking sketch follows this list).
  • Exposure to modern data orchestration tools and to version control and CI systems (e.g., Git, Bitbucket, Jenkins).
  • Demonstrated success in working within Agile teams, participating in sprint planning, backlog grooming, and cross-functional collaboration.
  • Hands-on experience with containerization technologies like Docker and orchestration tools such as Kubernetes for deploying scalable data services.
  • Experience with real-time data streaming frameworks like Kafka or Azure Event Hubs is an added advantage.
  • Familiarity with monitoring tools (e.g., Grafana, Prometheus, Azure Monitor) to ensure data pipeline health and performance.
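
To make the governance bullet above concrete, here is one possible way to pseudonymize PII columns before a dataset is loaded downstream. This is a sketch under stated assumptions: the column names, the salt handling, and the pandas-based approach are hypothetical, and a production setup would pull the salt from a secrets store such as Azure Key Vault and record the operation for lineage.

```python
# Illustrative sketch only: deterministic pseudonymization of hypothetical PII columns
# before loading data downstream. Column names and salt handling are placeholders.
import hashlib

import pandas as pd

PII_COLUMNS = ["email", "phone"]      # hypothetical PII columns
SALT = "rotate-me-via-key-vault"      # in practice, fetch from a secrets store


def pseudonymize(value: str) -> str:
    """Hash a PII value deterministically so joins still work but the raw value is hidden."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()


def mask_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of the frame with any configured PII columns replaced by hashes."""
    masked = df.copy()
    for col in PII_COLUMNS:
        if col in masked.columns:
            masked[col] = masked[col].astype(str).map(pseudonymize)
    return masked


if __name__ == "__main__":
    customers = pd.DataFrame(
        {"customer_id": [1, 2], "email": ["a@example.com", "b@example.com"], "phone": ["111", "222"]}
    )
    print(mask_frame(customers))
```

Deterministic hashing preserves joinability across tables while hiding raw values; full anonymization would instead require dropping or generalizing the columns.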

Education

B.Tech/M.Tech (Dual Degree), B.E., or B.Tech preferred; any graduate considered.