Description

• At least 4 years of experience developing big data technologies and data pipelines. 
• Experience managing and manipulating huge datasets on the order of terabytes (TB) is essential. 
• Experience with big data technologies such as Hadoop, Apache Spark (Scala preferred), and Apache Hive, or similar frameworks on the cloud (GCP preferred; AWS, Azure, etc.), to build batch data pipelines with a strong focus on optimization, SLA adherence, and fault tolerance. 
• Experience in building idempotent workflows using orchestrators like Automic, Airflow, Luigi, etc. 
• Experience in writing SQL to analyze, optimize, and profile data, preferably in BigQuery or Spark SQL. 
• Strong data modeling skills are necessary for designing a schema that can accommodate the evolution of data sources and facilitate seamless data joins across various datasets. 
• Ability to work directly with stakeholders to understand data requirements and translate them into pipeline development and data solution work. 
• Strong analytical and problem-solving skills are crucial for identifying and resolving issues that may arise during the data integration and schema evolution process. 
• The ability to move at a rapid pace without compromising quality, and to start delivering with minimal ramp-up time, is crucial to succeeding in this initiative. 
• Effective communication and collaboration skills are necessary for working in a team environment and coordinating efforts between different stakeholders involved in the project. 

Nice to have: 
• Experience building complex near real-time (NRT) streaming data pipelines using Apache Kafka, Spark Streaming, and Kafka Connect with a strong focus on stability, scalability, and SLA adherence. 
• Good understanding of REST APIs; working knowledge of Apache Druid, Redis, Elasticsearch, GraphQL, or similar technologies. Understanding of API contracts, building telemetry, stress testing, etc. 
• Exposure to developing reports/dashboards using Looker/Tableau 
• Experience in the eCommerce domain preferred

Education

Any Graduate