Description

Experience with big data technologies such as Hadoop, Apache Spark (Scala preferred), and Apache Hive, or similar frameworks on the cloud (GCP preferred; also AWS, Azure, etc.), to build batch data pipelines with a strong focus on optimization, SLA adherence, and fault tolerance (a batch sketch follows this list).
Experience building idempotent workflows using orchestrators such as Automic, Airflow, or Luigi.
Experience writing SQL to analyze, optimize, and profile data, preferably in BigQuery or Spark SQL (a profiling sketch also follows this list).
Strong data modeling skills for designing schemas that accommodate evolving data sources and facilitate seamless joins across datasets.
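
A minimal sketch of such a batch job in Spark (Scala), assuming a hypothetical orders dataset and illustrative GCS paths (none of these names come from the role itself). Dynamic partition overwrite makes each dated run idempotent, so an orchestrator such as Airflow can retry or backfill it safely:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyOrdersBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("daily-orders-batch")
      // Replace only the partitions written by this run, so re-running
      // the same date neither duplicates rows nor drops other partitions.
      .config("spark.sql.sources.partitionOverwriteMode", "dynamic")
      .getOrCreate()

    val runDate = args(0) // e.g. "2024-01-15", supplied by the orchestrator

    val orders = spark.read.parquet("gs://example-bucket/raw/orders") // hypothetical path
      .filter(col("order_date") === runDate)

    val daily = orders
      .groupBy(col("order_date"), col("region"))
      .agg(count("*").as("order_count"), sum("amount").as("revenue"))

    daily.write
      .mode("overwrite")          // with dynamic mode above, only this
      .partitionBy("order_date")  // run's partition is replaced
      .parquet("gs://example-bucket/curated/daily_orders")

    spark.stop()
  }
}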
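
And a sketch of SQL-based profiling in Spark SQL over the same hypothetical output; the mergeSchema option tolerates columns added later by evolving upstream sources:

import org.apache.spark.sql.SparkSession

object ProfileOrders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("profile-orders").getOrCreate()

    spark.read
      .option("mergeSchema", "true") // tolerate evolving source schemas
      .parquet("gs://example-bucket/curated/daily_orders")
      .createOrReplaceTempView("daily_orders")

    // Basic profile: row counts, null counts, and value ranges per day.
    spark.sql("""
      SELECT order_date,
             COUNT(*)                                 AS row_count,
             SUM(CASE WHEN region IS NULL THEN 1 END) AS null_regions,
             MIN(revenue)                             AS min_revenue,
             MAX(revenue)                             AS max_revenue
      FROM daily_orders
      GROUP BY order_date
      ORDER BY order_date
    """).show(truncate = false)

    spark.stop()
  }
}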


Experience building complex near-real-time (NRT) streaming data pipelines using Apache Kafka, Spark Structured Streaming, and Kafka Connect, with a strong focus on stability, scalability, and SLA adherence (a streaming sketch follows below).
Good understanding of REST APIs; working knowledge of Apache Druid, Redis, Elasticsearch, GraphQL, or similar technologies. Understanding of API contracts, building telemetry, stress testing, etc.
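
A minimal sketch of such an NRT pipeline (Kafka into Spark Structured Streaming, out to a Parquet sink); the broker address, topic, schema, and paths are placeholders. The checkpoint location is what provides restart-on-failure fault tolerance and exactly-once output to the file sink:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.types._

object OrdersStream {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("orders-stream").getOrCreate()

    // Hypothetical event schema for JSON messages on the topic.
    val schema = new StructType()
      .add("order_id", StringType)
      .add("amount", DoubleType)
      .add("event_time", TimestampType)

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092") // placeholder broker
      .option("subscribe", "orders")                     // placeholder topic
      .option("startingOffsets", "latest")
      .load()
      .select(from_json(col("value").cast("string"), schema).as("e"))
      .select("e.*")

    events.writeStream
      .format("parquet")
      .option("path", "gs://example-bucket/nrt/orders")
      .option("checkpointLocation", "gs://example-bucket/checkpoints/orders")
      .trigger(Trigger.ProcessingTime("1 minute")) // micro-batch cadence vs. SLA
      .start()
      .awaitTermination()
  }
}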

Education

Any Graduate