Description

Job Description:
-Design and develop robust big data pipelines using Hadoop and PySpark.
-Write and optimize complex queries using Hive and SQL for large-scale data processing.
-Collaborate with data analysts, data scientists, and business teams to gather requirements and deliver effective data solutions.
-Monitor and troubleshoot data workflows and pipelines for performance and reliability.
-Ensure data integrity and consistency across all stages of the data lifecycle.
-Implement best practices for data governance, security, and compliance.

Must-Have Skills:
-Strong hands-on experience with the Hadoop ecosystem (HDFS, YARN, MapReduce).
-Expertise in Hive for querying large datasets.
-Proficiency in PySpark for distributed data processing.

Education

Any Graduate