Description

Experience level: 10+ years

Skills (EXPERT/ADVANCED/NONE)
Familiar with integrating data streaming platforms like Kafka for real-time data ingestion and processing, ensuring scalability and fault tolerance.
8+ years of hands-on experience in the Hadoop ecosystem with expertise in PySpark for data processing and transformation.
Proficient in writing and optimizing complex Hive and Spark queries for efficient data retrieval and manipulation.
Strong experience handling large volumes of data, particularly in the banking domain, with a focus on data governance, security, and compliance.
Solid working knowledge of Linux commands for file handling, process management, and system troubleshooting.
Experience working with the Cloudera distribution, including cluster management, troubleshooting, and performance tuning, along with expertise in YARN.
Hands-on experience with Oozie for scheduling and managing complex ETL workflows and job orchestration.
Proven ability to manage real-time data processing pipelines and large-scale financial data for critical business applications.
In-depth experience with Apache Spark (e.g., Spark on YARN, Spark SQL) for high-performance distributed data processing, tuning, and optimization.
Basic understanding of Hadoop administration tasks, including cluster monitoring, tuning, and troubleshooting; familiarity with Ambari for cluster

Education

Any Graduate