Job Description
- Responsible for requirements gathering, analysis, and designing the data pipeline architecture from source to target.
- Develop Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats.
- Validate source and target data and write Spark jobs using transformations and actions.
- Design NiFi workflows to move data from source systems into Hadoop and Amazon S3.
- Develop Kafka producers and consumers for publishing to and subscribing from Kafka topics.
- Develop Spark SQL queries to load JSON data, define schemas, load the data into Hive tables, and handle structured data using Spark SQL.
- Analyse datasets, perform deep-dive analysis, debug data quality issues, cleanse and transform data, and create reports to share findings across teams.
Skills
- Hadoop (PySpark, MapReduce, Hive), shell scripting, Sqoop, Oozie, ZooKeeper, Snowflake, Databricks
Education: Looking for someone with a Bachelor's degree or equivalent in Computer Science.