Participate in team activities, design discussions, stand-up meetings, and planning reviews with the team.
Assist with and support Azure cloud migrations of existing data processes running on classical infrastructure.
Perform data analysis, data profiling, data quality checks, and data ingestion across various layers using Hadoop/Hive/Impala queries, PySpark programs, and UNIX shell scripts.
Collaborate with source system and approved provisioning point (APP) teams, architects, data analysts, and modelers to build scalable and performant data solutions.
Follow the organization's coding standards document; create mappings, sessions, and workflows as per the mapping specification document.
Analyze incoming data requests for the team and determine appropriate target solutions.
Perform gap and impact analysis of ETL and IOP jobs for new requirements and enhancements.
Create jobs in Hadoop using Sqoop, PySpark, and StreamSets to meet business user needs (see the ingestion sketch after this list).
Create mockup data, perform unit testing, and capture result sets for the jobs developed in lower environments (see the unit-test sketch after this list).
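
As a rough illustration of the ingestion and data-quality work listed above, the following is a minimal PySpark sketch: it reads a landed file, runs simple profiling checks, and writes a curated Hive table. The paths, database/table names, and column names (/data/landing/orders/, curated.orders, order_id) are hypothetical placeholders, not part of any specific environment.

    # Minimal ingestion sketch; names and paths are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (
        SparkSession.builder
        .appName("raw_to_curated_ingest")   # hypothetical job name
        .enableHiveSupport()
        .getOrCreate()
    )

    # Read a raw landing file produced by an upstream Sqoop/StreamSets load.
    raw_df = spark.read.parquet("/data/landing/orders/")

    # Basic profiling/quality checks: row count and null count per column.
    row_count = raw_df.count()
    null_counts = raw_df.select(
        [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in raw_df.columns]
    ).collect()[0].asDict()
    print(f"rows={row_count}, null_counts={null_counts}")

    # Drop records missing the business key, then write to a curated Hive table.
    curated_df = raw_df.filter(F.col("order_id").isNotNull())
    (
        curated_df.write
        .mode("overwrite")
        .format("parquet")
        .saveAsTable("curated.orders")      # hypothetical Hive database/table
    )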
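
For the mockup-data and unit-testing responsibility, a minimal sketch of exercising a transformation against fabricated rows in a lower environment might look like the following; the function name (filter_valid_orders) and columns are hypothetical and shown only to illustrate the pattern.

    # Minimal unit-test sketch with mockup data; names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    def filter_valid_orders(df):
        """Transformation under test: drop rows without a business key."""
        return df.filter(F.col("order_id").isNotNull())

    def test_filter_valid_orders():
        spark = (
            SparkSession.builder
            .master("local[1]")
            .appName("unit_test")
            .getOrCreate()
        )
        # Mockup input rows covering the null-key edge case.
        mock_df = spark.createDataFrame(
            [(1, "A"), (None, "B"), (2, "C")],
            "order_id long, customer string",
        )
        result = filter_valid_orders(mock_df)
        # Capture and assert the result set.
        assert result.count() == 2
        assert result.filter(F.col("order_id").isNull()).count() == 0
        spark.stop()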
Required Skills
Strong SQL experience with Oracle and Hadoop (Hive, Impala, etc.).
Good knowledge of big data, Hadoop, Hive, the Impala database, data security, and dimensional model design.
Familiarity with project management methodologies such as Waterfall and Agile.
Expertise with Hadoop, Spark, and Hive implementations, with strong programming experience in Hive, Java, Scala, Python, and SQL.
Ability to establish priorities and follow through on projects, paying close attention to detail with minimal supervision.
Basic knowledge of UNIX/Linux shell scripting.
Required Experience
5+ years of experience building data sets.
3+ years of experience implementing complex ETL logic.
2+ years of experience with UNIX/Linux shell scripting.
Experience with both relational and NoSQL databases, data modeling, and database performance tuning.
Education Requirements
Bachelor’s degree in Computer Science, Computer Engineering, or a closely related field.