Roles and responsibilities:
- Lead the design, development, and testing of data ingestion pipelines; perform end-to-end validation of ETL processes for the various datasets being ingested into the big data platform.
- Perform data migration and conversion validation activities across different applications and platforms.
- Provide technical leadership on data profiling, discovery, and analysis, assess the suitability and coverage of data, and identify the data types, formats, and data quality issues that exist within a given data source.
- Contribute to the development of transformation logic, interfaces, and reports as needed to meet project requirements.
- Participate in discussions on technical architecture, data modeling, and ETL standards; collaborate with Product Managers, Architects, and Senior Developers to establish the physical application framework (e.g., libraries, modules, execution environments).
- Lead the design and development of a validation framework and integrated automated test suites to validate end-to-end data pipeline flow, data transformation rules, and data integrity.
- Develop tools to measure data quality and visualize anomaly patterns in source and processed data.
- Assist the Manager in project planning and validation strategy development.
- Provide support for user acceptance testing and production validation activities.
- Provide technical recommendations on data validation tools and recommend new technologies to improve the validation process.
- Evaluate existing methodologies and processes and recommend improvements.
- Work with stakeholders, Product Management, the Data and Design and Architecture teams, and executives to call out issues and to guide and contribute to resolution discussions.
Must have:
- 8+ years of software development and testing experience.
- 4+ years of working experience with tools such as Spark, HBase, Hive, Sqoop, Impala, Kafka, Flume, Oozie, MapReduce, etc.
- 4+ years of programming experience in Scala, Java, or Python.
- Experience in technically leading and mentoring teams.
- Experience developing and testing ETL, real-time data-processing, and analytics application systems.
- Strong knowledge of Spark SQL and Scala development in a big data Hadoop environment, and/or BI/DW development experience.
- Strong knowledge of shell scripting.
- Experience in web services and API development and testing.
- Experience with development and automation frameworks in a CI/CD environment.
- Experience with cloud environments (AWS or GCP) is a plus.
- Knowledge of Git/Jenkins and pipeline automation is a must.
- A solid understanding of common software development practices and tools.
- Strong analytical skills with a methodical approach to problem solving, applied to the big data domain.
- Good organizational skills and strong written and verbal communication skills.
Nice to have:
- Working experience on large migration projects is a big plus.
- Working experience on Google Cloud Platform is a big plus.
- Experience developing tools and utilities for monitoring and alerting.
- Familiarity with project management and bug tracking tools, e.g., JIRA or a similar tool.