Job Description

Mandatory Skills: ETL, AWS Glue, AWS, Python, PySpark

Good to have: Airflow and Databricks job workflows; Amazon Web Services such as VPC, S3, EC2, Redshift, RDS, EMR, Athena, IAM, Glue, DMS, Data Pipeline & API, Lambda, etc.

 

Responsibilities:

- Design and implement data modeling, data ingestion, and data processing for various datasets
- Design, develop, and maintain an ETL framework for new data sources
- Develop data ingestion using AWS Glue/EMR and data pipelines using PySpark, Python, and Databricks
- Build orchestration workflows using Airflow and Databricks job workflows
- Develop and execute ad hoc data ingestion to support business analytics
- Proactively interact with vendors on any questions and report status accordingly
- Explore and evaluate tools/services to support business requirements
- Ability to help create a data-driven culture and impactful data strategies
- Aptitude for learning new technologies and solving complex problems

Qualifications:

 

- Minimum of a bachelor's degree, preferably in Computer Science, Information Systems, or Information Technology
- Minimum 5 years of experience on cloud platforms such as AWS, Azure, or GCP
- Minimum 5 years of experience with Amazon Web Services such as VPC, S3, EC2, Redshift, RDS, EMR, Athena, IAM, Glue, DMS, Data Pipeline & API, Lambda, etc.
- Minimum 5 years of experience in ETL and data engineering using Python, AWS Glue, AWS EMR/PySpark, and Airflow for orchestration
- Minimum 2 years of experience in Databricks, including Unity Catalog, data engineering, job workflow orchestration, and dashboard generation based on business requirements
- Minimum 5 years of experience with SQL, Python, and source control such as Bitbucket, plus CI/CD for code deployment
- Experience with PostgreSQL, SQL Server, MySQL, and Oracle databases
- Experience with MPP systems such as AWS Redshift, AWS EMR, and Databricks SQL warehouses and compute clusters
- Experience in distributed programming with Python, Unix scripting, MPP, and RDBMS databases for data integration
- Experience building distributed, high-performance systems using Spark/PySpark and AWS Glue, and developing applications for loading/streaming data into Databricks SQL warehouses and Redshift
- Experience with Agile methodology
- Proven ability to write technical specifications for data extraction and to deliver good-quality code
- Experience with big data processing techniques using Sqoop, Spark, and Hive is a plus
- Experience with data visualization tools, including Power BI and Tableau
- Nice to have: experience building UIs using the Python Flask framework and/or React/Angular JS


 

Education

Any Graduate