Job Description
Mandatory Skills: ETL, AWS Glue, AWS, Python, PySpark
Good to have: Airflow and Databricks Job workflows; Amazon Web Services such as VPC, S3, EC2, Redshift, RDS, EMR, Athena, IAM, Glue, DMS, Data Pipeline & API, Lambda, etc.
Responsibilities:
Design and implement data modeling, data ingestion, and data processing for various datasets
Design, develop, and maintain an ETL framework for new data sources
Develop data ingestion using AWS Glue/EMR and data pipelines using PySpark, Python, and Databricks
Build orchestration workflows using Airflow and Databricks Job workflows
Develop and execute ad hoc data ingestion to support business analytics
Proactively interact with vendors on open questions and report status accordingly
Explore and evaluate tools and services to support business requirements
Ability to help create a data-driven culture and impactful data strategies
Aptitude for learning new technologies and solving complex problems
Qualifications:
Minimum of a bachelor’s degree, preferably in Computer Science, Information Systems, or Information Technology.
Minimum 5 years of experience on cloud platforms such as AWS, Azure, or GCP.
Minimum 5 years of experience with Amazon Web Services such as VPC, S3, EC2, Redshift, RDS, EMR, Athena, IAM, Glue, DMS, Data Pipeline & API, Lambda, etc.
Minimum of 5 years of experience in ETL and data engineering using Python, AWS Glue, AWS EMR/PySpark, and Airflow for orchestration.
Minimum 2 years of experience in Databricks, including Unity Catalog, data engineering, Job workflow orchestration, and dashboard generation based on business requirements.
Minimum 5 years of experience in SQL, Python, and source control such as Bitbucket, with CI/CD for code deployment.
Experience in PostgreSQL, SQL Server, MySQL & Oracle databases.
Experience with MPP platforms such as AWS Redshift, AWS EMR, and Databricks SQL warehouses and compute clusters.
Experience in distributed programming with Python, Unix scripting, MPP, and RDBMS databases for data integration.
Experience building distributed, high-performance systems using Spark/PySpark and AWS Glue, and developing applications for loading/streaming data into Databricks SQL warehouses and Redshift.
Experience in Agile methodology
Proven ability to write technical specifications for data extraction and to produce good-quality code.
Experience with big data processing techniques using Sqoop, Spark, and Hive is an additional plus.
Experience with data visualization tools such as Power BI and Tableau.
Nice to have: experience building UIs using the Python Flask framework and/or React or AngularJS.