Job Description:
Roles & Responsibilities:
Create and maintain optimal data pipeline architecture
Build data pipelines that transform raw, unstructured data into formats that data analysts can use for analysis
Assemble large, complex data sets that meet functional / non-functional business requirements
Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Build the infrastructure required for optimal extraction, transformation, and delivery of data from a wide variety of data sources using SQL and AWS ‘Big Data’ technologies
Work with stakeholders including the Executive, Product, and Program teams to assist with data-related technical issues and support their data infrastructure needs.
Work with data and analytics experts to strive for greater functionality in our data systems
Develop and maintain scalable data pipelines, and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using HQL and 'Big Data' technologies
Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for the key stakeholders and business processes that depend on it
Write unit/integration tests, contribute to the engineering wiki, and document work
Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement
Who You Are:
You’re passionate about data and building efficient data pipelines
You have excellent listening skills and are empathetic to others
You believe in simple, elegant solutions and place paramount importance on quality
You have a track record of building fast, reliable, and high-quality data pipelines
You combine a passion for data and a good understanding of it with a focus on having fun while delivering incredible business results
Must-have skills:
A Data Engineer with 5+ years of relevant experience who is excited to apply their current skills and grow their knowledge base.
A Data Engineer who has attained a degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field.
Has experience using the following software/tools:
Experience with big data tools: Hadoop, Spark, Kafka, Hive, etc.
Experience with relational SQL and NoSQL databases, including Postgres and Cassandra
Experience with data pipeline and workflow management tools
Experience with AWS cloud services: EC2, EMR, RDS, Redshift
Experience with object-oriented/functional scripting languages: Python, Java, Scala, etc.
Experience with Airflow/Oozie
Experience in AWS/Spark/Python development
Experience in Git, Jira, Jenkins, and shell scripting
Familiar with Agile methodology, test-driven development, source control management, and automated testing
Ability to build processes supporting data transformation, data structures, metadata, dependencies, and workload management
Experience supporting and working with cross-functional teams in a dynamic environment
Nice-to-have skills:
Experience with stream-processing systems: Storm, Spark Streaming, etc.
Experience with Snowflake