Job Description
At least 7 years of experience in Information Technology.
At least 3 years of hands-on experience with Hadoop distributed frameworks, handling large volumes of data using Spark or PySpark and the Hadoop ecosystem.
Proven experience in data engineering, data architecture, or a related field.
At least 2 years of experience with Spark or PySpark is required.
Strong understanding of data modeling, data warehousing, and ETL concepts.
Proficiency in SQL and experience with at least one major data analytics platform, such as Hadoop or Spark.
At least 2 years of experience with Scala or Python is required.
Preferred Qualifications:
Strong understanding of design patterns and the ability to discuss trade-offs between RDBMS and distributed storage.
Experience in designing and implementing a tiered data architecture that integrates analytics data from multiple sources efficiently and effectively.
Experience with data orchestration tools such as Airflow is a nice-to-have.
Excellent problem-solving and analytical skills, and the ability to work well under tight deadlines.
Excellent interpersonal skills and the ability to collaborate effectively with cross-functional teams.
Experience in developing data models and mapping rules to transform raw data into actionable insights and reports.
Experience in collaborating with analytics and business teams to understand their requirements, and with cross-functional teams to define and implement data governance policies and standards.
Experience in developing data validation and reconciliation processes to ensure data quality and accuracy.
Experience with development and maintenance of user documentation, including data models, mapping rules, and data dictionaries.
Any Graduate