Description

Job Responsibilities

  • Design, develop, and optimize large-scale data pipelines using PySpark and Python (a minimal sketch follows this list).
  • Apply object-oriented programming best practices to develop reusable, maintainable code.
  • Write advanced SQL queries for data extraction, transformation, and loading (ETL).
  • Work closely with data scientists, analysts, and other stakeholders to gather requirements and translate them into technical solutions.
  • Troubleshoot data-related issues and resolve them in a timely and accurate manner.
  • Leverage AWS cloud services (e.g., S3, EMR, Lambda, Glue) to build and manage cloud-native data workflows (preferred).
  • Participate in code reviews, data quality checks, and performance tuning of data jobs.
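
For illustration only, a minimal sketch of the kind of PySpark pipeline described above. The S3 paths, column names, and aggregation logic are hypothetical placeholders chosen for the example, not a specification for this role.

    # Hypothetical daily-revenue pipeline: paths, columns, and business rules are placeholders.
    from pyspark.sql import SparkSession, functions as F

    def build_daily_revenue_pipeline(input_path: str, output_path: str) -> None:
        """Read raw order events, aggregate revenue per day, and write the results."""
        spark = SparkSession.builder.appName("daily-revenue").getOrCreate()

        orders = spark.read.parquet(input_path)  # e.g. "s3://my-bucket/raw/orders/"

        daily_revenue = (
            orders
            .filter(F.col("status") == "COMPLETED")                  # keep only completed orders
            .withColumn("order_date", F.to_date("order_timestamp"))  # derive a date column
            .groupBy("order_date")
            .agg(F.sum("amount").alias("total_revenue"))
        )

        # Overwrite the curated output location, partitioned by date.
        daily_revenue.write.mode("overwrite").partitionBy("order_date").parquet(output_path)

    if __name__ == "__main__":
        build_daily_revenue_pipeline(
            "s3://my-bucket/raw/orders/",
            "s3://my-bucket/curated/daily_revenue/",
        )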


Required Skills & Qualifications

  • Strong hands-on experience with PySpark and Python, especially in designing and implementing scalable data transformations.
  • Solid understanding of Object-Oriented Programming (OOP) principles and design patterns (see the sketch after this list).
  • Proficient in SQL, with the ability to write complex queries and optimize performance.
  • Strong problem-solving skills and the ability to troubleshoot complex data issues independently.
  • Excellent communication and collaboration skills.
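
As a rough illustration of OOP-style reuse in PySpark, the sketch below defines composable transformation classes. The class names, columns, and conversion rule are assumptions chosen only for the example.

    # Illustrative composable-transformation pattern; names and rules are placeholders.
    from abc import ABC, abstractmethod
    from pyspark.sql import DataFrame, functions as F

    class Transformation(ABC):
        """A single, reusable step in a data pipeline."""

        @abstractmethod
        def apply(self, df: DataFrame) -> DataFrame:
            ...

    class DropNulls(Transformation):
        def __init__(self, columns: list[str]):
            self.columns = columns

        def apply(self, df: DataFrame) -> DataFrame:
            # Remove rows that are missing required fields.
            return df.dropna(subset=self.columns)

    class NormalizeCurrency(Transformation):
        def __init__(self, amount_col: str, rate: float):
            self.amount_col = amount_col
            self.rate = rate

        def apply(self, df: DataFrame) -> DataFrame:
            # Convert amounts using a fixed exchange rate (placeholder logic).
            return df.withColumn(self.amount_col, F.col(self.amount_col) * self.rate)

    class Pipeline:
        """Chains transformations so each step can be reused and unit-tested in isolation."""

        def __init__(self, steps: list[Transformation]):
            self.steps = steps

        def run(self, df: DataFrame) -> DataFrame:
            for step in self.steps:
                df = step.apply(df)
            return df

    # Usage: Pipeline([DropNulls(["order_id"]), NormalizeCurrency("amount", 0.92)]).run(orders_df)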


Preferred Qualifications (Nice to Have)

  • Exposure to data warehousing concepts, distributed computing, and performance tuning.
  • Familiarity with version control systems (e.g., Git), CI/CD pipelines, and Agile methodologies.

Education

Any Graduate