Job Description:
We are seeking a highly skilled Python & PySpark Developer to join our dynamic team. You will develop and maintain complex data processing systems using Python and PySpark, ensuring high performance and scalability.
Key Responsibilities:
Utilize Python and PySpark to design, develop, and maintain robust data processing pipelines. Ensure these solutions are scalable, efficient, and meet business requirements.
Demonstrate strong expertise in handling large datasets using Hadoop, Spark, and other big data technologies. Implement data ingestion strategies that ensure data lands in downstream systems accurately and on time.
Apply GitOps principles to manage infrastructure as code, ensuring consistent and reliable deployment of applications across various environments.
Design, implement, and optimize data pipelines that can handle diverse data sources and formats. Ensure these pipelines are efficient, reliable, and capable of scaling to accommodate growing data volumes.
Work closely with cross-functional teams including data scientists, product managers, and operations engineers to understand business needs and translate them into technical solutions.
Stay updated with the latest trends and technologies in data processing and contribute to the continuous improvement of existing systems.
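The pipeline work described above can be sketched in miniature. Since a full Spark environment is beyond the scope of a posting, this is a plain-Python sketch of one pipeline stage; the record fields (`user_id`, `amount`) are illustrative, not taken from any actual system.

```python
# Minimal sketch of a data pipeline stage: ingest raw records, normalize
# them, and quarantine malformed rows. Field names are hypothetical.

def clean_record(raw):
    """Normalize one raw record; raise on malformed input."""
    return {
        "user_id": raw["user_id"].strip(),
        "amount": float(raw["amount"]),  # ValueError on non-numeric input
    }

def run_stage(records):
    """Split an iterable of raw records into cleaned and rejected rows."""
    cleaned, rejected = [], []
    for raw in records:
        try:
            cleaned.append(clean_record(raw))
        except (KeyError, ValueError):
            rejected.append(raw)
    return cleaned, rejected
```

In a PySpark job, a pure transform like `clean_record` would typically be applied through DataFrame operations or a UDF, letting Spark parallelize it across partitions while the logic stays unit-testable on its own.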
Requirements:
Proficient in Python programming and experienced with PySpark. A deep understanding of object-oriented programming concepts and functional programming paradigms is essential.
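As a rough illustration of the two paradigms mentioned above, here is the same small task (summing values) written in an object-oriented style and in a functional style; the names are hypothetical.

```python
from dataclasses import dataclass
from functools import reduce

# Object-oriented style: state and behavior bundled in a class.
@dataclass
class Accumulator:
    total: float = 0.0

    def add(self, value: float) -> None:
        self.total += value

# Functional style: a pure reduction over immutable inputs, no mutable state.
def total_of(values):
    return reduce(lambda acc, v: acc + v, values, 0.0)
```

Both styles appear routinely in PySpark codebases: classes for pipeline configuration and orchestration, pure functions for the transforms themselves.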
Preferred Qualifications:
Bachelor’s degree in Computer Science, Data Science, or a related field.
Certifications in Python, PySpark, or Big Data technologies (e.g., Cloudera Certified Data Engineer, AWS Certified Data Analytics Specialty).
Minimum 3 years of professional experience developing data processing solutions using Python and PySpark.