Description

Responsibilities:

  • Lead the development and architecture of scalable data processing systems using PySpark.
  • Design and implement efficient and reliable data pipelines, data lakes, and ETL workflows.
  • Fine-tune Spark applications for optimal performance, including configuration tuning, memory management, and resource allocation.
  • Collaborate with data engineers, data scientists, and stakeholders to understand data processing requirements and deliver robust solutions.
  • Manage and optimize Spark clusters, ensuring high availability and performance, utilizing tools like Kubernetes, YARN, and Mesos.
  • Work with big data storage solutions such as HDFS, S3, Parquet, and ORC to manage data storage and retrieval efficiently.
  • Utilize Spark SQL, DataFrames, and Dataset APIs to perform complex data transformations and analytics.
  • Apply best practices in distributed computing principles and stay current with the latest technologies and trends in big data processing.

Requirements:

  • 10+ years of experience as a Lead Spark Developer, Data Engineer, or in a similar role, with extensive hands-on PySpark experience.
  • Strong proficiency in Python and Spark APIs.
  • Deep understanding of distributed computing principles, architectures, and best practices.
  • Expertise in designing and developing fault-tolerant and scalable data processing systems.
  • Strong skills in tuning Spark applications, including configuration, memory, and resource management.
  • Experience with cluster management tools such as Kubernetes, YARN, or Mesos.
  • Practical knowledge of big data storage solutions including HDFS, S3, and formats like Parquet and ORC.
  • Demonstrated ability to design and implement efficient data pipelines and data lakes.
  • Excellent problem-solving, communication, and collaboration skills.
Education

Any Graduate