Key Responsibilities:
- Big Data Architecture: Design, develop, and maintain scalable and distributed data architectures capable of processing large volumes of data.
- Data Storage and Processing: Implement and optimize data storage and processing solutions using technologies such as Hadoop, Spark, and PySpark.
- PySpark Development: Build efficient ETL pipelines in PySpark to extract, transform, and load large datasets.
- Performance Optimization: Optimize PySpark applications for better performance, scalability, and resource management.
Qualifications:
- Proven experience as a Big Data Engineer with a strong focus on PySpark.
- Deep understanding of Big Data processing frameworks and technologies.
- Strong proficiency in PySpark for developing and optimizing ETL processes and data transformations.
- Experience with distributed computing and parallel processing.
- Ability to collaborate in a fast-paced, innovative environment.
Skills:
PySpark, Big Data, Python