Job Description:
In-depth knowledge of Apache Spark and its APIs (Spark SQL, DataFrame API, Spark Structured Streaming); knowledge of Apache Flink, streaming and batch modes, caching, and performance optimization.
· Design and develop analytics workloads using Apache Spark and Scala for large-scale data processing
· Create and optimize data transformation pipelines using Spark or Apache Flink
· Proficiency in performance tuning and optimization of Spark jobs
· Experience migrating existing analytics workloads from cloud platforms to open-source Apache Spark
· Expertise in data modeling and optimization techniques for large-scale datasets
· Extensive experience running Spark in production
· Strong understanding of data lakes, big data, ETL processes, and data warehousing concepts
· Experience with Spark infrastructure running on Kubernetes
· Good understanding of lakehouse storage technologies such as Delta Lake and Apache Iceberg
· Working knowledge of AWS
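As a rough illustration of the stack described above (Spark Structured Streaming, the DataFrame API, and a Delta Lake sink), a candidate might be expected to write something like the following. This is a minimal sketch only; the app name, Kafka broker, topic, and file paths are all hypothetical placeholders, and a real job would need a configured Spark-on-Kubernetes cluster with the Delta Lake and Kafka connector packages on the classpath.

```scala
// Sketch: streaming ingest -> windowed aggregation -> Delta Lake sink.
// All names, paths, and endpoints below are assumptions, not real infrastructure.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object EventPipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-pipeline") // hypothetical app name
      .getOrCreate()

    // Read a stream of events from a hypothetical Kafka topic.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // assumed broker
      .option("subscribe", "events")                    // assumed topic
      .load()
      .selectExpr("CAST(value AS STRING) AS json", "timestamp")

    // Per-minute event counts using the DataFrame API.
    val counts = events
      .groupBy(window(col("timestamp"), "1 minute"))
      .count()

    // Write results to a Delta table; checkpointing gives exactly-once output.
    counts.writeStream
      .format("delta")
      .outputMode("complete")
      .option("checkpointLocation", "/tmp/checkpoints/events") // assumed path
      .start("/tmp/delta/event_counts")                        // assumed path
      .awaitTermination()
  }
}
```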
Other skills
· Technical Leadership: Lead and mentor a team of data engineers, analysts, and architects. Provide guidance on best practices and architecture.
· Collaboration: Work closely with cross-functional teams including data scientists, business analysts, and developers to ensure seamless integration.
· Communication: Excellent verbal and written communication skills with the ability to convey complex technical concepts to non-technical stakeholders.
Any Graduate