Responsibilities:
Migration Leadership: Spearhead the migration of Big Data workloads from Cascading and MapReduce to Spark 3, ensuring minimal disruption to business operations. This includes planning, executing, and overseeing the entire migration lifecycle (a minimal before-and-after sketch follows this list).
Hands-on Development: Engage in daily, hands-on development in IntelliJ IDEA, writing efficient and scalable Spark applications in Scala.
Code Review and Quality Assurance: Conduct thorough Scala code reviews to enforce coding standards, identify potential performance bottlenecks, and ensure code quality, maintainability, and adherence to best practices.
Code Implementation: Implement necessary code changes to adapt existing applications to the Spark 3 framework, optimizing performance and resource utilization.
Kubernetes Environment Design: Design and architect the target Kubernetes environment for Spark 3 deployments, considering scalability, high availability, fault tolerance, and resource management (a deployment-configuration sketch follows this list).
Collaboration: Work closely with data engineers, infrastructure teams, and other stakeholders to ensure seamless integration of Spark 3 into the overall Big Data ecosystem.
Performance Optimization: Identify and implement strategies to optimize Spark application performance, including tuning Spark configurations, optimizing data partitioning, and leveraging Spark's caching mechanisms (a tuning sketch follows this list).
Documentation: Create comprehensive technical documentation, including design specifications, migration plans, and operational procedures.
Knowledge Transfer: Provide knowledge transfer and mentorship to other team members, fostering a culture of learning and collaboration.
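As a minimal before-and-after sketch for the migration responsibility above: the classic word count, which Cascading expresses as a pipe assembly (Each/GroupBy/Every) and MapReduce as a Mapper/Reducer pair, collapses into a single declarative Spark 3 pipeline. The object name and the argument-based input and output paths are hypothetical; this illustrates the shape of the change, not a prescribed migration pattern.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example: word count rewritten as a Spark 3 job.
object WordCountMigrated {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("wordcount-migrated")
      .getOrCreate()
    import spark.implicits._

    // What a Cascading pipe assembly or a MapReduce Mapper/Reducer pair
    // expresses imperatively becomes one declarative pipeline.
    spark.read.textFile(args(0))          // args(0): input path (assumed)
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy($"value".as("word"))       // textFile yields a "value" column
      .count()
      .write.parquet(args(1))             // args(1): output path (assumed)

    spark.stop()
  }
}
```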
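For the Kubernetes environment-design responsibility, the sketch below shows one way a Spark 3 session can target a Kubernetes cluster with dynamic executor allocation. The API-server URL, namespace, container image, and service-account name are placeholder assumptions, and in practice these settings are more commonly passed as spark-submit --conf flags than hard-coded in the application.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("spark3-on-k8s-sketch")
  // Hypothetical Kubernetes API server; cluster-mode jobs usually receive
  // the master URL from spark-submit rather than from application code.
  .master("k8s://https://kube-apiserver.example.internal:6443")
  .config("spark.kubernetes.namespace", "spark-jobs")            // assumed namespace
  .config("spark.kubernetes.container.image",
          "registry.example.internal/spark:3.5.1")               // assumed image
  .config("spark.kubernetes.authenticate.driver.serviceAccountName", "spark")
  // Shuffle tracking enables dynamic allocation on Kubernetes without
  // an external shuffle service.
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .config("spark.executor.instances", "4")                       // initial executors
  .getOrCreate()
```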
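And for the performance-optimization responsibility, this sketch illustrates the levers that item names: Spark 3's adaptive query execution, explicit shuffle-partition sizing, repartitioning on a downstream key, and selective caching. The input path, the customer_id column, and the partition count of 400 are illustrative assumptions, not recommended values.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.storage.StorageLevel

val spark = SparkSession.builder()
  .appName("spark3-tuning-sketch")
  // Adaptive query execution re-plans shuffles and joins at runtime in Spark 3.
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
  // Size shuffle parallelism for the cluster instead of the 200-partition default.
  .config("spark.sql.shuffle.partitions", "400")
  .getOrCreate()

// Hypothetical input: repartition on the key used by later joins/aggregations
// so related rows land in the same partition.
val events = spark.read.parquet("/data/events")
  .repartition(400, col("customer_id"))

// Cache only datasets reused across multiple actions; MEMORY_AND_DISK spills
// instead of recomputing when executors run short of memory.
events.persist(StorageLevel.MEMORY_AND_DISK)
events.count() // the first action materializes the cache
```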
Qualifications:
Essential Skills and Experience:
Strong and demonstrable experience in migrating Big Data workloads from Cascading and MapReduce to Spark 3.
Expert-level proficiency in the Scala programming language, with a deep understanding of Spark's architecture and internals.
Extensive hands-on experience with IntelliJ IDEA for development and debugging.
Solid understanding of Big Data concepts, data processing paradigms, and distributed computing principles.
Proven experience in designing, implementing, and managing Kubernetes environments for Spark deployments.
Excellent code review skills, with a keen eye for detail and a commitment to code quality.
Strong understanding of software development best practices, including version control, testing, and CI/CD.
Desired Skills:
Experience with other Big Data technologies, such as Hadoop, Hive, and Kafka.
Familiarity with cloud platforms (e.g., AWS, Azure, GCP) and their Big Data offerings.
Experience with performance tuning and optimization of Spark applications.
Knowledge of data warehousing concepts and ETL processes.
Strong communication, collaboration, and problem-solving skills.
Education: Any graduate (a degree in any discipline).