Description

We are looking for a Senior Data Engineer with a deep understanding of Apache Spark (Scala & PySpark), Kafka Streams (Java), AWS services, Snowflake, Apache Iceberg, Tableau, and Data Lake architectures. As a senior member of our team, you will be responsible for leading the design, implementation, and optimization of large-scale data systems, real-time streaming solutions, and cloud-based data platforms. You will work with other engineers to deliver high-quality data solutions, mentor junior team members, and collaborate closely with cross-functional teams to solve complex business problems. 

 Key Responsibilities:  

  • Lead the design and development of scalable, high-performance data architectures on AWS, leveraging services such as S3, EMR, Glue, Redshift, Lambda, and Kinesis. Architect and manage Data Lakes for handling structured, semi-structured, and unstructured data. 
  • Design and build complex data pipelines using Apache Spark (Scala & PySpark), Kafka Streams (Java), and cloud-native technologies for batch and real-time data processing. Optimize these pipelines for high performance, scalability, and cost-effectiveness. 
  • Develop and optimize real-time data streaming applications using Kafka Streams in Java. Build reliable, low-latency streaming solutions to handle high-throughput data, ensuring smooth data flow from sources to sinks in real-time. 
  • Manage Snowflake for cloud data warehousing, ensuring seamless data integration, optimization of queries, and advanced analytics. Implement Apache Iceberg in Data Lakes for managing large-scale datasets with ACID compliance, schema evolution, and versioning. 
  • Design and maintain highly scalable Data Lakes on AWS using S3, Glue, and Apache Iceberg. Ensure data is easily accessible, stored in optimal formats, and well-integrated with downstream analytics systems. 
  • Work with business stakeholders to create actionable insights using Tableau. Build data models and dashboards that drive key business decisions, ensuring that data is easily accessible and interpretable. 
  • Continuously monitor and optimize Spark jobs, Kafka Streams processing, and other cloud-based data systems for performance, scalability, and cost. Implement best practices for stream processing, batch processing, and cloud resource management. 
  • Lead and mentor junior engineers, fostering a culture of collaboration, continuous learning, and technical excellence. Ensure high-quality code delivery, adherence to best practices, and optimal use of resources. 
  • Work closely with Data Scientists, Product Managers, and DevOps teams to understand business needs and deliver impactful data solutions. Participate in technical discussions, from system design to data governance. 
  • Ensure that data pipelines, architectures, and systems are thoroughly documented and follow coding and design best practices. Promote knowledge-sharing across the team to maintain high standards for quality and scalability. 

Required Skills & Qualifications:  
 

Experience:  

  • 5+ years of experience in Data Engineering or a related field, with a proven track record of designing, implementing, and maintaining large-scale distributed data systems. 
  • Proficiency in Apache Spark (Scala & PySpark) for distributed data processing and real-time analytics. 
  • Hands-on experience with Kafka Streams using Java for real-time data streaming applications. 
  • Strong experience in Data Lake architectures on AWS, using services like S3, Glue, EMR, and open table formats like Apache Iceberg. 
  • Proficiency in Snowflake for cloud-based data warehousing, data modeling, and query optimization. 
  • Expertise in SQL for querying relational databases, experience with NoSQL databases, and a strong background in database design and optimization. 

Technical Skills:  

  • Strong experience in building and maintaining ETL pipelines using Spark (Scala & PySpark). 
  • Proficiency in Java, particularly in the context of building and optimizing Kafka Streams applications for real-time data processing. 
  • Experience with AWS services (e.g., Lambda, Redshift, Athena, Glue, S3) and managing cloud infrastructure. 
  • Expertise with Apache Iceberg for handling large-scale, transactional data in Data Lakes, supporting versioning, schema evolution, and partitioning. 
  • Experience with Tableau for business intelligence, dashboard creation, and data visualization is a plus. 
  • Knowledge of CI/CD tools and practices, particularly in data engineering environments. 
  • Familiarity with containerization tools like Docker and Kubernetes for managing cloud-based services. 

Soft Skills:  

  • Excellent problem-solving skills, with a strong ability to debug and optimize large-scale distributed systems. 
  • Strong communication skills to engage with both technical and non-technical stakeholders. 
  • Proven leadership ability, including mentoring and guiding junior engineers. 
  • A collaborative mindset and the ability to work across teams to deliver integrated solutions. 

Preferred Qualifications:  

  • Experience with stream processing frameworks like Apache Flink or Apache Beam. 
  • Knowledge of machine learning workflows and integration of ML models in data pipelines. 
  • Familiarity with data governance, security, and compliance practices in cloud environments. 
  • Experience with DevOps practices and infrastructure automation tools such as Terraform or CloudFormation. 

Education:  

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent work experience). 


 
