Description

We are looking for a Senior Data Engineer with a deep understanding of Apache Spark (Scala & PySpark), Kafka Streams (Java), AWS services, Snowflake, Apache Iceberg, Tableau, and Data Lake architectures. As a senior member of our team, you will be responsible for leading the design, implementation, and optimization of large-scale data systems, real-time streaming solutions, and cloud-based data platforms. You will work with other engineers to deliver high-quality data solutions, mentor junior team members, and collaborate closely with cross-functional teams to solve complex business problems. 

 Key Responsibilities:  

  • Lead the design and development of scalable, high-performance data architectures on AWS, leveraging services such as S3, EMR, Glue, Redshift, Lambda, and Kinesis. Architect and manage Data Lakes for handling structured, semi-structured, and unstructured data. 
  • Design and build complex data pipelines using Apache Spark (Scala & PySpark), Kafka Streams (Java), and cloud-native technologies for batch and real-time data processing. Optimize these pipelines for high performance, scalability, and cost-effectiveness. 
  • Develop and optimize real-time data streaming applications using Kafka Streams in Java. Build reliable, low-latency streaming solutions to handle high-throughput data, ensuring smooth data flow from sources to sinks in real-time. 
  • Manage Snowflake for cloud data warehousing, ensuring seamless data integration, optimization of queries, and advanced analytics. Implement Apache Iceberg in Data Lakes for managing large-scale datasets with ACID compliance, schema evolution, and versioning. 
  • Design and maintain highly scalable Data Lakes on AWS using S3, Glue, and Apache Iceberg. Ensure data is easily accessible, stored in optimal formats, and well-integrated with downstream analytics systems. 
  • Work with business stakeholders to create actionable insights using Tableau. Build data models and dashboards that drive key business decisions, ensuring that data is easily accessible and interpretable. 
  • Continuously monitor and optimize Spark jobs, Kafka Streams processing, and other cloud-based data systems for performance, scalability, and cost. Implement best practices for stream processing, batch processing, and cloud resource management. 
  • Lead and mentor junior engineers, fostering a culture of collaboration, continuous learning, and technical excellence. Ensure high-quality code delivery, adherence to best practices, and optimal use of resources. 
  • Work closely with Data Scientists, Product Managers, and DevOps teams to understand business needs and deliver impactful data solutions. Participate in technical discussions, from system design to data governance. 
  • Ensure that data pipelines, architectures, and systems are thoroughly documented and follow coding and design best practices. Promote knowledge-sharing across the team to maintain high standards for quality and scalability. 

Required Skills & Qualifications:  
 

Experience:  

  • 5+ years of experience in Data Engineering or a related field, with a proven track record of designing, implementing, and maintaining large-scale distributed data systems. 
  • Proficiency in Apache Spark (Scala & PySpark) for distributed data processing and real-time analytics. 
  • Hands-on experience with Kafka Streams using Java for real-time data streaming applications. 
  • Strong experience in Data Lake architectures on AWS, using services like S3, Glue, EMR, and open table formats like Apache Iceberg. 
  • Proficiency in Snowflake for cloud-based data warehousing, data modeling, and query optimization. 
  • Expertise in SQL for querying relational databases, experience with NoSQL databases, and a strong background in database design and optimization. 

Technical Skills:  

  • Strong experience in building and maintaining ETL pipelines using Spark (Scala & PySpark). 
  • Proficiency in Java, particularly in the context of building and optimizing Kafka Streams applications for real-time data processing. 
  • Experience with AWS services (e.g., Lambda, Redshift, Athena, Glue, S3) and managing cloud infrastructure. 
  • Expertise with Apache Iceberg for handling large-scale, transactional data in Data Lakes, supporting versioning, schema evolution, and partitioning. 
  • Experience with Tableau for business intelligence, dashboard creation, and data visualization is a plus. 
  • Knowledge of CI/CD tools and practices, particularly in data engineering environments. 
  • Familiarity with containerization tools like Docker and Kubernetes for managing cloud-based services. 

Soft Skills:  

  • Excellent problem-solving skills, with a strong ability to debug and optimize large-scale distributed systems. 
  • Strong communication skills to engage with both technical and non-technical stakeholders. 
  • Proven leadership ability, including mentoring and guiding junior engineers. 
  • A collaborative mindset and the ability to work across teams to deliver integrated solutions. 

Preferred Qualifications:  

  • Experience with stream processing frameworks like Apache Flink or Apache Beam. 
  • Knowledge of machine learning workflows and integration of ML models in data pipelines. 
  • Familiarity with data governance, security, and compliance practices in cloud environments. 
  • Experience with DevOps practices and infrastructure automation tools such as Terraform or CloudFormation. 

Education:  

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field (or equivalent work experience). 


 
