Job Description:
Design and implement data pipelines and ETL/ELT workflows using Databricks notebooks (PySpark/Scala) and AWS services (S3, Glue, Lambda, Redshift, etc.).
Develop scalable and reusable components for data ingestion, transformation, and validation.
Optimize Spark jobs for performance, scalability, and cost-efficiency on Databricks.
Integrate with AWS ecosystem components such as S3, the Glue Data Catalog, IAM, and CloudWatch.
Build and maintain Delta Lake tables, streaming jobs, and time-based window aggregations (a representative PySpark sketch follows this list).
Work with data analysts, data scientists, and business stakeholders to support analytics and ML workloads.
Collaborate with DevOps and cloud engineering teams to automate deployments using CI/CD pipelines.
Ensure data governance, quality, lineage, and security compliance across all data pipelines.
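By way of illustration, below is a minimal PySpark sketch of the kind of streaming pipeline these responsibilities describe: it reads JSON events from S3, applies a watermarked time-based window aggregation, and appends the result to a Delta Lake table. The bucket, paths, schema, and column names are hypothetical placeholders, not part of this role's actual stack.

```python
# Minimal streaming sketch; all S3 paths, the schema, and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("device-events-5min-agg").getOrCreate()

# Hypothetical event schema for the raw JSON landing zone.
schema = StructType([
    StructField("device_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("amount", DoubleType()),
])

# Stream raw JSON events from S3.
events = (
    spark.readStream.format("json")
    .schema(schema)
    .load("s3://example-bucket/raw/events/")  # hypothetical landing path
)

# Watermarked 5-minute window aggregation per device.
agg = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "device_id")
    .agg(F.count("*").alias("event_count"), F.sum("amount").alias("total_amount"))
)

# Append finalized windows to a Delta Lake table, with a checkpoint for fault-tolerant restarts.
query = (
    agg.writeStream.format("delta")
    .outputMode("append")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/device_5min/")  # hypothetical
    .trigger(processingTime="1 minute")
    .start("s3://example-bucket/delta/device_5min_agg/")  # hypothetical target table path
)
```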
Requirements:
3+ years of experience in big data engineering, with at least 1–2 years using Databricks on AWS.
Proficiency in PySpark, Spark SQL, and Delta Lake (see the batch upsert sketch after this list).
Strong hands-on experience with AWS services such as S3, Glue, Lambda, Athena, EMR, and Redshift.
Familiarity with Databricks Workspaces, Jobs, Clusters, and DBFS.
Experience building streaming and batch pipelines using Spark Structured Streaming or Kafka.
Solid understanding of data modeling, data lakehouse architecture, and performance tuning in distributed systems.
Knowledge of CI/CD pipelines, Git, and Infrastructure as Code (IaC) tools (e.g., Terraform, CloudFormation).
Strong SQL skills and experience working with large-scale datasets.
Any Graduate
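Likewise, a minimal batch-ingestion sketch of the PySpark and Delta Lake skills listed above: lightweight validation followed by an idempotent MERGE upsert, with optional Databricks table maintenance. The table paths, the `order_id`/`orders` names, and the ZORDER column are illustrative assumptions, and the sketch assumes a Databricks (or Delta Lake-enabled) runtime with an existing curated table.

```python
# Minimal batch upsert sketch; paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F
from delta.tables import DeltaTable

spark = SparkSession.builder.appName("orders-incremental-upsert").getOrCreate()

RAW_PATH = "s3://example-bucket/raw/orders/"        # hypothetical landing zone
TARGET_PATH = "s3://example-bucket/delta/orders/"   # hypothetical curated Delta table

# Read the latest batch and apply lightweight validation:
# drop rows missing the business key and deduplicate on it.
incoming = (
    spark.read.format("json").load(RAW_PATH)
    .filter(F.col("order_id").isNotNull())
    .dropDuplicates(["order_id"])
    .withColumn("ingested_at", F.current_timestamp())
)

# Upsert into the curated Delta table; MERGE keeps the load idempotent
# if the same batch is reprocessed. Assumes the target table already exists
# (e.g., created by an initial full load).
target = DeltaTable.forPath(spark, TARGET_PATH)
(
    target.alias("t")
    .merge(incoming.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Optional Databricks-side maintenance for read performance
# (OPTIMIZE/ZORDER is Databricks SQL; the column choice is illustrative).
spark.sql(f"OPTIMIZE delta.`{TARGET_PATH}` ZORDER BY (order_id)")
```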