Key Responsibilities:
· Design, develop, and maintain ETL pipelines using AWS services, Python, and Spark (see the sketch after this list).
· Optimize data ingestion, transformation, and storage processes for high performance.
· Work with structured and unstructured data, ensuring data integrity, quality, and governance.
· Develop SQL queries to extract and manipulate data efficiently from relational databases.
· Implement data validation and testing frameworks using Pytest to ensure data accuracy and reliability (see the example after this list).
· Collaborate with data scientists, analysts, and software engineers to build scalable data solutions.
· Monitor and troubleshoot data pipelines to ensure smooth operation and minimal downtime.
· Stay up-to-date with industry trends, tools, and best practices for data engineering and cloud technologies.
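The following is a minimal sketch of the kind of ETL pipeline this role involves, using PySpark over S3. The bucket paths, column names, and the "orders" dataset are illustrative placeholders, not part of any actual system described above.

```python
# Minimal PySpark ETL sketch: read raw CSV from S3, clean it, write partitioned Parquet.
# All paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: load raw order events from an S3 landing zone.
raw = spark.read.option("header", "true").csv("s3://example-landing/orders/")

# Transform: enforce types, drop malformed rows, add a load date for partitioning.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .dropna(subset=["order_id", "order_ts", "amount"])
       .withColumn("load_date", F.current_date())
)

# Load: write partitioned Parquet to a curated zone (e.g. exposed to Redshift
# Spectrum or the Glue Data Catalog for downstream querying).
clean.write.mode("overwrite").partitionBy("load_date").parquet("s3://example-curated/orders/")
```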
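For the data validation responsibility, a minimal Pytest example is sketched below. The transform_orders function and its test data are hypothetical stand-ins for a real pipeline step.

```python
# Minimal Pytest sketch for validating a data transformation step.
# transform_orders and the sample data are illustrative only.
import pytest
import pandas as pd


def transform_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Example transform under test: drop rows with missing IDs, cast amount to float."""
    out = df.dropna(subset=["order_id"]).copy()
    out["amount"] = out["amount"].astype(float)
    return out


@pytest.fixture
def raw_orders() -> pd.DataFrame:
    # Small in-memory fixture standing in for raw pipeline input.
    return pd.DataFrame(
        {"order_id": ["A1", None, "A3"], "amount": ["10.5", "3.0", "7.25"]}
    )


def test_rows_with_missing_ids_are_dropped(raw_orders):
    result = transform_orders(raw_orders)
    assert result["order_id"].notna().all()
    assert len(result) == 2


def test_amount_is_numeric(raw_orders):
    result = transform_orders(raw_orders)
    assert result["amount"].dtype == float
```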
Required Skills & Qualifications:
· Experience in Data Engineering or a related field.
· Strong proficiency in AWS (S3, Glue, Lambda, EMR, Redshift, etc.) for cloud-based data processing.
· Hands-on experience with Python for data processing and automation.
· Expertise in Apache Spark for distributed data processing.
· Solid understanding of ETL pipeline design and data warehousing concepts.
· Proficiency in SQL for querying and managing relational databases.
· Experience writing unit and integration tests using Pytest.
· Familiarity with CI/CD pipelines and version control systems (e.g., Git).
· Strong problem-solving skills and ability to work in a fast-paced environment.
Preferred Qualifications:
· Experience with Terraform, Docker, or Kubernetes.
· Knowledge of big data tools such as Apache Kafka or Airflow.
· Exposure to data governance and security best practices.
Education: Any Graduate.