Description

Responsibilities:

  • Implement ETL pipelines and data transformation processes.
  • Ensure data quality and integrity in all data processing workflows.
  • Troubleshoot and resolve issues in PySpark applications and workflows.
  • Understand the sources, dependencies, and data flow of converted PySpark code.
  • Demonstrate and document code lineage.
  • Integrate PySpark code with frameworks such as Ingestion Framework and DataLens.
  • Ensure compliance with data security and privacy regulations and organizational standards.

Qualifications:

  • Experience with big data processing and distributed computing systems such as Spark.
  • Strong programming skills in Python and SQL.
  • Experience with big data technologies such as Hadoop, Hive, and Kafka.
  • Understanding of data warehousing concepts and relational (SQL) databases.
  • Knowledge of CI/CD pipelines and DevOps practices.
  • Strong problem-solving and analytical skills.
  • Excellent communication and leadership skills.

Education

Any Graduate