Job Description:

Key Responsibilities:

1. Data Profiling

  • Develop repeatable Python/SQL scripts to compute column statistics, null/unique distributions, outlier checks, referential integrity, and rule-based quality validations (see the sketch after this list).
  • Generate and publish standardized profiling reports/dashboards for stakeholder review.
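
As a rough illustration of the profiling scripts in scope, here is a minimal Python/pandas sketch that computes per-column null rates, unique counts, and IQR-based outlier counts. The input file is a placeholder; a production version would read from the warehouse and add referential-integrity and rule-based checks.

```python
import pandas as pd

def _iqr_outliers(s: pd.Series) -> int:
    """Count values falling outside 1.5 * IQR of the middle 50%."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return int(((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum())

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column profile: dtype, null percentage, unique count, outlier count."""
    rows = []
    for col in df.columns:
        s = df[col]
        rows.append({
            "column": col,
            "dtype": str(s.dtype),
            "null_pct": round(float(s.isna().mean()) * 100, 2),
            "unique_count": int(s.nunique(dropna=True)),
            "outliers": _iqr_outliers(s) if pd.api.types.is_numeric_dtype(s) else None,
        })
    return pd.DataFrame(rows)

if __name__ == "__main__":
    df = pd.read_csv("customers.csv")  # placeholder source; real runs read from the warehouse
    print(profile(df).to_string(index=False))
```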

2. Data Mapping (S2T)

  • Create and maintain source-to-target mappings for ingestion and transformation layers, capturing business rules, lineage, assumptions, and edge cases.
  • Maintain version control of mapping documents in GitLab.
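
For illustration only, one way to keep each S2T mapping row machine-readable alongside the prose document is a small record type like the following; the field names and example values are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class S2TMapping:
    source_table: str
    source_column: str
    target_table: str
    target_column: str
    transformation_rule: str   # business rule applied in transit
    assumptions: str = ""      # documented assumptions / edge cases

# Example row; all values are placeholders.
row = S2TMapping(
    source_table="crm.customers",
    source_column="cust_dob",
    target_table="mart.dim_customer",
    target_column="date_of_birth",
    transformation_rule="CAST to DATE; NULL if outside 1900-today",
    assumptions="Source stores dates as 'YYYYMMDD' strings",
)
print(asdict(row))
```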

3. ELT Development

  • Extract/Load (Mage): Build and operate ingestion pipelines with retries, alerting, schema enforcement, and parameterized environment configurations.
  • Transform (dbt): Develop staging, cleansing, and mart-level models with dbt tests (unique, not_null, accepted_values) and generate documentation.
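
dbt's generic tests are declared in schema YAML rather than written as code, but as a standalone sketch of what unique, not_null, and accepted_values assert, the following Python checks are roughly equivalent; the model and column names are placeholders.

```python
import pandas as pd

def check_unique(df: pd.DataFrame, col: str) -> None:
    dupes = int(df[col].duplicated().sum())
    assert dupes == 0, f"{col}: {dupes} duplicate values"

def check_not_null(df: pd.DataFrame, col: str) -> None:
    nulls = int(df[col].isna().sum())
    assert nulls == 0, f"{col}: {nulls} nulls"

def check_accepted_values(df: pd.DataFrame, col: str, accepted: list) -> None:
    bad = set(df[col].dropna()) - set(accepted)
    assert not bad, f"{col}: unexpected values {bad}"

# Placeholder staging-model output.
stg_orders = pd.DataFrame({"order_id": [1, 2, 3], "status": ["open", "shipped", "open"]})
check_unique(stg_orders, "order_id")
check_not_null(stg_orders, "order_id")
check_accepted_values(stg_orders, "status", ["open", "shipped", "cancelled"])
```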

4. Versioning & CI/CD

  • Utilize GitLab for branching, merge request reviews, linting, dbt tests, and automated CI/CD deployments.

5. Data Quality Management

  • Implement and monitor data quality tests at every stage of the pipeline.
  • Track SLAs and block merges on test failures to prevent regressions (see the sketch below).
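
As an illustration of merge blocking, a quality-gate script such as the following could run as a GitLab CI job: a nonzero exit status fails the pipeline, which in turn blocks the merge request when pipeline success is required. The file path and check logic here are assumptions.

```python
import sys
import pandas as pd

def run_checks(df: pd.DataFrame) -> list:
    """Return a list of failure messages; an empty list means all checks passed."""
    failures = []
    if df["order_id"].duplicated().any():
        failures.append("order_id is not unique")
    if df["order_total"].lt(0).any():
        failures.append("order_total contains negative values")
    return failures

if __name__ == "__main__":
    df = pd.read_csv("exports/orders.csv")  # placeholder extract
    failures = run_checks(df)
    for msg in failures:
        print(f"FAILED: {msg}", file=sys.stderr)
    # Nonzero exit fails the CI job, which blocks the merge request.
    sys.exit(1 if failures else 0)
```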

6. Documentation & Hand-offs

  • Maintain runbooks, S2T documents, dbt docs, and pipeline diagrams.
  • Ensure documentation is updated within 3 business days of any change.

7. Collaboration

  • Partner with analysts, architects, and QA teams to clarify transformation rules, review designs, and meet acceptance criteria.

Required Qualifications:

  • 3–6+ years of experience in Data Engineering, preferably with offshore/remote delivery exposure.
  • Strong expertise in SQL (advanced queries, window functions, performance tuning) and Python (data processing with Pandas or PySpark).
  • Hands-on experience with Mage orchestration, dbt modeling, and GitLab workflows.
  • Solid understanding of data modeling, lineage tracking, and data quality frameworks.
  • Excellent communication skills and disciplined documentation practices.

Preferred Skills:

  • Experience with Snowflake, BigQuery, Redshift, or Azure Synapse.
  • Exposure to PySpark, Databricks, or Airflow.
  • Awareness of BI tools such as Power BI, Tableau, or Looker for downstream analytics integration.

Education:

Bachelor's degree in any discipline.