Job Description
Responsibilities: 

Build data pipelines for the extraction, anonymization, and transformation of data from a wide variety of sources using SQL, NoSQL, and AWS "big data" technologies, in both streaming and batch modes. 

Work with stakeholders, including product owners, developers, and data scientists, to resolve data-related technical issues and support their data infrastructure needs. 

Ensure that data is secured and segregated in accordance with corporate compliance and data governance policies. 

Take ownership of existing ETL scripts, maintaining them and rewriting them in modern data transformation tools as needed. 

Advocate for automation in data transformation, cleaning, and reporting. 

What you bring: 

You are proficient in developing software from idea to production 

You can write automated test suites in your preferred language 

You have frontend development experience with frameworks such as React.js or Angular 

You have backend development experience building and integrating with REST APIs and databases, using frameworks such as Spring (Java), Node.js (JavaScript), or Flask (Python) 

You have experience with cloud-native technologies such as Cloud Composer, Dataflow, Dataproc, BigQuery, GKE, Cloud Run, Docker, Kubernetes, and Terraform 

You have used cloud platforms such as Google Cloud or AWS for application hosting 

You understand and apply CI/CD best practices with tools such as GitHub Actions and GCP Cloud Build 

You have experience with YAML and JSON for configuration 

You are up to date on the latest trends in AI technology 

Great-to-haves: 

3+ years of experience as a data or software architect 

3+ years of experience in SQL and Python 

2+ years of experience with ELT/ETL platforms (Airflow, dbt, Apache Beam, PySpark, Airbyte) 

2+ years of experience with BI reporting tools (Looker, Metabase, Quicksight, PowerBI, Tableau) 

Extensive knowledge of Google Cloud Platform, particularly Google Kubernetes Engine (GKE) 

Experience with GCP data services (Dataflow, GCS, Datastream, Data Fusion, BigQuery, Dataproc, Dataplex, Pub/Sub, Cloud SQL, Bigtable) 

Experience in the health industry is an asset 

Expertise in Python and Java 

Interest in PaLM, LLMs, and LLMOps 

Familiarity with LangFuse, Backstage plugins, or GitHub Actions 

Strong experience with GitHub beyond source control 

Familiarity with monitoring, alerts, and logging solutions 

Join us on this exciting journey to make generative AI accessible to all and create a positive impact with technology. 


 

Education

Any Graduate