Duties:
- Set up and maintain production-scale Databricks environments on public clouds such as Microsoft Azure and AWS (Amazon Web Services)
- Set up and maintain production-scale data storage such as ADLS (Azure Data Lake Storage) and AWS S3 for the multiple tenant teams using our Data Platform
- Set up and maintain production-scale microservices that support the daily operation of our data platform, including job scheduling, security, financial, and administrative services
- Provide triage and guidance to the team on various support issues raised by our tenants
- Develop tools and automation solutions for configuration management, service deployments, monitoring, and alerting to assist with daily RTB (Running the Business) operations
- Budget and monitor cloud spend; look for ways to avoid wasted cloud resources, and use 3rd party tools or develop your own to help the team with cost optimization
- Ensure security and privacy compliance, and implement Adobe Security & Compliance solutions to lock down data stored in our data lake
- Explore GenAI technologies and find opportunities to integrate them with our data platform to enhance the platform or improve its user experience
- Work with various 3rd party vendors on troubleshooting, proofs of concept, and other collaborative projects to enhance our product
Skills:
- Cloud Infrastructure Administration and Automation: AWS, Azure
- Proficient with the following storage technologies: ADLS Gen2, AWS S3, Hive or MySQL, MongoDB, vector databases
- Set up, troubleshoot, and maintain the following technologies: Databricks Workspace, including but not limited to Unity Catalog, Vector Search, SQL Warehouse, Serverless Compute, Spark workloads, Airflow and DAGs, Azure Kubernetes Service or Elastic Kubernetes Service, Collibra, Neo4j, Metric Insights
- Ability to set up monitoring and alerting with: Databricks System Tables, Prometheus, Splunk, ELK, Power BI
- Familiar with troubleshooting and maintaining: Linux servers, Kubernetes environments
- Working knowledge of: Jira, ServiceNow
Education: BS in Computer Science, Computer Engineering, or similar