What you’ll do:
- We seek Software Engineers with experience building and scaling services in on-premises and cloud environments.
- As a Principal Software Engineer in the Epsilon Attribution/Forecasting Product Development team, you will design, implement, and optimize data processing solutions using Scala, Spark, and Hadoop.
- Collaborate with cross-functional teams to deploy big data solutions on our on-premises and cloud infrastructure, and build, schedule, and maintain workflows.
- Perform data integration and transformation, troubleshoot issues, document processes, communicate technical concepts clearly, and continuously enhance our attribution and forecasting engines.
- Strong written and verbal communication skills in English are required to facilitate work across multiple countries and time zones, along with a good understanding of Agile methodologies such as Scrum.
Qualifications
- Strong experience (5-8 years) in the Scala programming language and extensive experience with Apache Spark for big data processing, covering the design, development, and maintenance of scalable on-premises and cloud environments, primarily on AWS and, as needed, on GCP.
- Proficiency in performance tuning of Spark jobs, including optimizing resource usage, shuffling, partitioning, and caching for maximum efficiency in big data environments.
- In-depth understanding of the Hadoop ecosystem, including HDFS, YARN, and MapReduce.
- Expertise in designing and implementing scalable, fault-tolerant data pipelines with end-to-end monitoring and alerting.
- Hands-on experience with Python, which we use to develop infrastructure modules.
- Solid grasp of database systems (RDBMS/warehouse) and the ability to write efficient SQL queries that handle terabytes of data.
- Familiarity with design patterns and best practices for efficient data modelling, partitioning strategies, and sharding in distributed systems, plus experience building, scheduling, and maintaining DAG workflows.
- End-to-end ownership of defining, developing, and documenting software objectives, business requirements, deliverables, and specifications in collaboration with stakeholders.
- Experience working with Git (or equivalent source control) and a solid understanding of unit and integration test frameworks.
- Ability to collaborate with stakeholders and teams to understand requirements and develop working solutions, and to work within tight deadlines, effectively prioritizing and executing tasks in a high-pressure environment.
- Must be able to mentor junior staff.
Advantageous to have experience with the following:
- Hands-on experience with Databricks for unified data analytics, including Databricks Notebooks, Delta Lake, and Catalogues.
- Proficiency in using the ELK (Elasticsearch, Logstash, Kibana) stack for real-time search, log analysis, and visualization.
- Strong background in analytics, including the ability to derive actionable insights from large datasets and support data-driven decision-making.
- Experience with data visualization tools like Tableau, Power BI, or Grafana.
- Familiarity with Docker for containerization and Kubernetes for orchestration.