Description

Must have:

· Strong hands-on experience with Apache Spark for data processing and analytics.
· Proficiency in writing advanced SQL queries, including complex joins, aggregations, and window functions (a brief illustrative sketch follows this list).
· Familiarity with Spark APIs and components such as Spark SQL, Spark Streaming, and PySpark.
· Understanding of distributed computing concepts and Spark architecture (e.g., RDDs, DAGs, partitions).
· Experience working with large datasets, data lakes, and data warehouses.
· Knowledge of file formats like Parquet, Avro, and ORC.
· Proven ability to optimize Spark jobs and SQL queries for efficiency and scalability.
· Strong problem-solving skills with attention to detail.
· Ability to collaborate effectively with cross-functional teams.
· Excellent communication skills for sharing insights and progress with stakeholders.
· Knowledge of Python, Scala, or Java for Spark application development.
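
For context on the SQL depth expected, here is a minimal, illustrative PySpark sketch (not part of any existing codebase) that combines a join, an aggregation, and a window function; the table and column names (orders, customers, customer_id, amount, region) are hypothetical placeholders.

    # Illustrative sketch only: a join, an aggregation, and a window function
    # expressed in Spark SQL. All table and column names are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("advanced-sql-sketch").getOrCreate()

    # Assumes "orders" and "customers" are already registered as temp views.
    spark.sql("""
        SELECT
            c.customer_id,
            c.region,
            SUM(o.amount) AS total_spend,
            RANK() OVER (PARTITION BY c.region
                         ORDER BY SUM(o.amount) DESC) AS spend_rank_in_region
        FROM orders o
        JOIN customers c
          ON o.customer_id = c.customer_id
        GROUP BY c.customer_id, c.region
    """).show()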

Good to have:

· Experience with big data ecosystems like Hadoop, Hive, or HBase.
· Familiarity with workflow orchestration tools such as Apache Airflow or Luigi.
· Knowledge of NoSQL databases like MongoDB, Cassandra, or Elasticsearch.
· Experience deploying Spark jobs on cloud platforms (e.g., AWS EMR, Azure Synapse, or Google Dataproc).
· Familiarity with cloud data platforms like Snowflake, BigQuery, or Redshift.
· Scripting experience for automating repetitive tasks.
· Familiarity with monitoring tools like Prometheus, Grafana, or Spark’s built-in UI.
· Hands-on experience with debugging tools for Spark and SQL processes.
· Relevant certifications in big data (e.g., Databricks Certified Associate, Cloudera Certified Developer).
· Understanding of industry-specific data needs, such as finance, healthcare, or retail analytics.


What You'll Do

· Build and maintain distributed data processing pipelines using Apache Spark.
· Write efficient SQL queries to extract, transform, and analyze large datasets.
· Perform data cleansing, validation, and enrichment to ensure high-quality datasets.
· Optimize Spark jobs for performance, including tuning Spark configurations and improving query efficiency (see the sketch after this list).
· Implement partitioning, caching, and indexing strategies for large-scale data processing.
· Develop and manage ETL workflows to process data from various sources into data lakes or warehouses.
· Collaborate with data engineers to integrate data from structured and unstructured sources.
· Monitor Spark jobs and cluster performance, addressing bottlenecks and failures.
· Troubleshoot SQL queries and Spark processes to resolve performance and accuracy issues.
· Work closely with data engineers, analysts, and stakeholders to understand data requirements.
· Present findings and insights derived from large datasets to business teams.
· Document workflows, best practices, and troubleshooting guides for Spark and SQL usage.
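
To illustrate the optimization work described above, the sketch below shows common levers in PySpark: configuration tuning, repartitioning by a join/filter key, caching a reused DataFrame, and writing date-partitioned output. The paths, column names, and partition count are hypothetical placeholders, not recommendations for any specific workload.

    # Illustrative sketch only: typical Spark optimization levers.
    # All paths, column names, and settings are hypothetical examples.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("optimization-sketch")
        .config("spark.sql.shuffle.partitions", "200")  # tune shuffle parallelism
        .config("spark.sql.adaptive.enabled", "true")   # enable adaptive query execution
        .getOrCreate()
    )

    # Hypothetical input: a Parquet dataset in a data lake.
    events = spark.read.parquet("s3://example-bucket/events/")

    # Repartition by a frequently joined/filtered key to reduce skew,
    # then cache because the DataFrame is reused by downstream queries.
    events = events.repartition(200, "customer_id").cache()

    daily_counts = events.groupBy("event_date", "customer_id").count()

    # Write date-partitioned output so downstream readers can prune by date.
    daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
        "s3://example-bucket/curated/daily_counts/"
    )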

Education

Any Graduate