Big Data Engineer

Tekfortune Inc
Raritan, NJ, USA

Description

Job Overview:
We’re seeking a highly skilled Data Engineer, Big Data Engineer to build scalable data pipelines, develop ML models, and integrate big data systems. You'll work with structured, semi-structured, and unstructured data, focusing on optimizing data systems, building ETL pipelines, and deploying AI models in cloud environments.

Key Responsibilities:
Data Ingestion: Build scalable ETL pipelines using Apache Spark, Talend, AWS Glue, Google Dataflow, Apache NiFi. Ingest data from APIs, file systems, and databases.

Data TransformationValidation: Use Pandas, Apache Beam, and Dask for data cleaning, transformation, and validation. Automate data quality checks with Pytest, Unittest.

Big Data Systems: Process large datasets with Hadoop, Kafka, Apache Flink, Apache Hive. Stream real-time data using Kafka, Google Cloud PubSub.

Task Queues: Manage asynchronous processing with Celery, RQ, RabbitMQ, or Kafka. Implement retry mechanisms and track task status.

Scalability: Optimize for performance with distributed processing (Spark, Flink), parallelization (joblib), and data partitioning.

CloudStorage: Work with AWS, Azure, GCP, Databricks. Store and manage data with S3, BigQuery, Redshift, Synapse Analytics, and HDFS.

Required Skills:
ETL Data Processing: Expertise in Apache Spark, AWS Glue, Google Dataflow, Talend.
Big Data Tools: Proficient with Hadoop, Kafka, Apache Flink, Hive, Presto.
Databases: Strong experience with MySQL, PostgreSQL, MongoDB, Cassandra.
Machine Learning: Hands-on with TensorFlow, PyTorch, Scikit-learn, XGBoost.

Key Skills

Apache Spark Aws Glue Google Dataflow Talend Hadoop Kafka Apache Flink Hive Presto Mysql

Education

Any Graduate

Apply Now

Back To Jobs

Posted On: 15+ Days Ago
Experience: 5+ years of experience
Availability: Remote
Openings: 1
Category: Big Data Engineer
Tenure: Flexible Position