Description

  • We are seeking a skilled and experienced PySpark Developer to join our dynamic team.
  • The ideal candidate will have a strong background in software development, with a focus on data transformation, processing, and performance optimization using PySpark. Expertise in SQL and its functions is required.
  • The developer will be responsible for designing, developing, and maintaining scalable data transformation solutions, as well as for developing efficient Spark jobs and managing job scheduling.
  • Designs, codes, tests, debugs, and documents software according to the client's systems quality standards, policies, and procedures.
  • Analyzes business needs and creates software solutions.
  • Responsible for preparing design documentation.
  • Prepares test data for unit, string and parallel testing.
  • Evaluates and recommends software and hardware solutions to meet user needs.
  • Resolves customer issues with software solutions and responds to suggestions for improvements and enhancements.
  • Works with business and development teams to clarify requirements to ensure testability.
  • Drafts, revises, and maintains test plans, test cases, and automated test scripts.
  • Executes test procedures according to software requirements specifications; logs defects and makes recommendations to address them.
  • Retests software corrections to ensure problems are resolved.
  • Documents evolution of testing procedures for future replication.
  • May conduct performance and scalability testing.

Essential Job Functions:

  • Plans, conducts, and leads assignments, generally involving moderate- to high-budget projects or more than one project.
  • Manages user expectations regarding appropriate milestones and deadlines.
  • Assists in training, work assignment and checking of less experienced developers.
  • Serves as technical consultant to leaders in the IT organization and functional user groups.
  • Subject matter expert in one or more technical programming specialties; employs expertise as a generalist or a specialist.
  • Performs estimation efforts on complex projects and tracks progress.
  • Works on the highest level of problems where analysis of situations or data requires an in-depth evaluation of various factors.
  • Documents, evaluates and researches test results; documents evolution of testing scripts for future replication.
  • Identifies, recommends and implements changes to enhance the effectiveness of quality assurance strategies.

Qualifications:

  • Strong expertise in Python programming:
    • Deep knowledge of Python language features, libraries, and best practices.
  • Extensive experience with Apache Spark and PySpark:
    • Writing optimized Spark jobs, transformations, actions, and working with RDDs and DataFrames.
  • Solid understanding of big data processing concepts:
    • Distributed computing, fault tolerance, partitioning, and data shuffling.
  • Experience with ETL pipeline development:
    • Designing, implementing, and maintaining scalable data pipelines.
  • Proficient in working with large datasets:
    • Handling data ingestion, cleansing, and transformation efficiently.
  • Good understanding of SQL:
    • Writing complex SQL queries and integrating Spark SQL where needed.
  • Hands-on experience with FastAPI:
    • Developing high-performance, scalable RESTful APIs and microservices using FastAPI.
  • Experience with cloud platforms and services:
    • Working knowledge of AWS (EMR, S3), Azure, or GCP cloud environments.
  • Familiarity with Hadoop ecosystem components:
    • HDFS, Hive, HBase, or similar tools.
  • Knowledge of data serialization formats:
    • JSON, Parquet, Avro, ORC.
  • Strong debugging and troubleshooting skills:
    • Ability to profile, optimize, and debug Spark and API applications.
  • Experience with version control systems:
    • Git or similar tools.
  • Understanding of software development lifecycle and Agile methodologies:
    • Participating in code reviews, CI/CD pipelines, and sprint planning.
  • Excellent problem-solving and communication skills:
    • Ability to collaborate with data engineers, analysts, and stakeholders.
  • Keywords: Python Developer, PySpark, Airflow, FastAPI, REST API, Cloud, SQL, Spark SQL

Education

Bachelor's degree