Description

What you'll do

  • Develop and maintain Python-based web scrapers to efficiently extract structured and unstructured data from various websites and sources.
  • Design scripts to automate repetitive scraping tasks and schedule jobs using tools like cron or Airflow.
  • Store and manage scraped data in databases (SQL/NoSQL) or cloud storage solutions.
  • Utilize tools and techniques to bypass CAPTCHAs, IP blocking, and other challenges encountered during web scraping.
  • Ensure scrapers are optimized for performance and can handle large-scale scraping without crashing or slowing down.
  • Adhere to web scraping best practices and ensure compliance with legal standards.
  • Process and clean data by transforming raw scraped output into structured formats (e.g., CSV, JSON) and ensuring data quality through validation and cleaning.
  • Collaborate with data analysts, product managers, and other developers to understand data requirements and deliver high-quality results.
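As an illustration only, the parse-clean-serialize workflow described above might look like the following minimal sketch. It uses only Python's standard library (`html.parser`, `json`) rather than Scrapy or BeautifulSoup, and the HTML structure, CSS class names, and helper names are hypothetical:

```python
import json
from html.parser import HTMLParser


class ProductParser(HTMLParser):
    """Collects text from hypothetical <span class="name"> and <span class="price"> tags."""

    def __init__(self):
        super().__init__()
        self.records = []      # raw rows, one dict per product
        self._field = None     # field the next text node belongs to

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("name", "price"):
            self._field = cls
            if cls == "name":          # a name span starts a new record
                self.records.append({})

    def handle_data(self, data):
        if self._field and self.records:
            self.records[-1][self._field] = data.strip()
            self._field = None


def clean(records):
    """Validation step: keep only rows with a name and a parseable numeric price."""
    cleaned = []
    for row in records:
        try:
            price = float(row["price"].lstrip("$"))
        except (KeyError, ValueError):
            continue  # drop rows that fail validation
        if row.get("name"):
            cleaned.append({"name": row["name"], "price": price})
    return cleaned


# Sample input standing in for a fetched page.
html = """
<ul>
  <li><span class="name">Widget</span> <span class="price">$9.99</span></li>
  <li><span class="name">Gadget</span> <span class="price">n/a</span></li>
</ul>
"""

parser = ProductParser()
parser.feed(html)
result = clean(parser.records)
print(json.dumps(result))  # structured JSON ready for storage
```

In production the parsing would typically be delegated to Scrapy or BeautifulSoup and the output written to a database or cloud storage, as the responsibilities above describe; the shape of the pipeline (extract raw rows, validate and normalize, serialize) stays the same.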

What experience do you need

  • A Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related technical field.
  • 3+ years of professional experience in software engineering with a strong focus on Python development, including proven experience writing Python code to extract data from websites with efficiency, accuracy, and adherence to best practices.
  • 2+ years of experience with web technologies, including a solid understanding of JavaScript, HTML, CSS, and XML for effective entity extraction, plus hands-on experience designing, querying, and managing data in both SQL and NoSQL databases.
  • 2+ years of experience with core Python web scraping libraries such as Scrapy and BeautifulSoup for HTML parsing, and with browser automation tools like Selenium or Playwright for dynamic, JavaScript-rendered content, along with experience handling data formats like JSON and CSV and applying data cleaning and validation techniques.
  • English proficiency of B2 or higher. 

What could set you apart

  • Understanding the importance of respecting website terms of service and avoiding harmful scraping practices.
  • Experience with cloud platforms like AWS, Google Cloud, or Azure.
  • An understanding of, or hands-on experience with, network traffic analysis.
  • Experience with the software development life cycle (SDLC) and testing practices.
  • Proficiency with version control systems, particularly Git, for collaborative development and code management.
  • Familiarity with CI/CD pipelines.

We offer comprehensive compensation and healthcare packages, an on-site doctor, 24/7 paramedic service, life insurance, gym facilities, collaborative workspaces, free transportation and parking, a subsidized cafeteria, a solidarity association, and organizational growth potential through our online learning platform with guided career tracks.
