What you'll do
- Develop and maintain Python-based web scrapers to efficiently extract structured and unstructured data from various websites and sources.
- Design scripts to automate repetitive scraping tasks and schedule jobs using tools like cron or Airflow.
- Store and manage scraped data in databases (SQL/NoSQL) or cloud storage solutions.
- Use tools and techniques to bypass CAPTCHAs and IP blocking and to overcome other challenges encountered during web scraping.
- Ensure scrapers are optimized for performance and can handle large-scale scraping without crashing or slowing down.
- Adhere to web scraping best practices and ensure compliance with legal standards.
- Process and clean scraped data: transform raw output into structured formats (e.g., CSV, JSON) and ensure data quality through validation and cleaning.
- Collaborate with data analysts, product managers, and other developers to understand data requirements and deliver high-quality results.
What experience do you need
- A Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related technical field.
- 3+ years of professional software engineering experience with a strong focus on Python, including proven experience writing Python code to extract data from websites efficiently, accurately, and in line with best practices.
- 2+ years of experience with web technologies, including a solid understanding of JavaScript, HTML, CSS, and XML for effective entity extraction, plus hands-on experience designing, querying, and managing data in both SQL and NoSQL databases.
- 2+ years of experience with core Python web scraping libraries such as Scrapy and BeautifulSoup for HTML parsing, and with browser automation tools like Selenium or Playwright for dynamic, JavaScript-rendered content; experience handling data formats such as JSON and CSV, along with data cleaning and validation techniques.
- English proficiency of B2 or higher.
What could set you apart
- An understanding of the importance of respecting website terms of service and avoiding harmful scraping practices.
- Experience with cloud platforms like AWS, Google Cloud, or Azure.
- An understanding of, or experience with, network traffic.
- Experience working with the software development life cycle (SDLC) and testing.
- Proficiency with version control systems, particularly Git, for collaborative development and code management.
- Familiarity with CI/CD pipelines.
We offer comprehensive compensation and healthcare packages, an on-site doctor, 24/7 paramedic services, life insurance, gym facilities, collaborative workspaces, free transportation and parking, a subsidized cafeteria, a solidarity association, and organizational growth potential through our online learning platform with guided career tracks.