Job Summary
We are seeking a highly skilled Python Automation Developer with 4 to 5 years of experience building automation solutions. You will develop conversion scripts, XML transformation logic, and validation pipelines aligned with industry schemas.
This role focuses on building efficient, scalable solutions using the Python ecosystem, with strong emphasis on XML technologies and document engineering.
Key Responsibilities
- Build Python-based tools and scripts for document conversion, including:
  - Implementing export, parsing, and post-processing workflows for unstructured and semi-structured content
  - Extracting and restructuring text from open document formats using python-docx, pandoc, or zip-based OOXML parsing
  - Applying OCR and layout-aware parsing to scanned and text-based documents using pdfminer.six, pdfplumber, or PyMuPDF with Tesseract OCR
  - Converting markup-based content through schema-driven transformations, using OpenSP wrappers, custom parsers, and rule-based DTD-to-XSD mapping
- Design and implement XSLT transformations and XPath queries using Python libraries such as lxml and saxonche
- Perform XML validation using XSD, DTD, and Schematron via lxml, xmlschema, or saxonpy
- Create rule-based data cleanup and normalization scripts using re, lxml, or BeautifulSoup
- Collaborate with technical writers and schema architects to ensure accurate semantic tagging
- Develop reusable automation pipelines and workflows for large-scale document conversion
Python Libraries & Tools:
- lxml for XML parsing, editing, and validation
- xmlschema for working with XML Schema (XSD)
- saxonpy or saxonche for advanced XSLT 2.0/3.0 transformations
- python-docx, zipfile, or docx2python for DOCX parsing
- pdfminer.six, PyMuPDF (fitz), pdfplumber for PDF parsing
- pytesseract, OCRmyPDF, or LayoutParser for OCR and layout detection
- re, BeautifulSoup, pandas for content normalization and restructuring
Schema & Transformation:
- Strong understanding of XML, XSLT, XPath, DTD/XSD
- Experience with Schematron for business rule validation
- Familiarity with aerospace schemas: S1000D, iSpec 2200
Automation & Pipeline Design:
- Build Python-based ETL-style conversion workflows
- Write maintainable, modular code for processing structured/unstructured data
- Use Git for version control; experience with CI/CD and Docker is optional
Work Experience
Nice to Have:
- Experience with FrameMaker structured export and scripting
- Exposure to DITA, DocBook, or modular content architectures
- Understanding of CMS platforms or IETP systems used in aerospace
Soft Skills:
- Detail-oriented with strong analytical and structural thinking
- Effective communicator, including with non-technical teams
- Comfortable working independently in loosely defined environments
Education:
- Bachelor’s degree in Computer Science, Engineering, or equivalent
- Certifications in XML, Technical Writing, or Python are a plus