About this project
We are seeking an experienced AI/ML developer to build a robust Python-based tool for extracting critical data from pharmaceutical regulatory PDFs, such as analytical reports, certificates of analysis, and stability reports. The extracted data will then be used to populate predefined Word templates.

A key requirement for this project is meticulous traceability. For every extracted value, the tool must generate a source reference (filename, page number, and section header) so that the value can be traced back to its exact location in the original PDF. This is crucial for regulatory defensibility.

The application should be capable of:
- Accepting multiple PDF inputs.
- Handling both native (text-based) and scanned PDF documents.
- Extracting various data types, including tables and key-value pairs.
- Mapping extracted data to specific fields within Word templates.
- Outputting fully populated Word documents suitable for regulatory submission.
- Generating comprehensive audit logs to support regulatory compliance.

We are looking for a freelancer with extensive experience in:
- Python development for AI/ML applications.
- PDF parsing using libraries such as pdfplumber, PyMuPDF, Camelot, or Tabula.
- Intelligent data extraction using large language models (LLMs) such as GPT-4 or Claude.
- Optical character recognition (OCR) technologies, including Tesseract, AWS Textract, or Azure.
- Structured data extraction from PDFs in prior projects.

Candidates should also have a proven portfolio demonstrating successful table extraction from complex PDFs.

Experience with pharmaceutical or regulatory documents and familiarity with document AI platforms (e.g., AWS Textract, Azure Form Recognizer, Google Document AI) are highly preferred but not strictly required.

The project will begin with a small paid pilot phase involving 3 to 5 PDFs to evaluate the freelancer's capabilities and fit.
Following a successful pilot, we intend to work with the chosen freelancer to build the full platform over time, with ongoing ad hoc services as needed. We emphasize that this is not black-box extraction: every field must be traceable back to its source. Serious inquiries only.
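The traceability requirement is essentially a data-model constraint: every extracted value travels with its source reference, and no template field is filled without one. The following is a minimal, standard-library-only Python sketch of that idea, not a specification. The class names (`SourceRef`, `ExtractedField`), the functions `map_to_template` and `audit_line`, the `method` label, the JSON-lines audit format, and all file, field, and template names in the usage are hypothetical illustrations. The actual PDF parsing, OCR, and Word output steps are out of scope here.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceRef:
    """Where a value was found: the three items the posting requires."""
    filename: str
    page: int              # 1-based page number in the source PDF
    section_header: str


@dataclass(frozen=True)
class ExtractedField:
    name: str              # hypothetical template field name
    value: str
    source: SourceRef
    method: str            # hypothetical label: "native-text", "ocr", "llm", ...


def map_to_template(template_fields, extracted):
    """Match extracted fields to template field names.

    A template field with no extracted, source-referenced value is reported
    as missing rather than filled with a guess.
    """
    by_name = {f.name: f for f in extracted}
    filled = {n: by_name[n] for n in template_fields if n in by_name}
    missing = [n for n in template_fields if n not in by_name]
    return filled, missing


def audit_line(field, template_name):
    """Serialize one populated field as a JSON line for an append-only audit log."""
    return json.dumps(
        {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "template": template_name,
            "field": field.name,
            "value": field.value,
            "method": field.method,
            "source": asdict(field.source),
        },
        sort_keys=True,
    )


# Illustrative usage; file, field, value, and template names are hypothetical.
assay = ExtractedField(
    name="assay_result",
    value="99.2%",
    source=SourceRef("coa_example.pdf", 2, "Test Results"),
    method="native-text",
)
filled, missing = map_to_template(["assay_result", "batch_number"], [assay])
for f in filled.values():
    print(audit_line(f, "coa_summary_template"))
print("missing:", missing)
```

Reporting unmatched template fields as `missing`, instead of leaving them silently blank, is one way to keep gaps visible to a reviewer; each audit line repeats the full source reference so the log can be checked without the extraction code.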
Category: IT & Programming
Subcategory: Artificial Intelligence
Project size: Large
Delivery term: Not specified
Skills needed