Automated PDF table data extraction to Excel tool

About this project

Open

I require a skilled freelancer to develop a custom tool or script for extracting tabular data from a large collection of PDF files. All PDF documents share an identical table layout, ensuring a consistent extraction process across all files. The primary objective is to accurately pull this structured data and export it into a clean, ready-to-use Microsoft Excel (.xlsx) file.

Key requirements for this project include:

* **Automated Processing**: The solution must efficiently handle a large volume of PDF files.
* **Consistent Layout**: All input PDFs have the same table structure, which should be leveraged for reliable extraction.
* **Data Integrity**: The extracted data must precisely match the content in the PDFs, including headers and values, without any omissions or alterations.
* **Output Format**: The final output must be a well-formatted Excel (.xlsx) file that requires no manual adjustments or cleaning.
* **Tool/Script Development**: The freelancer is free to choose the most effective programming language and libraries (e.g., Python with libraries like Camelot, Tabula, or PDFMiner; Java with Apache PDFBox, etc.) to achieve the desired accuracy and consistency.
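
To make the requirements above concrete, here is a minimal Python sketch of one possible approach, not a finished solution. It assumes text-based (not scanned) PDFs, uses Camelot for extraction and openpyxl for writing the workbook, and assumes the first row of every extracted table is its header. The function names (`extract_tables`, `merge_tables`, `write_xlsx`, `convert_folder`) are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def extract_tables(pdf_path):
    """Yield each table in one PDF as a list of rows (lists of strings).

    Assumption: Camelot with its default settings detects the tables;
    the flavor may need tuning for the real documents.
    """
    import camelot  # third-party, imported lazily so the rest runs without it

    for table in camelot.read_pdf(str(pdf_path), pages="all"):
        yield table.df.values.tolist()


def merge_tables(tables):
    """Stack (source_name, rows) tables under a single shared header row.

    Raises ValueError when a header differs from the first one seen, so a
    layout deviation is reported instead of silently corrupting the output.
    """
    merged, header = [], None
    for source, rows in tables:
        if not rows:
            continue  # nothing extracted from this table
        if header is None:
            header = list(rows[0])
            merged.append(header)
        elif list(rows[0]) != header:
            raise ValueError(f"{source}: header does not match expected layout")
        merged.extend(list(row) for row in rows[1:])
    return merged


def write_xlsx(rows, out_path):
    """Write rows to the first sheet of a new .xlsx workbook."""
    from openpyxl import Workbook  # third-party, imported lazily

    workbook = Workbook()
    sheet = workbook.active
    for row in rows:
        sheet.append(row)
    workbook.save(str(out_path))


def convert_folder(pdf_dir, out_path):
    """Extract every table from every PDF in pdf_dir into one workbook."""
    tables = []
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        for rows in extract_tables(pdf_path):
            tables.append((pdf_path.name, rows))
    write_xlsx(merge_tables(tables), out_path)
```

A call such as `convert_folder("pdfs", "output.xlsx")` would process a folder named `pdfs` (a placeholder path). The header check in `merge_tables` is one way to exploit the identical layout: any file that deviates fails loudly rather than producing a misaligned sheet. Cell values arrive as strings in this sketch, so numeric or date typing would need an extra conversion step.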

The ideal candidate will have proven experience in data extraction, PDF processing, and delivering robust, automated solutions.

Category: IT & Programming
Subcategory: Desktop apps
What is the scope of the project? Create a new app

Delivery term: Not specified

Skills needed

Python, Java, Scripts & Utilities, Data Mining, Adobe Acrobat, Web Scraping, Extract Transform Load..., Software Testing, QA Automation

Other projects posted by Eduard.