About this project
Open
I require a skilled freelancer to develop a custom tool or script that extracts tabular data from a large collection of PDF files. All of the PDF documents share an identical table layout, so the same extraction logic should apply to every file. The primary objective is to pull this structured data accurately and export it to a clean, ready-to-use Microsoft Excel (.xlsx) file.
Key requirements for this project include:
* **Automated Processing**: The solution must efficiently handle a large volume of PDF files.
* **Consistent Layout**: All input PDFs have the same table structure, which should be leveraged for reliable extraction.
* **Data Integrity**: The extracted data must precisely match the content in the PDFs, including headers and values, without any omissions or alterations.
* **Output Format**: The final output must be a well-formatted Excel (.xlsx) file that requires no manual adjustments or cleaning.
* **Tool/Script Development**: The freelancer is free to choose the most effective programming language and libraries (e.g., Python with libraries like Camelot, Tabula, or PDFMiner; Java with Apache PDFBox, etc.) to achieve the desired accuracy and consistency.
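To make the requirements above concrete, here is a minimal Python sketch of the kind of pipeline I have in mind. It is only an illustration under stated assumptions, not a prescribed design: the `extract` callable stands in for whichever PDF library is chosen, the names `merge_tables`, `process_folder`, `write_xlsx`, and `LayoutMismatch` are hypothetical, and openpyxl is assumed for writing the .xlsx file. The sketch relies on the shared layout by treating the first row of each extracted table as the header and rejecting any file whose header or row width differs.

```python
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

Row = List[str]
Table = List[Row]


class LayoutMismatch(ValueError):
    """Raised when a file does not follow the shared table layout."""


def merge_tables(tables: Iterable[Tuple[str, Table]]) -> Table:
    """Combine per-file tables into one table with a single header row.

    Each item is (source_name, rows), where rows[0] is the header.
    Cell values are copied unchanged; nothing is trimmed or converted.
    """
    merged: Table = []
    expected_header = None
    for source, rows in tables:
        if not rows:
            continue  # nothing extracted; a real tool should log this file
        header, body = rows[0], rows[1:]
        if expected_header is None:
            expected_header = header
            merged.append(list(header))
        elif header != expected_header:
            raise LayoutMismatch(
                f"{source}: header {header!r} differs from {expected_header!r}"
            )
        for row in body:
            if len(row) != len(expected_header):
                raise LayoutMismatch(
                    f"{source}: row has {len(row)} cells, "
                    f"expected {len(expected_header)}"
                )
            merged.append(list(row))
    return merged


def process_folder(folder: Path, extract: Callable[[Path], Table]) -> Table:
    """Run the chosen extractor (hypothetical callable) over every PDF in a folder."""
    pdfs = sorted(folder.glob("*.pdf"))
    return merge_tables((pdf.name, extract(pdf)) for pdf in pdfs)


def write_xlsx(table: Table, out_path: Path) -> None:
    """Write the merged table to an .xlsx file (assumes openpyxl is installed)."""
    from openpyxl import Workbook  # third-party, imported only when needed

    workbook = Workbook()
    sheet = workbook.active
    for row in table:
        sheet.append(row)
    workbook.save(str(out_path))
```

A typical run would call `process_folder(Path("pdfs"), extract)` with an extractor built on the chosen library, then pass the result to `write_xlsx`. Failing loudly on a mismatched header or row width, rather than silently skipping the file, is one way to support the data integrity requirement.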
The ideal candidate will have proven experience in data extraction, PDF processing, and delivering robust, automated solutions.
Category IT & Programming
Subcategory Desktop apps
What is the scope of the project? Create a new app
Delivery term: Not specified
Skills needed