Completed

Software to "convert" (interpret/segment) complex PDF files into structured text files (JSON)

Published on March 30, 2016 in IT & Programming

About this project

*Update: attached the file "technicaldescription_and_pdfsamples", which contains PDF samples and a more detailed specification.

The goal is to create a software/script that interprets and converts a complex PDF file into a text file (JSON). The script/software must be written in English, and I must have complete access to the source files. Any programming language can be used (Python, Java, C/C++, ...).

There are several different types of PDF files, and the software should be able to break the important information in each PDF file into a structured JSON file. So, the problem is to segment the information in the PDF files.

The PDF files are tests (math, science, informatics, etc.) in Portuguese. The idea is to separate the important information so we can build a database with it. The scope of this project is only to extract the information from the PDF file and present it as JSON output.
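
To illustrate the kind of segmentation meant here, a minimal Python sketch follows. It assumes the text has already been extracted from the PDF (that step is not shown), that each question starts with a line like "Questão 1", and that answer options start with "a)", "b)", and so on. These markers, the function name `segment_test`, and the JSON field names are illustrative assumptions only; the real files vary a lot and the attached specification takes precedence.

```python
import json
import re

# Assumption: each question starts with a line such as "Questão 1" (or "Questao 1").
QUESTION_RE = re.compile(r"^Quest[ãa]o[ \t]+(\d+)[ \t]*$", re.IGNORECASE | re.MULTILINE)
# Assumption: answer options look like "a) text" at the start of a line.
OPTION_RE = re.compile(r"^([a-e])\)[ \t]*(.+)$", re.MULTILINE)


def segment_test(text):
    """Split already-extracted test text into a list of question dicts."""
    questions = []
    matches = list(QUESTION_RE.finditer(text))
    for i, match in enumerate(matches):
        # A question runs until the next question marker (or the end of the text).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        body = text[match.end():end].strip()
        options = {label: value.strip() for label, value in OPTION_RE.findall(body)}
        # Everything before the first option is treated as the question statement.
        first_option = OPTION_RE.search(body)
        statement = body[:first_option.start()].strip() if first_option else body
        questions.append({
            "number": int(match.group(1)),
            "statement": statement,
            "options": options,
        })
    return questions


# Made-up sample text, not taken from the real PDF files.
SAMPLE = """Questão 1
Quanto é 2 + 2?
a) 3
b) 4

Questão 2
Qual é a capital de Portugal?
a) Lisboa
b) Porto
"""

if __name__ == "__main__":
    print(json.dumps({"questions": segment_test(SAMPLE)}, ensure_ascii=False, indent=2))
```

Each question becomes one JSON object with a number, a statement, and a map of options, which is a shape that could be loaded into a database later.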


The complexity is that the files are (very) different, and many have figures and images that must be stored as well. I have an initial idea for how to approach the problem, but I'm open to discussing the problems and possible solutions as well. A detailed specification can be sent if you are interested.
I'm able to Skype and explain everything.
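
As one possible way to keep figures and images next to the JSON output, here is a small Python sketch. It assumes the image bytes have already been pulled out of the PDF by some other tool (not shown), writes each image to a folder, and stores only the file name in the question's JSON. The function names `store_image` and `attach_images`, the "images" field, and the ".png" extension are hypothetical choices for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def store_image(image_bytes, out_dir, extension="png"):
    """Write image bytes to out_dir and return the file name used as a reference."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after a hash of its content, so identical images share one file.
    digest = hashlib.sha256(image_bytes).hexdigest()[:16]
    name = f"{digest}.{extension}"
    (out_dir / name).write_bytes(image_bytes)
    return name


def attach_images(question, images, out_dir):
    """Return a copy of the question dict with an "images" list of file names."""
    result = dict(question)
    result["images"] = [store_image(data, out_dir) for data in images]
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        # Placeholder bytes stand in for real image data.
        question = attach_images({"number": 1}, [b"fake-image-1", b"fake-image-2"], folder)
        print(json.dumps(question, indent=2))
```

Storing file names rather than embedding the image data keeps the JSON small; embedding the images (for example as base64) would be an alternative if a single self-contained file is preferred.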

The only skills needed are good programming and problem solving.

I'm very keen to help and discuss alternatives.

The job is not easy, and price/time can be negotiated. I believe that good performance must be well rewarded ($$).

Thanks,

Category: IT & Programming
Subcategory: Desktop apps
Is this a project or a position? I don’t know yet
I currently have: I have specifications
Required availability: As needed
Experience in this type of projects: Yes (I have managed this kind of project before)
Required platforms: Windows

Delivery term: Not specified

Skills needed