Completed

Software to "convert" (interpret/segment) complex PDF files into structured text files (JSON)

Published on March 30, 2016 in Programming and Technology

About this project

Open

*Update: attached the file "technicaldescription_and_pdfsamples" with PDF samples and a more detailed specification.

The goal is to create software (or a script) that interprets/converts a complex PDF file into a text file (JSON). The script/software must be written in English, and I must have complete access to the source files. ANY programming language can be used (Python, Java, C/C++, ...).

There are several different types of PDF files, and the software should be able to break the important information in each PDF file down into a structured JSON file. So the problem is to segment the information in the PDF files.

The PDF files are TESTS (math, science, informatics, etc.) in Portuguese. The idea is to separate out the important information so we can build a database with it. The scope of this project is only to extract the information from the PDF file and present it as JSON output.
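As a rough illustration of what "segmenting" a test into JSON could mean, here is a minimal Python sketch. It assumes the text has already been extracted from the PDF (extraction, figures and images are not handled), and the layout rules, the field names `number`, `statement` and `options`, and the function `segment_test` are hypothetical choices for illustration, not part of the specification.

```python
import json
import re

# Hypothetical layout rules: a question starts with "1." or "1)" at the
# beginning of a line, and an alternative starts with "a)" to "e)".
QUESTION_RE = re.compile(r"^\s*(\d+)[.)]\s+(.*)")
OPTION_RE = re.compile(r"^\s*([a-eA-E])\)\s+(.*)")


def segment_test(text: str) -> list:
    """Split text already extracted from a PDF into question records."""
    questions = []
    current = None      # question record currently being filled
    last_option = None  # key of the last alternative seen, if any
    for line in text.splitlines():
        q = QUESTION_RE.match(line)
        o = OPTION_RE.match(line)
        if q:
            current = {"number": int(q.group(1)), "statement": q.group(2), "options": {}}
            questions.append(current)
            last_option = None
        elif o and current is not None:
            last_option = o.group(1).lower()
            current["options"][last_option] = o.group(2)
        elif current is not None and line.strip():
            # Continuation line: extend the last alternative, else the statement.
            if last_option is not None:
                current["options"][last_option] += " " + line.strip()
            else:
                current["statement"] += " " + line.strip()
        # Lines before the first question (headers, instructions) are skipped.
    return questions


sample = """1. Quanto é 2 + 2?
a) 3
b) 4
2. Qual é a capital
de Portugal?
a) Lisboa
b) Porto"""

print(json.dumps(segment_test(sample), ensure_ascii=False, indent=2))
```

Real tests will need many more rules (multi-column layouts, numbered items inside statements, figures attached to questions), which is the segmentation problem this project is about; the sketch only shows the shape of the output.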


The difficulty is that the files are (very) different, and many have figures and images that must be stored as well. I have an initial idea for an approach to the problem, but I'm open to discussing the problems and possible solutions. A detailed specification can be sent if you are interested.
I'm able to talk on Skype and explain everything.

The only skills needed are good programming and problem-solving skills.

I'm very keen to help and discuss alternatives.

The job is not easy, and price/time can be negotiated. I believe that good performance must be well rewarded ($$).

Thanks,

Category: Programming and Technology
Subcategory: Desktop applications
Is it a project or a position? I don't know yet
What I currently have: The specifications
Required availability: As needed
Experience with this type of project: Yes (I have managed this type of project before)
Required platforms: Windows

Delivery deadline: Not defined

Required skills