Completed

Software to "convert" (interpret/segment) complex PDF files into structured text files (JSON)

Published on March 30, 2016 in IT & Programming

About this project

Open

*Update: attached the file "technicaldescription_and_pdfsamples" with PDF samples and a more detailed specification.

The goal is to create a software/script to interpret/convert a complex PDF file into a text file (JSON). The script/software must be written in English and I must have complete access to the source files. ANY programming language can be used (Python, Java, C/C++, ...).

There are several different types of PDF files, and the software should be able to break the important information in each PDF file into a structured JSON file. So the problem is to segment the information in the PDF files.

The PDF files are tests (math, science, informatics, etc.) in Portuguese. The idea is to separate the important information so we can build a database with it. The scope of this project is only to extract the information from the PDF files and present it as JSON output.
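As a rough illustration of what the segmentation step could look like, here is a minimal Python sketch. It assumes the plain text has already been extracted from the PDF (extraction is not shown) and that questions are numbered like "1." or "1)" with answer options "a)" to "e)". The function name `segment_test` and the JSON keys (`questions`, `number`, `statement`, `options`) are hypothetical, not the project's actual specification. Lines before the first question are ignored, and figures/images are not handled.

```python
import json
import re

# Hypothetical layout: a question starts with a number followed by "." or ")".
QUESTION_RE = re.compile(r"^\s*(\d+)[.)]\s+(.*)$")
# Hypothetical layout: an answer option starts with a letter a-e followed by ")".
OPTION_RE = re.compile(r"^\s*([a-e])\)\s+(.*)$")


def segment_test(text):
    """Group lines of already-extracted test text into questions and options."""
    questions = []
    current = None
    for line in text.splitlines():
        q = QUESTION_RE.match(line)
        o = OPTION_RE.match(line)
        if q:
            # A new numbered question begins.
            current = {"number": int(q.group(1)),
                       "statement": q.group(2).strip(),
                       "options": {}}
            questions.append(current)
        elif o and current is not None:
            # An answer option belonging to the current question.
            current["options"][o.group(1)] = o.group(2).strip()
        elif current is not None and line.strip():
            # Continuation line: attach it to the last option, or to the statement.
            if current["options"]:
                last = list(current["options"])[-1]
                current["options"][last] += " " + line.strip()
            else:
                current["statement"] += " " + line.strip()
    return {"questions": questions}


if __name__ == "__main__":
    sample = ("1. Quanto é 2 + 3?\na) 4\nb) 5\n"
              "2. Qual é a capital de Portugal?\na) Porto\nb) Lisboa")
    print(json.dumps(segment_test(sample), ensure_ascii=False, indent=2))
```

Real test PDFs with multi-column layouts, tables and embedded images would need layout-aware extraction before a line-based pass like this one.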


The complexity is that the files are (very) different, and many have figures and images that must be stored as well. I have an initial idea for an approach, but I'm open to discussing the problems and possible solutions. A detailed specification can be sent on request.
I'm available on Skype to explain everything.

The only skills needed are solid programming and problem-solving skills.

I'm very keen to help and discuss alternatives.

The job is not easy, and price/time can be negotiated. I believe good performance should be well rewarded ($$).

Thanks,

Category: IT & Programming
Subcategory: Desktop applications
Is this a project or a job position? I don't know yet
I currently have: I have specifications
Required availability: As needed
Experience with this type of project: Yes (I have managed this type of project before)
Required platforms: Windows

Deadline: Not set

Required skills