Smartphone Document Capture & Detection Region

Published: 8 months ago
Method: Efficiently detect and segment document regions in preview frames captured by smartphone cameras.

We believe an efficient capture process should be able to: (1) detect and segment the relevant document object during the preview phase; (2) assess the quality of the capture conditions and help the user improve them; (3) optionally trigger the capture at the right moment.

This project focuses on the first step of this process: efficiently detecting and segmenting document regions, as illustrated by the following video showing the ideal output for the preview phase of an acquisition session.

The input will be a set of video clips, each containing a document from a predefined set, and the output should be an XML file containing, for each frame of the video, the coordinates of the quadrilateral in which the document can be found.
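To make the expected output more concrete, here is a minimal sketch of writing one quadrilateral per frame to XML with Python's standard library. The element and attribute names (`segmentation_results`, `frame`, `point`, `index`, `name`) and the corner order are assumptions made for illustration only; the real schema should be taken from the challenge material.

```python
import xml.etree.ElementTree as ET

# Hypothetical corner labels: top-left, top-right, bottom-right, bottom-left.
CORNER_NAMES = ("tl", "tr", "br", "bl")


def build_result_xml(video_name, quads):
    """Serialize per-frame quadrilaterals to an XML string.

    quads maps a frame index to four (x, y) corners, listed in the
    order given by CORNER_NAMES. The tag names are illustrative only.
    """
    root = ET.Element("segmentation_results", {"video": video_name})
    for index in sorted(quads):
        frame = ET.SubElement(root, "frame", {"index": str(index)})
        for name, (x, y) in zip(CORNER_NAMES, quads[index]):
            ET.SubElement(
                frame, "point", {"name": name, "x": f"{x:.2f}", "y": f"{y:.2f}"}
            )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    demo = {1: [(10, 20), (300, 25), (310, 420), (5, 410)]}
    print(build_result_xml("example_video", demo))
```

Every frame gets an entry, since a missing or rejected frame would be scored as 0.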

This project is based on Challenge 1 of the ICDAR SmartDoc competition:

01/22/2016 at 0:15 AST - Additional information submitted
The evaluation process is greatly simplified and avoids the need to optimize rejection rates.

It works as follows.

0/ For each frame of each video of the test set, our ground truth contains the exact coordinates of each corner of the page object to segment. Every frame contains such a page object, so your method should return 4 corner coordinates for each frame.

1/ Using the object size and its coordinates in each frame, we start by transforming the coordinates of your result S and of the ground truth G to undo the perspective transform and obtain the corrected quadrangles S' and G'. Applying this transform makes all the evaluation measures comparable within the document's reference frame.
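One way to picture step 1 is the sketch below. It assumes the correction is a plane homography that sends the four ground-truth corners G to an upright rectangle of the known object size, and then applies the same homography to S. The function names, the corner order (top-left, top-right, bottom-right, bottom-left), and the pure-Python solver are illustrative choices, not the organizers' evaluation code.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate quadrilateral")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def homography(src, dst):
    """Return 8 coefficients (h33 fixed to 1) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b)


def apply_homography(h, point):
    """Map one (x, y) point through the homography h."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w)


def rectify(ground_truth, result, width, height):
    """Return (G', S'): both quadrangles expressed in the document frame."""
    target = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    h = homography(ground_truth, target)
    return target, [apply_homography(h, p) for p in result]


if __name__ == "__main__":
    g = [(100, 50), (400, 80), (380, 500), (90, 460)]
    s = [(105, 55), (395, 85), (375, 495), (95, 455)]
    g_prime, s_prime = rectify(g, s, 210.0, 297.0)  # hypothetical A4 size in mm
    print(s_prime)
```

By construction G' is the full document rectangle, so the comparison with S' happens in document units rather than in image pixels.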

2/ For each frame f, we will compute the Jaccard index (JI) as follows to measure the similarity between the set G' of expected pixels in the ground truth and the set S' of the segmentation result returned by your method:
JI(f) = area(intersection(G', S')) / area(union(G', S'))
S' will be considered empty if you reject a frame, giving the worst possible score (0) to it. Hence, your method should not reject frames.
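The Jaccard index of two quadrangles can be sketched as follows. This example assumes both G' and S' are convex, so that the intersection can be computed with Sutherland-Hodgman polygon clipping and the union area follows from area(G') + area(S') - area(intersection). The helper names are illustrative, and a rejected frame is represented here by `None`.

```python
def signed_area(points):
    """Shoelace formula; the sign encodes the vertex orientation."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def side(a, b, p):
    """Cross product telling on which side of the directed line a->b point p lies."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject, clip):
    """Sutherland-Hodgman: intersect `subject` with the convex polygon `clip`."""
    if signed_area(clip) < 0:
        clip = clip[::-1]  # enforce positive orientation so "inside" means side >= 0
    output = list(subject)
    for i in range(len(clip)):
        a, b = clip[i], clip[(i + 1) % len(clip)]
        previous, output = output, []
        for j, q in enumerate(previous):
            p = previous[j - 1]
            dp, dq = side(a, b, p), side(a, b, q)
            if (dp >= 0) != (dq >= 0):
                # The edge p->q crosses the clip line: keep the crossing point.
                t = dp / (dp - dq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                output.append(q)
        if not output:
            break
    return output


def jaccard_index(g, s):
    """JI = area(intersection) / area(union); a rejected frame (s is None) scores 0."""
    if not s:
        return 0.0
    inter = abs(signed_area(clip_convex(g, s)))
    union = abs(signed_area(g)) + abs(signed_area(s)) - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    g = [(0, 0), (2, 0), (2, 2), (0, 2)]
    s = [(1, 0), (3, 0), (3, 2), (1, 2)]
    print(jaccard_index(g, s))
```

In the usage example the two squares overlap on a 1 by 2 strip, so the intersection area is 2, the union area is 6, and the index is 1/3.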

3/ The overall score for each method will be the average of the frame scores over all frames in the dataset.
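The aggregation can be sketched in a few lines. This assumes the per-frame scores of all videos are pooled into one list before averaging (rather than averaged per video first), which is how the description above reads; `None` is used here as a hypothetical marker for a rejected frame.

```python
def overall_score(frame_scores):
    """Average per-frame Jaccard indices; rejected frames (None) count as 0."""
    if not frame_scores:
        raise ValueError("no frames to score")
    return sum(0.0 if s is None else s for s in frame_scores) / len(frame_scores)


if __name__ == "__main__":
    # Four frames, one of them rejected: (1.0 + 0.5 + 0 + 0.5) / 4 = 0.5
    print(overall_score([1.0, 0.5, None, 0.5]))
```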



Results for the attached videos will help me recruit!

No mobile application is required. The only requirement is working code, but it should be fast enough to be embeddable in a mobile device.

No programming language is preferred. Feel free to use whichever programming languages fit. The dataset (150 videos, 25,000 frames) is available in,

I updated the project and attached three videos to it. If you can find the document's bounding box in the videos, please send me the result. This will help me in recruiting.

