Completed

REST API for measuring how much color is in a PDF / MS Word file

Published on April 04, 2020 in IT & Programming

About this project

Open

We need a REST API that reads PDF and MS Word (97-365) documents and reports how much color each page has. For example:

Example 1: if page 1 of the document has 2,000 letters and 100 of them are in a color other than black, then that page is 5% color (100 colored letters / 2,000 letters on the page).
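The calculation in Example 1 could be sketched as follows. This is only an illustration: the function names are hypothetical, each letter is assumed to arrive as an (R, G, B) tuple already extracted from the document, and only pure black (0, 0, 0) is treated as black, which a real implementation may want to relax.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def is_black(color: RGB) -> bool:
    # Assumption: only pure black counts as "black"; near-black tones count as color.
    return color == (0, 0, 0)


def text_color_percent(letter_colors: Iterable[RGB]) -> float:
    """Percentage of letters on a page whose color is not black."""
    colors = list(letter_colors)
    if not colors:
        return 0.0  # a page without letters contributes no text color
    colored = sum(1 for c in colors if not is_black(c))
    return 100.0 * colored / len(colors)


# Example 1: 2,000 letters, 100 of them red.
page_1 = [(0, 0, 0)] * 1900 + [(255, 0, 0)] * 100
print(text_color_percent(page_1))  # 5.0
```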

Example 2: if the document has an 8.5 x 11 inch page that combines letters and an image, with 2,000 letters of which 100 are in a color other than black, plus an image that covers 50% of the page and is not grayscale, then that page is 55% color (100 colored letters / 2,000 letters = 5%, plus 50% of the page covered by a non-grayscale image).
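The arithmetic in Example 2 can be written as a small function. This is a sketch under assumptions: the name `page_color_percent` and its parameters are hypothetical, and capping the result at 100% is an assumption the posting does not state.

```python
def page_color_percent(total_letters: int,
                       colored_letters: int,
                       color_image_fraction: float) -> float:
    """Text color share plus the page share covered by non-grayscale images.

    color_image_fraction is the fraction of the page area (0.0 to 1.0)
    covered by images that are not grayscale.
    """
    text_percent = 0.0
    if total_letters > 0:
        text_percent = 100.0 * colored_letters / total_letters
    image_percent = 100.0 * color_image_fraction
    # Assumption: the sum is capped at 100 so a page never exceeds 100% color.
    return min(100.0, text_percent + image_percent)


# Example 2: 100 of 2,000 letters in color, image covering half the page.
print(page_color_percent(2000, 100, 0.5))  # 55.0
```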

Note on the image scan: for this specific case it is not important to know how much color the image actually contains. It is only important to know that the image is not grayscale and what percentage of the whole page it occupies.
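One possible way to decide whether an image is grayscale is to check that every pixel has equal red, green, and blue values. The sketch below assumes the image has already been decoded into (R, G, B) tuples by some imaging library (not shown); the function name and the `tolerance` parameter are hypothetical.

```python
from typing import Iterable, Tuple

RGB = Tuple[int, int, int]


def is_grayscale(pixels: Iterable[RGB], tolerance: int = 0) -> bool:
    """True if every pixel has (nearly) equal R, G and B channels.

    tolerance allows small channel differences, for example from
    compression noise in scanned images (an assumption, tune as needed).
    """
    return all(max(p) - min(p) <= tolerance for p in pixels)


gray_image = [(0, 0, 0), (128, 128, 128), (255, 255, 255)]
color_image = [(10, 10, 10), (200, 30, 30)]
print(is_grayscale(gray_image))   # True
print(is_grayscale(color_image))  # False
```

Because only a yes/no answer is needed, the check can stop at the first colored pixel, which `all` does automatically.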

You may use an ML implementation (not required), any programming language you prefer, and any operating system. A serverless REST API is preferred (not required).

The REST API input is a PDF or MS Word (97-365) file located in a path on the same server or in an S3 bucket, OneDrive, Google Drive, Dropbox, or any other storage (recommendations accepted).
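As a rough illustration of handling the input location, the sketch below tells a local server path apart from a URI such as an S3 location and checks the file extension. The function name, the returned keys, and the choice of accepted extensions (.pdf, .doc, .docx) are assumptions; actually downloading the file from a storage provider is not shown.

```python
import os
from urllib.parse import urlparse

# Assumption: "MS Word (97-365)" is taken to mean .doc and .docx files.
SUPPORTED_EXTENSIONS = {".pdf", ".doc", ".docx"}


def classify_input(location: str) -> dict:
    """Describe where the input file lives and what type it is."""
    parsed = urlparse(location)
    # A one-letter "scheme" is most likely a Windows drive letter, not a URI.
    if len(parsed.scheme) <= 1:
        source, path = "local", location
    else:
        source, path = parsed.scheme, parsed.path
    extension = os.path.splitext(path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or 'none'}")
    return {"source": source, "path": path, "extension": extension}


print(classify_input("/srv/files/Test.PDF")["source"])              # local
print(classify_input("s3://my-bucket/reports/Test.PDF")["source"])  # s3
```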

The REST API output is a JSON document with the percentage of color on each page.

JSON result format:

{
  "Document": {
    "Name": "Test.PDF",
    "Pages": 6
  },
  "JobStatus": "SUCCEEDED",
  "Pages": [
    {"Page1": 20, "Page2": 0, "Page3": 50}
  ]
}
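A response in this shape could be assembled as sketched below. The function name `build_result` is hypothetical. Rounding each percentage to a whole number and setting the document page count to the number of per-page results are assumptions; the sample above lists three pages for a six-page document, so the intended relationship between the two is not clear from the posting.

```python
import json
from typing import List


def build_result(file_name: str,
                 page_percents: List[float],
                 status: str = "SUCCEEDED") -> dict:
    """Build the response body from one color percentage per page."""
    # Keys follow the sample: "Page1", "Page2", ... inside a single object.
    pages = {f"Page{i}": round(p) for i, p in enumerate(page_percents, start=1)}
    return {
        "Document": {"Name": file_name, "Pages": len(page_percents)},
        "JobStatus": status,
        "Pages": [pages],
    }


print(json.dumps(build_result("Test.PDF", [20, 0, 50]), indent=2))
```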

Important note:
The solution must accept pages of any size. I don't think page size matters much for the letter count on each page. It does matter for images: to know the percentage of color on the page, you have to compare the image size against the page size to determine how much of the page the image covers.
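The image-versus-page comparison can be sketched as an area ratio that works for any page size. The function name and the (x0, y0, x1, y1) box convention are assumptions. Units only need to be consistent (points are used in the example, where an 8.5 x 11 inch page is 612 x 792 points), and overlapping images are not handled here.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), same unit as the page


def image_coverage_percent(page_width: float,
                           page_height: float,
                           image_box: Box) -> float:
    """Percentage of the page area covered by one image."""
    if page_width <= 0 or page_height <= 0:
        raise ValueError("Page dimensions must be positive")
    x0, y0, x1, y1 = image_box
    # Clip the image to the page so parts hanging off the page are not counted.
    visible_w = max(0.0, min(x1, page_width) - max(x0, 0.0))
    visible_h = max(0.0, min(y1, page_height) - max(y0, 0.0))
    return 100.0 * (visible_w * visible_h) / (page_width * page_height)


# Letter page (612 x 792 pt) with an image filling the top half.
print(image_coverage_percent(612, 792, (0, 396, 612, 792)))  # 50.0
```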


Source code is required. Documentation on how to install and use the REST API is also required.


Category: IT & Programming
Subcategory: Other
Project size: Small
Is this a project or a position? Project
I currently have: I have specifications
Required availability: As needed
API Integrations: Cloud Storage (Dropbox, Google Drive, etc.), Other (Other APIs)

Delivery term: Not specified
