
Create a wrapper API on Tool Labs to interact with Google Vision API
Closed, Resolved · Public · 3 Estimated Story Points

Description

Create a wrapper API on Tool Labs to interact with the Google Vision API to perform OCR on images from Wikisource. It will need to use Bryan's Google API proxy since Google's APIs require IP address whitelisting (and Tool Labs sends requests from random nodes with different IP addresses).

Acceptance criteria:

  • The wrapper API should take the following input: language code, image URL.
  • It should accept requests only from wikisource.org.
  • Initially it should be limited to handling images that are no larger than 10 MB.
  • It should encode the image (base64 byte stream?), add the API credentials, set the language in the languageHints array, and submit the request as JSON to the API proxy.
  • The image should not be stored permanently anywhere on Tool Labs. (You can probably just read it into memory via curl.)
  • It should return the OCRed text or an error message if there was an error.

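A minimal Python sketch of the request and response handling described in the acceptance criteria (the real tool is PHP, built on google-cloud-vision-php). It assumes the Vision API v1 `images:annotate` JSON format with a `TEXT_DETECTION` feature and reads the OCRed text from the first entry of `textAnnotations`; the function names are hypothetical. The API key is not part of the body: with key-based auth it travels as a `key` query parameter on the request URL.

```python
import base64


def build_annotate_request(image_bytes: bytes, lang: str) -> dict:
    """Build an images:annotate request body asking for OCR of one image."""
    return {
        "requests": [
            {
                # The raw image bytes are sent as a base64 string.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # TEXT_DETECTION requests OCR output.
                "features": [{"type": "TEXT_DETECTION"}],
                # The caller's language code becomes a hint to the OCR engine.
                "imageContext": {"languageHints": [lang]},
            }
        ]
    }


def extract_text(response: dict) -> str:
    """Return the OCRed text, or raise ValueError carrying Google's error message."""
    first = (response.get("responses") or [{}])[0]
    if "error" in first:
        raise ValueError(first["error"].get("message", "Unknown Vision API error"))
    annotations = first.get("textAnnotations", [])
    # The first annotation holds the whole detected text; later ones are single words.
    return annotations[0]["description"] if annotations else ""
```

Passing the error message straight through matches the criterion of returning either the text or an error.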
See Google Vision API documentation at https://cloud.google.com/vision/docs/ and Discovery URL at https://vision.googleapis.com/$discovery/rest?version=v1. See T140037#2507218 and T140037#2528369 for the format of the JSON data to submit. Ask @kaldari for the API key.

Event Timeline

kaldari raised the priority of this task from Medium to High (Aug 30 2016, 5:26 PM).
kaldari subscribed.
DannyH set the point value for this task to 3.
kaldari updated the task description.
kaldari renamed this task from Create a wrapper on Tool Labs to interact with Google OCR API to Create a wrapper on Tool Labs to interact with Google Vision API (Aug 30 2016, 9:10 PM).
kaldari updated the task description.
kaldari renamed this task from Create a wrapper on Tool Labs to interact with Google Vision API to Create a wrapper API on Tool Labs to interact with Google Vision API (Aug 30 2016, 11:14 PM).

This task is partially blocked by T144290. It can probably be started without it though.

This looks like it could be a useful basis for the OCR API tool: https://github.com/thangman22/google-cloud-vision-php

It doesn't permit a different endpoint URL, so I've raised a pull request to add that possibility.
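A Python sketch of why a configurable endpoint matters here: the same `images:annotate` request can be aimed at Google directly or at a proxy base URL such as http://googlevision-api-proxy.wmflabs.org/. That the proxy mirrors Google's `/v1/images:annotate` path is an assumption, and `make_annotate_request` is a hypothetical name, not the library's API. The function only builds the request; `urllib.request.urlopen(req)` would send it.

```python
import json
import urllib.parse
import urllib.request

GOOGLE_ENDPOINT = "https://vision.googleapis.com"


def make_annotate_request(body: dict, api_key: str,
                          base_url: str = GOOGLE_ENDPOINT) -> urllib.request.Request:
    """Prepare (but do not send) a POST to {base_url}/v1/images:annotate."""
    # The API key goes in the query string, not the JSON body.
    query = urllib.parse.urlencode({"key": api_key})
    url = f"{base_url.rstrip('/')}/v1/images:annotate?{query}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```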

I've created a fork of that library, and incorporated all pending PRs (except one with this in it).

I've got a draft of a basic tool that uses the fork: it takes a Wikisource image URL, runs the image through the API, and returns the text. It doesn't yet restrict requests to those originating from Wikisource, nor limit the file size. The code is in D1966.

Nope, scratch that; someone will have to tell me how to use Diffusion properly! For now (it being beer o'clock on a Friday) the code lives at https://github.com/wikisource/ws-google-ocr :-)

(I've sorted out the repository confusion. The GitHub one has been replaced by the in-house one, and the former deleted.)

Still left to do for this ticket:

  • Only accept requests from wikisource.org (however it does limit image URLs to having the prefix https://upload.wikimedia.org/; is this sufficient?).
  • Limit image sizes to 10 MB (I'm adding this feature to the google-cloud-vision-php library).

Actually, it looks like the Vision API is limited to 4 MB per image. Can we just rely on Google complaining when a sent image is too big, and pass that error back to the user (as is already done for all errors)?

> Only accept requests from wikisource.org (however it does limit image URLs to having the prefix https://upload.wikimedia.org/; is this sufficient?).

Yes, I think that should be sufficient. We just need to make sure we are not providing a way for random people to use the Google API for free (at our expense).
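Restricting image URLs to upload.wikimedia.org is what stops the tool from being used as a free OCR service for arbitrary images. A Python sketch of that check (the function name is hypothetical): a raw `startswith("https://upload.wikimedia.org/")` is safe as long as the trailing slash is kept, but comparing the parsed scheme and host makes the intent explicit and also rejects lookalikes such as `upload.wikimedia.org.example.com`.

```python
from urllib.parse import urlparse


def is_allowed_image_url(url: str) -> bool:
    """Accept only HTTPS URLs whose host is exactly upload.wikimedia.org."""
    parsed = urlparse(url)
    # netloc includes any "user@" prefix or ":port" suffix, so requiring an exact
    # match rejects userinfo tricks and lookalike hosts.
    return parsed.scheme == "https" and parsed.netloc.lower() == "upload.wikimedia.org"
```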

> Actually, it looks like the Vision API is limited to 4 MB per image. Can we just rely on Google complaining when a sent image is too big, and pass that error back to the user (as is already done for all errors)?

I think it would be better to limit it on the input side so that we aren't encoding 100 MB files for no reason. Let's limit it to 4 MB in that case. See https://www.mediawiki.org/wiki/API:Imageinfo for how to get this information from the MediaWiki API.
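A Python sketch of the input-side check through the MediaWiki API, where `action=query&prop=imageinfo&iiprop=size` reports a file's size in bytes. It assumes the file is hosted on Commons (a locally uploaded Wikisource file would need that wiki's own api.php), the `formatversion=2` response shape, an original-file URL rather than a thumbnail, and "4 MB" read as 4 × 1024 × 1024 bytes; the function names are hypothetical. Note that imageinfo gives the size of the original file, not of a resized thumbnail.

```python
from urllib.parse import unquote, urlencode, urlparse

COMMONS_API = "https://commons.wikimedia.org/w/api.php"
MAX_BYTES = 4 * 1024 * 1024  # assumed reading of "4 MB"


def imageinfo_query_url(image_url: str, api: str = COMMONS_API) -> str:
    """Build a prop=imageinfo query asking for the file's size."""
    # For an original upload URL the last path segment is the URL-encoded file name.
    filename = unquote(urlparse(image_url).path.rsplit("/", 1)[-1])
    params = {
        "action": "query",
        "titles": "File:" + filename,
        "prop": "imageinfo",
        "iiprop": "size",
        "format": "json",
        "formatversion": "2",
    }
    return api + "?" + urlencode(params)


def size_from_imageinfo(response: dict) -> int:
    """Pull the byte size out of a formatversion=2 imageinfo response."""
    page = response["query"]["pages"][0]
    return page["imageinfo"][0]["size"]
```

The tool would then refuse the request when `size_from_imageinfo(...) > MAX_BYTES`, before downloading or encoding anything.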

If we have images larger than 4 MB, isn't it possible for us to rescale them to below that size?
Maybe using something like this.

@Niharika: yup, but actually I can't find a single ProofreadPage image that is over 4 MB. Maybe for super-small print (has anyone ever tried scanning microfiche and feeding that into Wikisource?), but it seems rare.

The image that gets sent to the OCR service is already the resized version, for example: https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/All_the_Year_Round_-_Series_3_-_Volume_8.pdf/page11-912px-All_the_Year_Round_-_Series_3_-_Volume_8.pdf.jpg

Individual proofreading projects can set the size of these resized images, but even when they need a large one (for example, with newspapers like this: https://en.wikisource.org/wiki/Page:The_New_York_Times,_1900-12-01.djvu/1 ) it's only 1.9 MB.

@kaldari: It's not the size of the original image that matters so much as the size of the one displayed on the proofreading page. So do you think it's too inefficient to just read that one into memory and see how big it is? I figure it's not the best, but then a call to the API would be slowish too.

I'm fixing up the Vision API interface to default to not permitting big images; but perhaps there's a better way. One possibility is also for the gadget to not even send a request for a big image (not that that would stop people from using it in other scripts).
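A Python sketch of enforcing the 4 MB limit while reading the image into memory, so an oversized file is abandoned after roughly 4 MB instead of being downloaded in full. It works with any file-like object that has `read(n)`, such as the response returned by `urllib.request.urlopen`; the names and the 4 MiB reading of "4 MB" are assumptions. Nothing is written to disk, in line with the requirement not to store images on Tool Labs.

```python
import io
from typing import BinaryIO

MAX_BYTES = 4 * 1024 * 1024  # assumed reading of "4 MB"


class ImageTooLarge(Exception):
    """Raised when an image exceeds the size limit."""


def read_capped(stream: BinaryIO, limit: int = MAX_BYTES,
                chunk_size: int = 64 * 1024) -> bytes:
    """Read a stream into memory, aborting as soon as it exceeds `limit` bytes."""
    buf = io.BytesIO()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf.write(chunk)
        if buf.tell() > limit:
            # Stop early rather than pulling in a huge file just to reject it.
            raise ImageTooLarge(f"image is larger than {limit} bytes")
    return buf.getvalue()
```

A `Content-Length` header, when present, could reject a file before reading anything, but the capped read still guards against a missing or wrong header.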

Right, limiting to 4 MB implemented.

So now the Vision API library is pretty much done (enough), and the tool that calls it is up and running (and pointing to https://vision.google-api-proxy.wmflabs.org/ with the key that Ryan gave me).

@Samwilson: Looks like the proxy is up and running now at http://googlevision-api-proxy.wmflabs.org/. Hope that works.

Also, can you add me as a maintainer for the ws-google-ocr project?

@Samwilson: Looks like there's some escaping going on:
http://tools.wmflabs.org/ws-google-ocr/api.php?lang=bn&image=https://upload.wikimedia.org/wikipedia/commons/2/21/Akash_Nila_JibonAnandoDas.png

Can we have it return the raw text in Unicode (with only the normal set of escaped chars: quotation mark, backslash, control characters)?
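If the escaping comes from PHP's default behaviour of writing non-ASCII characters as `\uXXXX` sequences, passing the `JSON_UNESCAPED_UNICODE` flag to `json_encode` turns that off. The Python equivalent, shown here as a sketch with a hypothetical helper name, is `ensure_ascii=False`: Bengali text stays literal, while quotation marks, backslashes and control characters are still escaped as JSON requires.

```python
import json


def to_json(payload: dict) -> str:
    """Serialize an API response, leaving non-ASCII text unescaped."""
    # ensure_ascii=False keeps characters such as Bengali letters literal;
    # '"', '\\' and control characters are still escaped.
    return json.dumps(payload, ensure_ascii=False)
```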

Thanks! I've replied on that commit.

You've probably already checked it out, but there's also the general Vision API library code that this tool is relying on:
https://github.com/wikisource/google-cloud-vision-php/

kaldari moved this task from In Development to Q1 2018-19 on the Community-Tech-Sprint board.