Create a wrapper API on Tool Labs to interact with Google Vision API
Closed, Resolved · Public · 3 Estimated Story Points

Assigned To

	• Samwilson

Authored By

	• DannyH
	Aug 11 2016, 10:32 PM

Description

Create a wrapper API on Tool Labs to interact with the Google Vision API to perform OCR on images from Wikisource. It will need to use Bryan's Google API proxy, because Google's APIs require IP address whitelisting and Tool Labs sends requests from random nodes with different IP addresses.

Acceptance criteria:

• The wrapper API should take the following input: language code, image URL.
• It should accept requests only from wikisource.org.
• Initially it should be limited to handling images that are no larger than 10 MB.
• It should encode the image (base64 byte stream?), add the API credentials, set the language in the languageHints array, and submit the request as JSON to the API proxy.
• The image should not be stored permanently anywhere on Tool Labs. (You can probably just read it into memory via curl.)
• It should return the OCRed text, or an error message if there was an error.
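A minimal Python sketch of the request flow these criteria describe: base64-encode the image bytes, put the language code in `languageHints`, POST the JSON to the proxy with the API key, then pull either the OCRed text or the error message out of the reply. The body and response shapes follow Google's public Vision API v1 `images:annotate` format (`TEXT_DETECTION`, `textAnnotations`). The proxy path `/v1/images:annotate`, passing the key as a `key` query parameter, and all function names are assumptions for illustration; this is not the tool's actual (PHP) code.

```python
from __future__ import annotations

import base64
import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the proxy mirrors Google's /v1/images:annotate path.
PROXY_ENDPOINT = "http://googlevision-api-proxy.wmflabs.org/v1/images:annotate"


def build_annotate_body(image_bytes: bytes, lang: str) -> dict[str, Any]:
    """Build an images:annotate request body asking for OCR of one image."""
    return {
        "requests": [
            {
                # Inline image data is sent as base64-encoded text.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                # TEXT_DETECTION is the Vision API's OCR feature.
                "features": [{"type": "TEXT_DETECTION"}],
                # The wrapper's language-code input becomes a language hint.
                "imageContext": {"languageHints": [lang]},
            }
        ]
    }


def build_http_request(body: dict[str, Any], api_key: str) -> Request:
    """Prepare (but do not send) a JSON POST to the proxy with the API key."""
    url = f"{PROXY_ENDPOINT}?{urlencode({'key': api_key})}"
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response: dict[str, Any]) -> tuple[str | None, str | None]:
    """Return (text, error_message) from an images:annotate response."""
    first = (response.get("responses") or [{}])[0]
    if "error" in first:
        return None, first["error"].get("message", "Unknown error")
    annotations = first.get("textAnnotations") or []
    # The first annotation carries the full detected text; none means no text.
    return (annotations[0].get("description", "") if annotations else ""), None
```

Sending would be `urllib.request.urlopen(req)` followed by `json.load` on the reply; everything stays in memory, in line with the no-storage requirement.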

See Google Vision API documentation at https://cloud.google.com/vision/docs/ and Discovery URL at https://vision.googleapis.com/$discovery/rest?version=v1. See T140037#2507218 and T140037#2528369 for the format of the JSON data to submit. Ask @kaldari for the API key.

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		Tshrinivasan	T120788 Tool to use Google OCRs in Indic language Wikisource
		Resolved		Samwilson	T142768 Create a wrapper API on Tool Labs to interact with Google Vision API

Event Timeline

• DannyH created this task. Aug 11 2016, 10:32 PM

Restricted Application added a subscriber: Aklapper. Aug 11 2016, 10:32 PM

• DannyH added a parent task: T120788: Tool to use Google OCRs in Indic language Wikisource. Aug 11 2016, 10:32 PM

• DannyH mentioned this in T140037: Investigation: Tool to use Google OCR in Indic language Wikisources. Aug 11 2016, 10:38 PM

Bodhisattwa subscribed. Aug 12 2016, 3:47 AM

Yann subscribed. Aug 14 2016, 9:32 PM

kaldari raised the priority of this task from Medium to High. Aug 30 2016, 5:26 PM

kaldari subscribed.

This comment was removed by kaldari.

kaldari created subtask T144290: Allow the Labs Google API proxy to handle multiple Google APIs. Aug 30 2016, 5:31 PM

• DannyH edited projects, added Community-Tech-Sprint; removed Community-Tech. Aug 30 2016, 5:38 PM

• DannyH set the point value for this task to 3.

kaldari updated the task description. Aug 30 2016, 6:02 PM

kaldari closed subtask T144290: Allow the Labs Google API proxy to handle multiple Google APIs as Invalid.

kaldari updated the task description. Aug 30 2016, 9:01 PM

kaldari updated the task description.

kaldari renamed this task from "Create a wrapper on Tool Labs to interact with Google OCR API" to "Create a wrapper on Tool Labs to interact with Google Vision API". Aug 30 2016, 9:10 PM

kaldari updated the task description.

kaldari updated the task description. Aug 30 2016, 9:13 PM

kaldari renamed this task from "Create a wrapper on Tool Labs to interact with Google Vision API" to "Create a wrapper API on Tool Labs to interact with Google Vision API". Aug 30 2016, 11:14 PM

kaldari removed a subtask: T144290: Allow the Labs Google API proxy to handle multiple Google APIs.

kaldari updated the task description. Aug 30 2016, 11:21 PM

Samwilson subscribed. Aug 30 2016, 11:24 PM

kaldari updated the task description. Aug 30 2016, 11:25 PM

kaldari mentioned this in T144290: Allow the Labs Google API proxy to handle multiple Google APIs. Aug 30 2016, 11:45 PM

This task is partially blocked by T144290. It can probably be started without it though.

This looks like it could be a useful basis for the OCR API tool: https://github.com/thangman22/google-cloud-vision-php

It doesn't permit a different endpoint URL, so I've raised a pull-request to add that possibility.

I've created a fork of that library, and incorporated all pending PRs (except one with this in it).

I've got a draft of a basic tool that uses the fork: it takes a Wikisource image URL, runs the image through the API, and returns the text. It doesn't yet restrict requests to those originating from Wikisource, nor limit the size of the file. The code is in D1966.

Nope, scratch that; someone will have to tell me how to use Diffusion properly! For now (it being beer o'clock on a Friday) the code lives at https://github.com/wikisource/ws-google-ocr :-)

• DannyH assigned this task to Samwilson. Sep 2 2016, 4:53 PM

• DannyH moved this task from Ready to In Development on the Community-Tech-Sprint board.

• DannyH removed a subscriber: Samwilson.

jayantanth subscribed. Sep 2 2016, 7:50 PM

(I've sorted out the repository confusion. The GitHub one has been replaced by the in-house one, and the former deleted.)

Still left to do for this ticket:

• Only accept requests from wikisource.org (however, it does limit image URLs to having the prefix https://upload.wikimedia.org/; is this sufficient?).
• Limit image sizes to 10 MB (I'm adding this feature to the google-cloud-vision-php library).

Actually, it looks like the Vision API is limited to 4 MB per image. Can we just rely on Google complaining when a sent image is too big, and pass that error back to the user (as is already done for all errors)?

> Only accept requests from wikisource.org (however, it does limit image URLs to having the prefix https://upload.wikimedia.org/; is this sufficient?).

Yes, I think that should be sufficient. We just need to make sure we are not providing a way for random people to use the Google API for free (at our expense).
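The check agreed on here can be as simple as a string-prefix test on the `image` parameter. A sketch in Python; `is_allowed_image_url`, `validate_params`, and the error strings are hypothetical names and messages, not the tool's own. The trailing slash in the prefix is what stops look-alike hosts such as `upload.wikimedia.org.example.com` from passing.

```python
from typing import Mapping, Optional

# Prefix agreed on in this thread. The trailing slash matters: without it,
# a look-alike host such as "upload.wikimedia.org.example.com" would match.
ALLOWED_PREFIX = "https://upload.wikimedia.org/"


def is_allowed_image_url(url: str) -> bool:
    """Accept only image URLs served over HTTPS from Wikimedia's upload host."""
    return url.startswith(ALLOWED_PREFIX)


def validate_params(params: Mapping[str, str]) -> Optional[str]:
    """Return an error message for a bad request, or None if it may proceed."""
    image = params.get("image", "")
    if not image:
        return "No image URL given."
    if not is_allowed_image_url(image):
        return f"Image URL must start with {ALLOWED_PREFIX}"
    return None
```

Because only Wikimedia-hosted images get through, the proxy cannot easily be used as a free general-purpose OCR service, which is the concern raised above.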

> Actually, it looks like the Vision API is limited to 4 MB per image. Can we just rely on Google complaining when a sent image is too big, and pass that error back to the user (as is already done for all errors)?

I think it would be better to limit it on the input side so that we aren't encoding 100 MB files for no reason. Let's limit it to 4 MB in that case. See https://www.mediawiki.org/wiki/API:Imageinfo for how to get this information from the MediaWiki API.
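A sketch of that input-side check, in Python for illustration: ask API:Imageinfo for `iiprop=size` (the file size in bytes) and compare it against the limit before downloading anything. The Commons endpoint, reading "4 MB" as 4 × 1024 × 1024 bytes, and the helper names are assumptions. For a multipage DjVu or PDF this is the size of the whole file, not of the page image that would actually be OCRed.

```python
from __future__ import annotations

from typing import Any
from urllib.parse import urlencode

# Assumptions: Commons' API endpoint, and "4 MB" as 4 * 1024 * 1024 bytes.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"
MAX_BYTES = 4 * 1024 * 1024


def imageinfo_url(file_title: str, api: str = COMMONS_API) -> str:
    """URL for an API:Imageinfo query returning the file's size in bytes."""
    params = {
        "action": "query",
        "prop": "imageinfo",
        "titles": file_title,
        "iiprop": "size",
        "format": "json",
        "formatversion": "2",  # pages come back as a list, not a dict
    }
    return f"{api}?{urlencode(params)}"


def file_size(response: dict[str, Any]) -> int | None:
    """Extract the byte size from the query response, or None if missing."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or not pages[0].get("imageinfo"):
        return None  # e.g. the file does not exist
    return pages[0]["imageinfo"][0]["size"]


def within_limit(response: dict[str, Any], limit: int = MAX_BYTES) -> bool:
    """True only if the size is known and does not exceed the limit."""
    size = file_size(response)
    return size is not None and size <= limit
```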

If we have images larger than 4 MB, isn't it possible for us to rescale them so they're under that limit?
Maybe using something like this.

@Niharika: yup, but actually I can't find a single ProofreadPage image that is over 4 MB. Maybe for super-small print (has anyone ever tried scanning microfiche and feeding that into Wikisource?), but it seems rare.

The image that gets sent to the OCR service is already the resized version, for example: https://upload.wikimedia.org/wikipedia/commons/thumb/6/64/All_the_Year_Round_-_Series_3_-_Volume_8.pdf/page11-912px-All_the_Year_Round_-_Series_3_-_Volume_8.pdf.jpg

Individual proofreading projects can set the size of these resized images, but even when they need a large one (for example, with newspapers like this: https://en.wikisource.org/wiki/Page:The_New_York_Times,_1900-12-01.djvu/1 ) it's only 1.9 MB.

@kaldari: It's not the size of the original image that matters so much as the size of the one displayed on the proofreading page. So do you think it's too inefficient to just read that one into memory and see how big it is? I figure it's not the best, but then a call to the API would be slowish too.

I'm fixing up the Vision API interface to default to not permitting big images; but perhaps there's a better way. One possibility is also for the gadget to not even send a request for a big image (not that that would stop people from using it in other scripts).
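If the displayed page image is fetched anyway, its size can be enforced while reading it into memory, so an oversized file is abandoned early and never touches disk. A Python sketch, assuming "4 MB" means 4 × 1024 × 1024 bytes; `read_capped` and `ImageTooLarge` are hypothetical names.

```python
import io
from typing import BinaryIO

# Assumption: "4 MB" taken as 4 * 1024 * 1024 bytes.
MAX_BYTES = 4 * 1024 * 1024


class ImageTooLarge(Exception):
    """Raised when the image exceeds the size limit (hypothetical name)."""


def read_capped(stream: BinaryIO, limit: int = MAX_BYTES,
                chunk_size: int = 64 * 1024) -> bytes:
    """Read a binary stream into memory, giving up once it exceeds `limit` bytes."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # end of stream
            break
        buf.extend(chunk)
        if len(buf) > limit:
            # Stop downloading as soon as the limit is crossed.
            raise ImageTooLarge(f"image is larger than {limit} bytes")
    return bytes(buf)


if __name__ == "__main__":
    # An in-memory stream stands in for an HTTP response here.
    print(len(read_capped(io.BytesIO(b"\xff" * 1000))))
```

With a real download this would be `with urllib.request.urlopen(url) as resp: data = read_capped(resp)`. Checking the `Content-Length` header first can skip obviously large files cheaply, but the capped read still protects against a missing or wrong header.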

Right, the 4 MB limit is now implemented.

So now the Vision API library is pretty much done (enough), and the tool that calls it is up and running (and pointing to https://vision.google-api-proxy.wmflabs.org/ with the key that Ryan gave me).

@Samwilson: Looks like the proxy is up and running now at http://googlevision-api-proxy.wmflabs.org/. Hope that works.

Also, can you add me as a maintainer for the ws-google-ocr project?

> Looks like the proxy is up and running now at http://googlevision-api-proxy.wmflabs.org/. Hope that works.

Yup, looks to be good! e.g. http://tools.wmflabs.org/ws-google-ocr/api.php?image=https://upload.wikimedia.org/wikipedia/commons/thumb/0/07/Gissing_-_The_Nether_World%2C_vol._III%2C_1889.djvu/page46-1024px-Gissing_-_The_Nether_World%2C_vol._III%2C_1889.djvu.jpg
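The example requests in this thread pass the image URL to the tool unencoded. That works for these URLs, but would break if an image URL contained `&` or `#`, and any `%XX` sequence in it gets decoded once by the tool's query parsing. A small Python sketch of building the request with encoded parameters; the endpoint and the `image` and `lang` parameter names are the ones used in this thread, while `ocr_request_url` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urlencode

# Endpoint and parameter names (`image`, `lang`) as used in this thread.
TOOL_API = "http://tools.wmflabs.org/ws-google-ocr/api.php"


def ocr_request_url(image_url: str, lang: Optional[str] = None) -> str:
    """Build a request URL for the OCR tool with properly encoded parameters."""
    params = {"image": image_url}
    if lang:
        params["lang"] = lang
    # urlencode percent-encodes "&", "#", "%" and "/" in the image URL, so it
    # reaches the tool as a single intact parameter value.
    return f"{TOOL_API}?{urlencode(params)}"
```

For instance, the `%2C` in the Gissing URL above is sent as `%252C` and decoded back to `%2C` on the tool side.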

> Also, can you add me as a maintainer for the ws-google-ocr project?

Done: http://tools.wmflabs.org/?list#toollist-ws-google-ocr

@Samwilson: Looks like there's some escaping going on:
http://tools.wmflabs.org/ws-google-ocr/api.php?lang=bn&image=https://upload.wikimedia.org/wikipedia/commons/2/21/Akash_Nila_JibonAnandoDas.png

Can we have it return the raw text in Unicode (with only the normal set of escaped chars: quotation mark, backslash, control characters)?

kaldari moved this task from Needs Review/Feedback to In Development on the Community-Tech-Sprint board. Sep 7 2016, 2:16 AM

> Can we have it return the raw text in Unicode

Done.
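In Python terms, the output requested above corresponds to `json.dumps(..., ensure_ascii=False)`: non-ASCII text such as Bengali stays as literal characters, while quotation marks, backslashes, and control characters are still escaped. PHP's analogous option is `json_encode`'s `JSON_UNESCAPED_UNICODE` flag; whether the tool uses that exact flag is not stated here. `to_json` is a hypothetical name.

```python
import json
from typing import Any


def to_json(payload: Any) -> str:
    """Serialise a response, leaving non-ASCII text (e.g. Bengali) unescaped.

    ensure_ascii=False stops \\uXXXX escaping of non-ASCII characters; quotation
    marks, backslashes and control characters are still escaped as JSON requires.
    """
    return json.dumps(payload, ensure_ascii=False)
```

For example, `to_json({"text": "আকাশ"})` yields `{"text": "আকাশ"}`, whereas the default `json.dumps` would emit `\u0986...` escapes. The response should then be served as UTF-8.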

The main code is at https://phabricator.wikimedia.org/diffusion/1966/browse/master/api.php

Reviewed.

kaldari moved this task from Needs Review/Feedback to In Development on the Community-Tech-Sprint board. Sep 8 2016, 1:10 AM

Thanks! I've replied on that commit.

You've probably already checked it out, but there's also the general Vision API library code that this tool is relying on:
https://github.com/wikisource/google-cloud-vision-php/

kaldari closed this task as Resolved. Sep 8 2016, 5:02 AM

kaldari moved this task from In Development to Q1 2018-19 on the Community-Tech-Sprint board.

• DannyH edited projects, added Community-Tech; removed Community-Tech-Sprint. Sep 13 2016, 11:30 PM

• DannyH moved this task from Needs Discussion to Archive on the Community-Tech board. Sep 13 2016, 11:41 PM

Jnanaranjan_sahu subscribed. Sep 19 2017, 5:05 AM

MusikAnimal added a project: Google-api-proxy. Apr 16 2020, 6:51 PM
