
Jun 24 2020

Mahastama added a comment to T199992: Javanese OCR installation steps for Wikisource.

Thanks guys, it's up and running for the initial test. Apparently the module we use to accept cross-domain API requests assigns a very short validity period to each generated API key, which caused the error mentioned before. We have temporarily disabled the validity period check.

[Screenshot: FireShot Screen Capture #076 - 'Creating Halaman_PDIKM 700-09 Majalah Aboean Goeroe-Goeroe September 1931_pdf_3 - Wikisource' - id_wikisource_org.png]

The textbox is already showing the response from the Cakra OCR API. However, a few things are still open questions:

  1. We haven't yet tested it with a Wikisource document in Javanese script. I am asking the ID crew to point us to an example of an available file.
  2. The data sent from the page is apparently a BLOB, while the API is designed to receive a JPEG file. We might have to look at the process and the data being sent to see whether this will cause an issue on our side.
  3. The OCR result in Unicode jv is contained inside a span tag. Do we have to provide the response in plain text, or is it possible to work around this so that only the content of the span tag is displayed in the textbox? The blank return itself is still under investigation: it may be caused either by the absence of Javanese script in the document or by the sent data being a BLOB.
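
For points 2 and 3, here is a minimal Python sketch of the two checks involved. It is not the actual Cakra OCR or Wikisource code. A browser Blob is just a wrapper around raw bytes, so the receiving side can inspect the leading bytes to see whether the payload is JPEG-encoded. Likewise, the text inside a span can be pulled out of the HTML response before it is written to the textbox. The function names `looks_like_jpeg` and `span_text` are hypothetical, and the sketch assumes the OCR result is the only content wrapped in span tags.

```python
from html.parser import HTMLParser

# JPEG data starts with the SOI marker FF D8, followed by another FF marker byte.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the received payload starts with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


class _SpanTextCollector(HTMLParser):
    """Collects text that appears inside <span> elements (nesting allowed)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._depth = 0   # how many <span> elements are currently open
        self.parts = []   # text fragments found inside spans

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.parts.append(data)


def span_text(html: str) -> str:
    """Return only the text wrapped in <span> tags, stripped of whitespace."""
    collector = _SpanTextCollector()
    collector.feed(html)
    collector.close()
    return "".join(collector.parts).strip()
```

With something like this, an empty string from `span_text` would mean the response contained no OCR text at all, while a `False` from `looks_like_jpeg` would point to the uploaded data as the cause of a blank return.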
Jun 24 2020, 6:12 AM · Privacy Engineering, ProofreadPage, I18n, All-and-every-Wikisource

Jun 2 2020

Mahastama added a comment to T199992: Javanese OCR installation steps for Wikisource.

Thanks guys for the enlightenment, @Aklapper, @Xover and @Tpt :) My team and I have been trying for months to find a way, because we are unfamiliar with the Wikisource site structure (none of us had ever worked with it before). Many thanks to @Tpt for making a working example; that's exactly what we need. Sorry for the "authorization" error: one of my developers happened to change the API key. I'll take a look at it.
Well, if the tech community is considering making a common interface for calling external OCR APIs, we'll be glad to collaborate.

Jun 2 2020, 6:58 AM · Privacy Engineering, ProofreadPage, I18n, All-and-every-Wikisource

May 30 2020

Mahastama added a comment to T199992: Javanese OCR installation steps for Wikisource.

Our engine, called Cakra OCR, has been running on our experimental server https://trawaca.id/ocrjawa/?lang=en with a working example shown below:

[Image: chrono_02.png]

An API has also been set up, although we are still fixing parts of it. We've tested the API from an external server, for example from our friend's https://ferianto.id/twclient/sampleajax.php, and it worked: it received JPEG images of Javanese manuscripts and returned the OCR result in Unicode.
We are stuck on how to build an interface like this within Wikisource:
[Image: chrono_03.png]

where window A sends the material through the Cakra API and the resulting OCR text is sent to window B, just like on our server.
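
To make the round trip concrete, here is a rough Python sketch of the "window A" side: it packages a JPEG page image into a multipart POST request. This is only an illustration under assumptions. The endpoint URL, the form field name `image`, the file name `page.jpg`, and the `X-Api-Key` header are hypothetical placeholders, not the real Cakra API contract. The function only builds the request and does not send it.

```python
import urllib.request
import uuid


def build_ocr_request(jpeg_bytes: bytes, endpoint: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST carrying one JPEG.

    The field name "image" and the "X-Api-Key" header are assumptions made
    for this sketch; the real API may expect different names.
    """
    boundary = uuid.uuid4().hex  # random separator between multipart sections
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        b'Content-Disposition: form-data; name="image"; filename="page.jpg"\r\n',
        b"Content-Type: image/jpeg\r\n\r\n",
        jpeg_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    request = urllib.request.Request(endpoint, data=body, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    request.add_header("X-Api-Key", api_key)
    return request


# Sending it would look like this (left commented out, since the URL is a placeholder):
# with urllib.request.urlopen(build_ocr_request(data, "https://example.org/ocr", "KEY")) as resp:
#     ocr_text = resp.read().decode("utf-8")  # this is what would go into "window B"
```

On the Wikisource side the equivalent call would be made from the page's JavaScript, with the response text written into the edit textbox.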

May 30 2020, 4:59 AM · Privacy Engineering, ProofreadPage, I18n, All-and-every-Wikisource