Install Kraken OCR (and web service) on a new Wikisource VPS
Closed, Resolved · Public

Description

In preparation for adding the Kraken OCR engine as an option on ocr.wmcloud.org, a new VPS should be created and Kraken installed on it. The HTTP API service should also be installed and made accessible at https://kraken-ocr.wmcloud.org

Steps:

Event Timeline

We need to request an increase of quota for the Wikisource project (currently 6 / 8 instances; 16 / 16 VCPUs; 32.0 GB / 34.1796875 GB RAM).

@sweil: what are the requirements for Kraken?

A virtual machine for testing kraken should provide at least 4 VCPUs, 8 GiB RAM, and 8 GB storage (minimum values). More VCPUs allow more parallel processing.

For Tesseract, I calculate 1 GiB RAM per VCPU, and each VCPU can handle one OCR process.
For kraken, 2 GiB RAM and 4 VCPUs per OCR process work better.

In addition, Linux and the web server need some RAM.
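
The sizing rules above can be turned into a quick estimate of how many OCR processes a given VM can run in parallel. This is only a rough sketch: the function name, the profile dictionaries, and the 1 GiB reserved for Linux and the web server are assumptions made for illustration, not measured values.

```python
def max_ocr_processes(vcpus, ram_gib, vcpus_per_proc, ram_per_proc_gib,
                      reserved_ram_gib=1.0):
    """Estimate how many OCR processes fit on a VM.

    reserved_ram_gib is an assumed allowance for Linux and the web
    server; the real overhead was not stated and should be measured.
    """
    usable_ram = max(ram_gib - reserved_ram_gib, 0.0)
    by_cpu = vcpus // vcpus_per_proc              # limit imposed by VCPUs
    by_ram = int(usable_ram // ram_per_proc_gib)  # limit imposed by RAM
    return min(by_cpu, by_ram)


# Per-process rules of thumb quoted in the comment above.
TESSERACT = {"vcpus_per_proc": 1, "ram_per_proc_gib": 1.0}
KRAKEN = {"vcpus_per_proc": 4, "ram_per_proc_gib": 2.0}

if __name__ == "__main__":
    # The suggested minimum test VM: 4 VCPUs, 8 GiB RAM.
    print("kraken:", max_ocr_processes(4, 8, **KRAKEN))
    print("tesseract:", max_ocr_processes(4, 8, **TESSERACT))
```

Under these assumptions the minimum VM is limited by VCPUs rather than RAM for kraken, which is consistent with the remark that more VCPUs allow more parallel processing.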

@Samwilson, it looks like Wikimedia OCR currently does not handle more than one OCR process at a time. Is that correct? Doesn't that cause long waits when the service is used heavily? Have users complained about slow OCR because of that?

It's one process per request at the moment, isn't it? But no, apart from the recent slowness with Transkribus, no one's complained; I guess it's just quick enough, and doesn't get a vast amount of traffic.

sweil claimed this task.

Meanwhile, Kraken has been installed and configured, and the web service is online.