English Wikisource OCR gadgets fail to identify text
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

What happens?:

  • No text is generated, and each OCR engine shows its own error message:
    • Tesseract: Error from the OCR tool: Image retrieval failed: HTTP/2 429 returned for <JPG link of the PDF page>
    • Google: Error from the OCR tool: The Google service returned an error: We can not access the URL currently. Please download the content and pass it in.
    • Transkribus: Error from the OCR tool: Error Code '500' :: Unable to complete request, try again!
  • In some cases, the text is generated correctly, but only after multiple clicks.

What should have happened instead?:
The OCR tools should work within one or two clicks and generate the text from the source PDF file.

Other information (browser name/version, screenshots, etc.):