
Wikimedia OCR is not responding
Closed, Resolved · Public · BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • Use the "Transcribe text" button on Wikisource (already tried in Indonesian and English); or
  • Go directly to ocr.wmcloud.org

What happens?:
Wikimedia OCR is not responding

What should have happened instead?:
At the very least, the website should be accessible

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):
Microsoft Edge
Version 122.0.2365.92

Event Timeline

Restricted Application added a subscriber: Aklapper.

It should be back up and running now. I'm going to keep this open for the moment while we investigate what went wrong and how to prevent it from happening again.

Thank you so much. One question, though: if this problem happens again, should I just comment on this task instead of creating a new one?

tstarling closed this task as Resolved. · Edited · Mar 27 2024, 3:30 AM
tstarling claimed this task.
tstarling subscribed.

I investigated this, but the cause was not obvious from the logs. It was not an out-of-memory condition. If it happens again, I would suggest collecting the following information before restarting Apache:

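# Process tree with each process's kernel wait channel, resident memory, and CPU time
ps -eo pid,wchan,rss,time,comm --forest
# Open files and sockets held by apache2 processes (-n skips DNS lookups)
sudo lsof -nc apache2
# Core dump of one apache2 process owned by www-data (the first one pgrep lists)
sudo gcore -o /root/apache2.core $(pgrep -U www-data apache2 | head -n1)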


You can reopen the task with "Add Action > Change Status > Open" when you add your comment.

This may have been related to Transkribus slowing down. We loop while waiting for their response, and their response time has recently been as high as 600 minutes.

I can set up a wall time limit, but it seems abusive to queue unlimited Transkribus jobs without any plans to check their responses.
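
To illustrate the kind of wall time limit mentioned above, here is a minimal shell sketch of a polling loop that gives up after a fixed deadline instead of waiting indefinitely. It is not the actual Wikimedia OCR code: the function name `poll_with_deadline`, its arguments, and the exit status 124 (borrowed from the convention used by GNU `timeout`) are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not part of Wikimedia OCR.
# Usage: poll_with_deadline MAX_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
# Runs COMMAND repeatedly until it succeeds or MAX_SECONDS of wall time
# have elapsed. Returns 0 on success, 124 if the deadline passes first.
# Note: plain POSIX sh has no "local", so these variables are global.
poll_with_deadline() {
    max_seconds=$1
    interval=$2
    shift 2

    # Absolute deadline in seconds since the epoch.
    deadline=$(( $(date +%s) + max_seconds ))

    while [ "$(date +%s)" -lt "$deadline" ]; do
        # Success of the polled command ends the wait immediately.
        if "$@"; then
            return 0
        fi
        sleep "$interval"
    done

    # Deadline reached without the command succeeding.
    return 124
}
```

A caller might use it as `poll_with_deadline 600 10 check_job_done "$job_id"`, where `check_job_done` is a hypothetical command that exits 0 once the remote job has finished. The wait is bounded by the deadline plus at most one polling interval and one run of the command, so a slow upstream service cannot hold a worker for hours.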