failure in ia-upload possibly due to accented characters
Closed, Resolved · Public · 3 Story Points

Description

https://tools.wmflabs.org/ia-upload/log/RepubliqueFrancaiseConstitution1848

The log says:

[2017-05-17 22:07:44] LOG.CRITICAL: Command "djvuxmlparser "/mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/RepubliqueFrancaiseConstitution1848/République_Française_Constitution_1848_djvu.xml_new.xml" 2>&1" exited with code 1: Error: File '/mnt/nfs/labstore-secondary-tools-project/ia-upload/ia-upload/jobqueue/RepubliqueFrancaiseConstitution1848/R?publique_Fran?aise_Constitution_1848_djvu.xml_new.xml' does not exist.

It looks like either something at IA or here has not handled the accented characters, or, because the item was newly created, it was still being processed even though it reported that it was finished.
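To illustrate the first hypothesis, here is a minimal sketch of how a lossy conversion to ASCII turns the accented file name into the one shown in the error message. The helper `ascii_with_placeholders` is hypothetical and is not part of IA Upload or `djvuxmlparser`; it only shows that replacing each non-ASCII character with `?` reproduces the reported name. The `?` characters in the log may also be nothing more than how the message was rendered.

```python
import unicodedata


def ascii_with_placeholders(name: str) -> str:
    """Replace every non-ASCII character with '?' (hypothetical helper)."""
    # Normalise first so that each accented letter is a single code point.
    composed = unicodedata.normalize("NFC", name)
    return composed.encode("ascii", errors="replace").decode("ascii")


original = "République_Française_Constitution_1848_djvu.xml_new.xml"
mangled = ascii_with_placeholders(original)
print(mangled)  # R?publique_Fran?aise_Constitution_1848_djvu.xml_new.xml
```

If a step in the pipeline did something like this to the path, the file would exist under its real name while being looked up under the mangled one.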

Event Timeline

Restricted Application added a subscriber: Aklapper. · May 19 2017, 11:57 AM
Samwilson triaged this task as Normal priority. · May 19 2017, 12:45 PM
Samwilson edited projects, added Community-Tech; removed Community-Tech-Tool-Labs.
kaldari lowered the priority of this task from Normal to Low. · Jun 14 2017, 12:01 AM
kaldari set the point value for this task to 3.
Samwilson added a subscriber: Samwilson.

I have not been able to replicate this error; I've been able to convert the above item without any problems.

I think you're right and it's something to do with how the derivative files are created at IA, or how they're listed in the files metadata file. There have been a few other items like this that failed because IA Upload acted before all the IA metadata was available. Note that the files metadata file of the above item was last modified at 17-May-2017 21:51, and the log entry was at 2017-05-17 22:07:44, i.e. about 17 minutes later. Perhaps the OCR process runs longer than that, and the ABBYY XML isn't available for a while?
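The gap between the two timestamps can be checked with a short calculation. This sketch assumes both times are in the same time zone and that the metadata file's modification time (given only to the minute) had zero seconds; neither is confirmed in the task.

```python
from datetime import datetime

# Timestamps as quoted in the comment above (same time zone assumed).
metadata_modified = datetime(2017, 5, 17, 21, 51)
log_entry = datetime(2017, 5, 17, 22, 7, 44)

gap = log_entry - metadata_modified
print(gap)  # 0:16:44
```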

I've added a new check for the existence of the XML file, so people won't be able to create conversion jobs unless both files are available.
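A check of this kind could look roughly like the sketch below. It is not the code that was added to IA Upload: the function name `missing_files` is hypothetical, the `files` list only mimics the shape of an item's file listing (a list of entries with a `name` key), and the `_jp2.zip` suffix is an assumption about which second file is required. Only the `_djvu.xml` suffix is taken from the log above.

```python
def missing_files(files, required_suffixes):
    """Return the required suffixes that no listed file name ends with."""
    names = [entry.get("name", "") for entry in files]
    return [
        suffix
        for suffix in required_suffixes
        if not any(name.endswith(suffix) for name in names)
    ]


# Hypothetical listing for an item whose OCR output is not yet available.
files = [
    {"name": "Example_Item_jp2.zip"},
    {"name": "Example_Item_meta.xml"},
]
required = ["_jp2.zip", "_djvu.xml"]  # "_jp2.zip" is an assumed requirement

missing = missing_files(files, required)
if missing:
    print("Cannot create conversion job yet; missing:", ", ".join(missing))
```

Refusing to queue the job until the returned list is empty avoids starting a conversion before all derivative files have been produced.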

Samwilson closed this task as Resolved. · Nov 30 2017, 7:11 AM
Samwilson moved this task from Ready to Q1 2018-19 on the Community-Tech-Sprint board.
DannyH moved this task from Estimated to Archive on the Community-Tech board. · Dec 19 2017, 1:14 AM