Fixable vs. unfixable IA Upload failures: overview
Open, Needs Triage · Public

Description

Browsing through 44 IA Upload failures, I found 23 that are probably fixable and 20 that are unfixable.

Among the former I found:

  • 14 cases where a _jp2.zip file exists, but its prefix differs from the IA ID;
  • 7 cases where there is no _jp2.zip file, but there is a _tif.zip file;
  • 1 case where there is no _jp2.zip file, but there is a _jp2.tar file;
  • 1 case where a _jp2.zip file exists, but the uploader exits with no result (it is a very large item with 1010 pages).

Among the latter (unfixable) I found a variety of abnormal uploads: loose files (.jpg, .png, .mp3, .ogg...), or folders or zip files whose name structure differs from the allowed one (_images.zip), all lacking a _djvu.xml file.

My suggestions (unfortunately I can't fix the code myself) are:

  • to test for the existence of a _djvu.xml file as the first step,
    • if it exists
      • to test for the existence of a _jp2.zip file and to use it even if its prefix differs from the IA ID
      • if no _jp2.zip file exists
        • to test for a _tif.zip file and to use it after a tif to jpg conversion
        • to test for a _jp2.tar file and to use it after splitting the tar

This approach should avoid most of the fixable IA Upload failures.
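
The checks suggested above can be sketched roughly as follows. This is only an illustration in Python, not the IA Upload tool's actual code: the function name `pick_image_archive`, the action labels, and the assumption that the item's file names are already available as a plain list are all hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical order of preference: archive suffix and the extra step it needs.
CANDIDATES = [
    ("_jp2.zip", "use as is"),
    ("_tif.zip", "convert tif to jpg"),
    ("_jp2.tar", "split tar"),
]


def pick_image_archive(ia_id: str, filenames: List[str]) -> Optional[Tuple[str, str]]:
    """Return (filename, action) for the archive to use, or None if nothing fits."""
    # Step 1: without a _djvu.xml file the item is treated as unfixable.
    if not any(name.endswith("_djvu.xml") for name in filenames):
        return None

    # Step 2: try each archive type in order of preference.
    for suffix, action in CANDIDATES:
        matches = [name for name in filenames if name.endswith(suffix)]
        if not matches:
            continue
        # Prefer the file whose prefix equals the IA ID,
        # but accept a different prefix if that is all there is.
        exact = ia_id + suffix
        chosen = exact if exact in matches else matches[0]
        return (chosen, action)

    return None


if __name__ == "__main__":
    # Made-up file names, only to show the call.
    print(pick_image_archive("foo", ["foo_djvu.xml", "bar_jp2.zip"]))
```

The sketch only decides which file to use; the tif to jpg conversion, the tar splitting, and the very large item that makes the uploader exit with no result are not covered.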

Event Timeline

Restricted Application added a subscriber: Aklapper. · Nov 3 2017, 9:30 PM
Aklapper removed a subscriber: IA Upload.
Restricted Application added a project: Community-Tech. · Nov 3 2017, 9:33 PM
Alex_brollo updated the task description. · Nov 3 2017, 9:44 PM
Alex_brollo updated the task description. · Nov 4 2017, 5:23 AM

Is it the case that the zip files we want are identified by format = 'Abbyy GZ' in the item's file list? That seems to identify the jp2 and tif zip files in the items I've looked at.