Unexpected TIFF verification failures when mass-uploading via API
Open, Low, Public

Description

During the batch upload of NYPL TIFF images (several thousand successful uploads so far), I have started getting a series of bad-TIFF errors when batch uploading NYPL's collection of American popular songs. This is odd because the TIFF looks fine when downloaded manually. It is doubly odd because I use the Python Imaging Library's tiffinfo call to get the mode, size, compression, dpi and ICC profile, and that works fine locally, yet it is a tiffinfo call on the WMF side that rejects the upload. Does the tiffinfo executable on the servers need an update, or is this a problem with the source encoding, meaning these files should not be uploaded?

I get a similar file verification error when attempting to upload my locally stashed TIFF using Rillke's chunked uploader; presumably the same verification routine is rejecting the file.

Example file:
Source: NYPL digital collections 'All aboard for Podunk'
Error msg using upload via API:

Uploading file to commons:commons via API....
{u'servedby': u'mw1125', u'error': {u'info': u"This file did not pass file verification: The uploaded file contains errors: tiffinfo command failed: '/usr/bin/tiffinfo' '/tmp/r0fzoW' 2>&1", u'*': u'See http://commons.wikimedia.org/w/api.php for API usage', u'code': u'verification-error', u'details': [u'tiff_bad_file', u"tiffinfo command failed: '/usr/bin/tiffinfo' '/tmp/r0fzoW' 2>&1"]}}
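
For a batch run, it may help to recognise this particular rejection and set the file aside rather than abort. The following is a minimal sketch, assuming the API response has already been decoded into a Python dict shaped like the one above; the function name is hypothetical and not part of any upload library.

```python
def is_tiff_verification_failure(response):
    """Return True if a decoded API response reports a rejected TIFF."""
    error = response.get("error", {})
    if error.get("code") != "verification-error":
        return False
    # 'details' is a list; 'tiff_bad_file' marks the TIFF-specific failure.
    return "tiff_bad_file" in error.get("details", [])


# Sample response copied from the error message above.
response = {
    "servedby": "mw1125",
    "error": {
        "info": "This file did not pass file verification: The uploaded file "
                "contains errors: tiffinfo command failed: "
                "'/usr/bin/tiffinfo' '/tmp/r0fzoW' 2>&1",
        "code": "verification-error",
        "details": [
            "tiff_bad_file",
            "tiffinfo command failed: '/usr/bin/tiffinfo' '/tmp/r0fzoW' 2>&1",
        ],
    },
}

print(is_tiff_verification_failure(response))
```

A batch script could log the titles for which this returns True and carry on with the rest of the collection.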

Local use of tiffinfo with PIL gives:

Image mode: RGB 
Image size: (3274, 4543) 
Image info: compression raw 
Image info: dpi (300, 300) 
Image info: icc_profile KCMSmntrRGB XYZ acspMSFTKODAROMMKODAcprtHdescwtptrTRCgTRCbTRCrXYZgXYZbXYZdmndndmddmmodtextCopyright c Eastman Kodak Company 1999 all rights reserveddescProPhoto RGBProPhoto RGBProPhoto RGBXYZ curvXYZ 4IXYZ XYZ descKODAKKODAKKODAKdescReference Output Medium MetricROMM  Reference Output Medium MetricROMM  Reference Output Medium MetricROMM  mmod ...

Event Timeline

Fae created this task. Jan 25 2016, 3:32 PM
Fae raised the priority of this task from to Needs Triage.
Fae updated the task description.
Fae added a project: Commons.
Fae added a subscriber: Fae.
Restricted Application added subscribers: StudiesWorld, Steinsplitter, Aklapper. Jan 25 2016, 3:32 PM
Fae updated the task description. Jan 25 2016, 3:38 PM
Fae set Security to None.
Aklapper renamed this task from Unexpected TIFF verification failures to Unexpected TIFF verification failures when mass-uploading via API. Jan 25 2016, 3:59 PM
Aklapper triaged this task as Low priority.
Fae added a comment. Jan 27 2016, 4:18 PM

Just in case anyone investigates this TIFF problem, as a third route to upload I tried pushing 8 sheets of the Podunk score through the GLAMwiki toolset. None uploaded. The errors look like:
{
    "logid": 149907345,
    "ns": 6,
    "title": "File:No title",
    "pageid": 0,
    "logpage": 0,
    "params": {
        "metadata-record-nr": 2,
        "message": "This file did not pass file verification.\noriginal URL: http://link.nypl.org/KkaiBy1rR0W2fOc2hoCLTwd\nevaluated URL: http://link.nypl.org/KkaiBy1rR0W2fOc2hoCLTwd"
    },
    "type": "gwtoolset",
    "action": "mediafile-job-failed",
    "user": "F\u00e6",
    "timestamp": "2016-01-27T16:09:57Z",
    "comment": "music test tranche"
},
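
To see how widespread the failures are, the source URLs can be pulled out of log entries like this one. This is a rough sketch, assuming each entry has already been parsed into a dict with the fields shown above; the helper name and the regular expression are my own, not part of GWToolset.

```python
import re


def failed_urls(entry):
    """Return the distinct source URLs named in a failed GWToolset job entry."""
    if entry.get("type") != "gwtoolset":
        return []
    if entry.get("action") != "mediafile-job-failed":
        return []
    message = entry.get("params", {}).get("message", "")
    urls = re.findall(r"(?:original|evaluated) URL: (\S+)", message)
    # The original and evaluated URL are often identical; keep first-seen order.
    return list(dict.fromkeys(urls))


# Sample entry reduced to the fields the helper reads.
entry = {
    "params": {
        "metadata-record-nr": 2,
        "message": "This file did not pass file verification.\n"
                   "original URL: http://link.nypl.org/KkaiBy1rR0W2fOc2hoCLTwd\n"
                   "evaluated URL: http://link.nypl.org/KkaiBy1rR0W2fOc2hoCLTwd",
    },
    "type": "gwtoolset",
    "action": "mediafile-job-failed",
}

for url in failed_urls(entry):
    print(url)
```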

The errors are more widespread for the NYPL music collection than I imagined. It may be the majority of the American popular songs collection (4,700 music scores). Again the TIFFs appear fine locally, and pass my tiffinfo check locally. They also render perfectly well on the NYPL's website.

TheDJ added a subscriber: TheDJ. Edited May 2 2016, 1:01 PM

My own local tiffinfo command also fails on http://link.nypl.org/KkaiBy1rR0W2fOc2hoCLTwd

tiffinfo /Users/djhartman/nypl.tiff 
TIFF Directory at offset 0x8 (8)
  Subfile Type: (0 = 0x0)
  Image Width: 3274 Image Length: 4543
  Resolution: 300, 300 pixels/inch
  Bits/Sample: 8
  Compression Scheme: None
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Rows/Strip: 4543
  Planar Configuration: single image plane
  Make: Epson   
  Model: Exp10000XL10000 
  Software: SilverFast 6.6.0r3a
  DateTime: 2008:12:19 10:38:50
  EXIFIFDOffset: 0x158
  ICC Profile: <present>, 940 bytes
TIFFReadCustomDirectory: Warning, Wrong data type 3 for "PixelXDimension"; tag ignored.
TIFFReadCustomDirectory: Warning, Wrong data type 3 for "PixelYDimension"; tag ignored.
TIFF Directory at offset 0x158 (344)
  MakerNote: 0x4c,0x53,0x49,0x31,0x0,0x8,0xc0,0x1,0x1,0x1,0x0,0x0,0x55,0x20,0xc0,0x2,0x0,0x1,0x0,0x0,0x0,0x20,0xc0,0x3,0x1,0x1,0x0,0x2,0x50,0x20,0xc0,0x4,0x1,0x1,0x0,0x1,0xab,0x20,0xc0,0x5,0x0,0x1,0x0,0x0,0x0,0x0,0xc0,0x6,0x0,0x1,0x0,0x0,0x0,0x0,0xc0,0x7,0x0,0x1,0x0,0x0,0x0,0x0,0xc0,0x8,0x1,0x1,0x0,0x2d,0xc6,0x0,0x0,0x0,0x0,0x0
TIFF Directory at offset 0x2a8e39a (44622746)
  Subfile Type: reduced-resolution image (1 = 0x1)
  Image Width: 1091 Image Length: 1514
  Resolution: 72, 72 pixels/inch
  Bits/Sample: 8
  Compression Scheme: None
  Photometric Interpretation: RGB color
  Samples/Pixel: 3
  Rows/Strip: 1514
  Planar Configuration: single image plane
  EXIFIFDOffset: 0xe476
  ICC Profile: <present>, 940 bytes
TIFFFetchDirectory: Sanity check on directory count failed, this is probably not a valid IFD offset.
TIFFReadCustomDirectory: Failed to read custom directory at offset 58486.

version:

tiffinfo -v
LIBTIFF, Version 4.0.6
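
The output above suggests that the EXIF pointer in the reduced-resolution directory (EXIFIFDOffset 0xe476) does not lead to a valid directory. A check along those lines can be sketched in pure Python to pre-screen files before upload. This is only a heuristic stand-in, not a reimplementation of libtiff: the entry limit and the field-type range below are assumptions chosen for this sketch, and it may not agree with tiffinfo's verdict on every file. It handles classic TIFF only, not BigTIFF.

```python
import struct

EXIF_IFD_TAG = 0x8769        # standard tag holding the offset of the EXIF IFD
MAX_PLAUSIBLE_ENTRIES = 4096  # assumption: plausibility limit for this sketch


def read_ifd(data, offset, endian):
    """Return (entries, next_offset), or None if no plausible IFD is there."""
    if offset <= 0 or offset + 2 > len(data):
        return None
    (count,) = struct.unpack_from(endian + "H", data, offset)
    end = offset + 2 + count * 12
    if count == 0 or count > MAX_PLAUSIBLE_ENTRIES or end > len(data):
        return None
    entries = [
        struct.unpack_from(endian + "HHI4s", data, offset + 2 + i * 12)
        for i in range(count)
    ]
    # Assumption: classic TIFF field types are numbered 1 to 13.
    if any(not 1 <= ftype <= 13 for _tag, ftype, _n, _raw in entries):
        return None
    next_offset = 0
    if end + 4 <= len(data):
        (next_offset,) = struct.unpack_from(endian + "I", data, end)
    return entries, next_offset


def find_bad_exif_pointers(data):
    """Walk the main IFD chain and list pointers that look invalid."""
    endian = {b"II": "<", b"MM": ">"}.get(bytes(data[:2]))
    if endian is None or len(data) < 8:
        raise ValueError("not a classic TIFF file")
    magic, offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file")
    problems, seen, index = [], set(), 0
    while offset and offset not in seen:  # 'seen' guards against loops
        seen.add(offset)
        ifd = read_ifd(data, offset, endian)
        if ifd is None:
            problems.append((index, offset, "main IFD unreadable"))
            break
        entries, next_offset = ifd
        for tag, _ftype, _n, raw in entries:
            if tag == EXIF_IFD_TAG:
                (exif_offset,) = struct.unpack(endian + "I", raw)
                if read_ifd(data, exif_offset, endian) is None:
                    problems.append((index, exif_offset, "EXIF IFD unreadable"))
        offset, index = next_offset, index + 1
    return problems
```

Calling find_bad_exif_pointers(open(path, "rb").read()) returns a list of (directory index, offset, reason) tuples; an empty list means this sketch found nothing suspicious, which is weaker than saying tiffinfo will accept the file.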
Restricted Application added a subscriber: Poyekhali. May 2 2016, 1:01 PM
Fae awarded a token. Dec 16 2019, 7:13 AM
Restricted Application added a subscriber: Liuxinyu970226. Dec 16 2019, 7:13 AM
Fae added a comment. Edited Dec 16 2019, 7:17 AM

Seven examples of files hit by this tiffinfo parsing failure, none of which display, are available at Uploads by Fæ. These were recently put up for speedy deletion, but this appears to be an old WMF server-side failure that still needs to be fixed, as re-uploading the TIFFs still leads to the upload hanging indefinitely on the WMF side.

If the failure is down to an unexpected format in the NYPL TIFF, a workaround of some type is needed, even if this means manually editing each file before passing it back to the WMF server. The TIFFs display perfectly well at the NYPL source and in client-side editors like GIMP.
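
One possible form of such an edit, sketched here as an assumption rather than a confirmed fix, is to unlink every directory after the first, so that a reduced-resolution copy and any broken metadata attached to it are no longer reachable by a parser. The function name is hypothetical, the approach is untested against the Commons verification step, and it applies to classic TIFF only. The orphaned bytes stay in the file; only the 4-byte "next directory" pointer of the first IFD is set to zero.

```python
import struct


def drop_secondary_ifds(data):
    """Return a copy of a classic TIFF with only its first IFD linked."""
    endian = {b"II": "<", b"MM": ">"}.get(bytes(data[:2]))
    if endian is None or len(data) < 8:
        raise ValueError("not a classic TIFF file")
    magic, first = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42 or first + 2 > len(data):
        raise ValueError("not a classic TIFF file")
    (count,) = struct.unpack_from(endian + "H", data, first)
    # The next-IFD pointer follows the 2-byte count and the 12-byte entries.
    next_field = first + 2 + count * 12
    if next_field + 4 > len(data):
        raise ValueError("first IFD runs past the end of the file")
    patched = bytearray(data)
    struct.pack_into(endian + "I", patched, next_field, 0)
    return bytes(patched)
```

A script would read the original with open(path, "rb"), write the returned bytes to a new file, and check the result locally with tiffinfo before attempting an upload. Keeping the untouched original alongside is advisable, since this changes the file's contents and therefore its hash.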

Attempting to re-upload this TIFF gives the following predictable error:

01487: FAILED: stashfailed: The upload is an exact duplicate of the current version of [[:File:Wâdy Taiyebeh, with the Red Sea in sight, The lower part of the valley is very picturesque. The horizontal strata of the cliffs and their bright coloring make a deep impression on one from (NYPL b10607452-80736).tiff]].