DjVuHandler: getDimensionInfoFromMetaTree: PHP Notice: Undefined index: pages
Open, Needs Triage | Public | PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   PHP Notice: Undefined index: pages
exception.trace
from /srv/mediawiki/php-1.38.0-wmf.7/includes/media/DjVuHandler.php(432)
#0 /srv/mediawiki/php-1.38.0-wmf.7/includes/media/DjVuHandler.php(432): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/mediawiki/php-1.38.0-wmf.7/includes/media/DjVuHandler.php(411): DjVuHandler->getDimensionInfoFromMetaTree(array)
#2 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/objectcache/wancache/WANObjectCache.php(1689): DjVuHandler->{closure}(boolean, integer, array, NULL, array)
#3 /srv/mediawiki/php-1.38.0-wmf.7/includes/libs/objectcache/wancache/WANObjectCache.php(1518): WANObjectCache->fetchOrRegenerate(string, integer, Closure, array, array)
#4 /srv/mediawiki/php-1.38.0-wmf.7/includes/media/DjVuHandler.php(413): WANObjectCache->getWithSetCallback(string, integer, Closure, array)
#5 /srv/mediawiki/php-1.38.0-wmf.7/includes/media/DjVuHandler.php(396): DjVuHandler->getDimensionInfo(LocalFile)
#6 /srv/mediawiki/php-1.38.0-wmf.7/includes/filerepo/file/LocalFile.php(981): DjVuHandler->getPageDimensions(LocalFile, integer)
#7 /srv/mediawiki/php-1.38.0-wmf.7/includes/media/ImageHandler.php(39): LocalFile->getWidth()
#8 /srv/mediawiki/php-1.38.0-wmf.7/includes/filerepo/file/File.php(885): ImageHandler->canRender(LocalFile)
#9 /srv/mediawiki/php-1.38.0-wmf.7/includes/filerepo/file/File.php(1193): File->canRender()
#10 /srv/mediawiki/php-1.38.0-wmf.7/includes/gallery/TraditionalImageGallery.php(129): File->transform(array)
#11 /srv/mediawiki/php-1.38.0-wmf.7/includes/CategoryViewer.php(531): TraditionalImageGallery->toHTML()
#12 /srv/mediawiki/php-1.38.0-wmf.7/includes/CategoryViewer.php(145): CategoryViewer->getImageSection()
#13 /srv/mediawiki/php-1.38.0-wmf.7/includes/page/CategoryPage.php(114): CategoryViewer->getHTML()
#14 /srv/mediawiki/php-1.38.0-wmf.7/includes/page/CategoryPage.php(66): CategoryPage->closeShowCategory()
#15 /srv/mediawiki/php-1.38.0-wmf.7/includes/actions/ViewAction.php(74): CategoryPage->view()
#16 /srv/mediawiki/php-1.38.0-wmf.7/includes/MediaWiki.php(538): ViewAction->show()
#17 /srv/mediawiki/php-1.38.0-wmf.7/includes/MediaWiki.php(320): MediaWiki->performAction(MediaWiki\Extension\CategoryTree\CategoryTreeCategoryPage, Title)
#18 /srv/mediawiki/php-1.38.0-wmf.7/includes/MediaWiki.php(925): MediaWiki->performRequest()
#19 /srv/mediawiki/php-1.38.0-wmf.7/includes/MediaWiki.php(559): MediaWiki->main()
#20 /srv/mediawiki/php-1.38.0-wmf.7/index.php(53): MediaWiki->run()
#21 /srv/mediawiki/php-1.38.0-wmf.7/index.php(46): wfIndexMain()
#22 /srv/mediawiki/w/index.php(3): require(string)
#23 {main}
Impact
Notes
  • Happening at the rate of about 20 per hour
  • Called from at least three different code paths
  • Always errors out in DjVuHandler->getDimensionInfoFromMetaTree

Event Timeline

Possibly triggered by the changes from T275268. Ping @Ladsgroup.

Dollars to dimes it's choking on c:File:Cyclopaedia of English Literature 1844 Volume 1 page 548.djvu, which is in that category and currently shows up with 0x0 dimensions and 0 pages. The djvudump structure for this file is:

djvudump
FORM:DJVU [67784] 
  INFO [10]         DjVu 2374x3642, v25, 400 dpi, gamma=2.2
  CIDa [36] 
  Sjbz [37347]      JB2 bilevel data
  FG44 [7863]       IW4 data #1, 100 slices, v1.2 (color), 198x304
  BG44 [1905]       IW4 data #1, 74 slices, v1.2 (color), 792x1214
  BG44 [2713]       IW4 data #2, 10 slices
  BG44 [348]        IW4 data #3, 4 slices
  BG44 [9825]       IW4 data #4, 9 slices
  TXTz [7656]       Hidden text (text, etc.)

A possible cause is that this is a single-page file (most DjVu files are multi-page documents), so it starts with FORM:DJVU instead of the more common FORM:DJVM. Neither djvutoxml nor djvutxt reveals anything particularly interesting or surprising.

Compare the above structure to a (randomly chosen) bundled DjVu file:

djvudump (bundled)
FORM:DJVM [14990725] 
  DIRM [3435]       Document directory (bundled, 438 files 438 pages)
  FORM:DJVU [73970] {bluefairybook00langiala_0001.djvu} [P1]
    INFO [10]         DjVu 2210x3814, v25, 500 dpi, gamma=2.2
    CIDa [36] 
    Sjbz [547]        JB2 bilevel data
    FG44 [348]        IW4 data #1, 100 slices, v1.2 (color), 185x318
    BG44 [3110]       IW4 data #1, 74 slices, v1.2 (color), 737x1272
    BG44 [8431]       IW4 data #2, 10 slices
    BG44 [21947]      IW4 data #3, 4 slices
    BG44 [39470]      IW4 data #4, 9 slices
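
The FORM:DJVU versus FORM:DJVM distinction shown in the two dumps can also be checked straight from the file header. Below is a minimal sketch, assuming the usual DjVu container layout (the `AT&T` magic, an IFF `FORM` chunk ID, a 4-byte big-endian length, then the form type). The function names are made up for illustration, and the two headers are synthetic byte strings that reuse the lengths printed by djvudump above, not real files.

```python
import struct
from typing import Tuple


def djvu_form_type(header: bytes) -> Tuple[str, int]:
    """Return (form_type, declared_length) from the first 16 bytes of a DjVu file."""
    if len(header) < 16:
        raise ValueError("need at least 16 bytes of header")
    if header[0:4] != b"AT&T" or header[4:8] != b"FORM":
        raise ValueError("not a DjVu (AT&T + FORM) container")
    # The FORM length is stored as a 4-byte big-endian unsigned integer.
    (length,) = struct.unpack(">I", header[8:12])
    return header[12:16].decode("ascii"), length


def is_single_page(header: bytes) -> bool:
    """True for FORM:DJVU (single page), False for other form types such as DJVM."""
    return djvu_form_type(header)[0] == "DJVU"


# Synthetic headers built from the lengths in the two dumps above.
single = b"AT&TFORM" + struct.pack(">I", 67784) + b"DJVU"
bundled = b"AT&TFORM" + struct.pack(">I", 14990725) + b"DJVM"

print(djvu_form_type(single), is_single_page(single))
print(djvu_form_type(bundled), is_single_page(bundled))
```

For a real file, the first 16 bytes could be read with `open(path, "rb").read(16)` and passed to the same function.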

Yup, great catch. It's the one-page part of the metadata that throws it off. The metadata is stored properly:

{"data":{"data":[{"height":3642,"width":2374,"dpi":400,"gamma":2.2}]},"blobs":{"text":"tt:610555279"}}

I'll make a patch
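
To make the failure mode concrete, here is a rough sketch of why a reader that expects a `pages` key trips over the one-page blob. It is not the MediaWiki code: the assumption that the page list lives under `data` -> `pages` is inferred only from the "Undefined index: pages" notice, and `get_pages` plus the "expected" sample are hypothetical.

```python
import json
from typing import Any, Dict, List, Optional


def get_pages(meta: Dict[str, Any]) -> Optional[List[Dict[str, Any]]]:
    """Return the per-page list if present, or None instead of failing on a missing key."""
    tree = meta.get("data")
    if not isinstance(tree, dict):
        return None
    pages = tree.get("pages")  # assumed key name, taken from the PHP notice
    return pages if isinstance(pages, list) else None


# Sample modeled on the one-page blob quoted in the thread.
one_page = json.loads(
    '{"data":{"data":[{"height":3642,"width":2374,"dpi":400,"gamma":2.2}]},'
    '"blobs":{"text":"tt:610555279"}}'
)
# Hypothetical shape with the page list under the key the reader expects.
expected = json.loads(
    '{"data":{"pages":[{"height":3642,"width":2374,"dpi":400,"gamma":2.2}]}}'
)

print(get_pages(one_page))  # no "pages" key, so nothing to iterate over
print(get_pages(expected))
```

A guard like this only hides the notice; the files still need their metadata stored in the expected shape, which is what the patch is for.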

Change 739930 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] media: Store metadata of one-page documents correctly

https://gerrit.wikimedia.org/r/739930

This patch would fix the storage, but we still need to re-run the script for all files that have this issue. We can get the list from the analytics cluster with a query like this:

select img_name from image where img_media_type = 'OFFICE' and img_major_mime = 'image' and img_metadata like '%{"data":{"data":[{"%' limit 5;

The total seems to be only 906 so far. Easily fixable.
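
The `LIKE` pattern in that query is a plain substring match, so the same check can be applied offline to exported rows. A small sketch, assuming rows arrive as `(img_name, img_metadata)` pairs; the row data and function names here are invented for illustration.

```python
from typing import Iterable, List, Tuple

# Same substring the SQL LIKE pattern looks for between its % wildcards.
NEEDLE = '{"data":{"data":[{"'


def looks_broken(img_metadata: str) -> bool:
    """True if the serialized metadata has the nested data/data shape."""
    return NEEDLE in img_metadata


def broken_names(rows: Iterable[Tuple[str, str]]) -> List[str]:
    """Collect the names of files whose metadata matches the broken shape."""
    return [name for name, metadata in rows if looks_broken(metadata)]


# Made-up rows: the first mirrors the one-page blob, the second does not match.
rows = [
    ("One_page_example.djvu", '{"data":{"data":[{"height":3642,"width":2374}]}}'),
    ("Bundled_example.djvu", '{"data":{"pages":[{"height":3814,"width":2210}]}}'),
]

print(broken_names(rows))
```

Writing the resulting names one per line gives a list in the same form as the `broken_imgs` file used later in the thread.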

Change 739930 merged by jenkins-bot:

[mediawiki/core@master] media: Store metadata of one-page documents correctly

https://gerrit.wikimedia.org/r/739930

Change 739838 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@wmf/1.38.0-wmf.9] media: Store metadata of one-page documents correctly

https://gerrit.wikimedia.org/r/739838

Change 739838 merged by jenkins-bot:

[mediawiki/core@wmf/1.38.0-wmf.9] media: Store metadata of one-page documents correctly

https://gerrit.wikimedia.org/r/739838

Mentioned in SAL (#wikimedia-operations) [2021-11-19T04:55:08Z] <ladsgroup@deploy1002> Synchronized php-1.38.0-wmf.9/includes/media/DjVuImage.php: Backport: [[gerrit:739838|media: Store metadata of one-page documents correctly (T296001)]] (duration: 00m 56s)

Running the script on that file fixed the storage, but the old dimensions were still in the cache. I had to flush the cache like this:

ladsgroup@mwmaint1002:~$ mwscript eval.php --wiki=commonswiki
> $cache = MediaWiki\MediaWikiServices::getInstance()->getMainWANObjectCache();

> $key = $cache->makeKey( 'file-djvu', 'dimensions', 'i9owbrcui1yvcf4b7f7kmc6tcopifhi' );

> $cache->delete( $key );

This is gonna be fun

Mentioned in SAL (#wikimedia-operations) [2021-11-23T03:37:57Z] <Amir1> rebuilding metadata of all djvu files outside of commons (T296001)

Mentioned in SAL (#wikimedia-operations) [2021-11-23T03:41:59Z] <Amir1> ladsgroup@mwmaint1002:~$ cat broken_imgs | xargs -I {} mwscript refreshImageMetadata.php --wiki=commonswiki --mediatype=OFFICE --verbose --mime 'image/*' --force --batch-size 1 --sleep 1 --start={} --end={} (T296001)
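
The logged command runs the script once per file by setting `--start` and `--end` to the same name. The sketch below builds the same invocations as argument lists and only prints them (a dry run). The flags are copied from the command above; `refresh_command` and the second file name are hypothetical. Building a list instead of a shell string is one way to avoid quoting trouble with names that contain apostrophes or spaces.

```python
import shlex
from typing import List


def refresh_command(name: str, wiki: str = "commonswiki") -> List[str]:
    """Build one per-file refreshImageMetadata.php invocation (start == end)."""
    return [
        "mwscript", "refreshImageMetadata.php",
        f"--wiki={wiki}",
        "--mediatype=OFFICE",
        "--verbose",
        "--mime", "image/*",
        "--force",
        "--batch-size", "1",
        "--sleep", "1",
        f"--start={name}",
        f"--end={name}",
    ]


names = [
    "Cyclopaedia_of_English_Literature_1844_Volume_1_page_548.djvu",
    "Example's_file.djvu",  # hypothetical name with an apostrophe
]

for name in names:
    # shlex.join quotes each argument so the printed line is safe to paste into a shell.
    print(shlex.join(refresh_command(name)))
```

To actually run a command rather than print it, the list could be passed to `subprocess.run(cmd, check=True)`.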

The scripts have finished now. The only thing left is purging the caches for the broken cases (we can't wait for them to expire, because they never do). I will look into it later.
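
Why the repaired files still show stale dimensions can be seen with a toy model. This is not WANObjectCache, just an in-memory stand-in with a get-or-regenerate method and no expiry; the key string, the `stored` dict and the helper names are all invented for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Dims = Tuple[int, int]


class ToyCache:
    """In-memory stand-in for a cache whose entries never expire."""

    def __init__(self) -> None:
        self._store: Dict[str, Dims] = {}

    def get_with_set_callback(self, key: str, regenerate: Callable[[], Dims]) -> Dims:
        # The callback runs only on a miss; a hit returns whatever was cached earlier.
        if key not in self._store:
            self._store[key] = regenerate()
        return self._store[key]

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


# Hypothetical stored metadata: no usable page list yet (the broken state).
stored: Dict[str, Optional[List[Dict[str, int]]]] = {"pages": None}


def dimensions_from_storage() -> Dims:
    pages = stored["pages"]
    if not pages:
        return (0, 0)
    return (pages[0]["width"], pages[0]["height"])


cache = ToyCache()
key = "file-djvu:dimensions:example"  # made-up key, not a real cache key

before = cache.get_with_set_callback(key, dimensions_from_storage)
stored["pages"] = [{"width": 2374, "height": 3642}]  # metadata repaired
stale = cache.get_with_set_callback(key, dimensions_from_storage)
cache.delete(key)  # explicit purge
fresh = cache.get_with_set_callback(key, dimensions_from_storage)

print(before, stale, fresh)  # (0, 0) (0, 0) (2374, 3642)
```

Until the entry is deleted, the repaired metadata is never read, which matches the 0x0 dimensions seen after the script had already fixed the storage.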

Change 744068 had a related patch set uploaded (by Ladsgroup; author: Amir Sarabadani):

[mediawiki/core@master] media: Invalidate all file-djvu WAN caches

https://gerrit.wikimedia.org/r/744068