Page MenuHomePhabricator

Proofread Page extension on Wikisource is displaying wrong pages; purge on Commons file fails
Open, Needs TriagePublic

Description

There was a missing page in this DJVU file, which I was in the process of transcribing on English Wikisource. So I repaired the file (using an image from an alternate scan of the same book) and uploaded the new version at Commons. Note this is a rather large file, 978 pages and 32 MB, which may be related to the problem.
https://commons.wikimedia.org/wiki/File:Portland,_Oregon,_its_History_and_Builders_volume_1.djvu
https://commons.wikimedia.org/w/index.php?title=File:Portland,_Oregon,_its_History_and_Builders_volume_1.djvu&page=618
https://commons.wikimedia.org/w/index.php?title=File:Portland,_Oregon,_its_History_and_Builders_volume_1.djvu&page=619

These are the two pages that are displaying incorrectly. The images that should show up should read "449" and "450" in the header of each page, respectively. But as of this writing (weeks after upload), they reflect "450" and <blank>, which are the previews from the earlier version of the Commons file.
https://en.wikisource.org/wiki/Page:Portland,_Oregon,_its_History_and_Builders_volume_1.djvu/618
https://en.wikisource.org/wiki/Page:Portland,_Oregon,_its_History_and_Builders_volume_1.djvu/619

I have purged all relevant pages at Wikisource, and tried to purge the main file at Commons, but after a minute or two I got an error "purge failed" (see attachment).

I believe the file on Commons needs to be purged in order to fix the problem with the preview in the Proofread Page view.

See relevant discussion here: https://en.wikisource.org/wiki/Wikisource:Scriptorium/Help/Archives/2019#How_to_update_the_display_of_a_DJVU_file?

{F28150036}

Event Timeline

This looks to me like MediaWiki-DjVu is choking on the new file and thus never regenerating the thumbnails / page images. Not obvious that ProofreadPage is involved in this at all (except that this is where it becomes most visible). I downloaded the actual .DjVu from Commons and confirmed that the file indeed contains the fixed pages.

What @Xover said makes sense -- it's not a problem with ProofreadPage, but a problem made *visible* by ProofreadPage. In that case, my initial title is incorrect, and may be preventing the right people from seeing this. Does anybody know what part of the software is supposed to generate thumbnails? Can somebody retitle this or recategorize it in a way that makes more sense? @Aklapper ?

Aklapper added a project: Thumbor.

I'm not sure if this is also handled by Thumbor - someone please correct me if I'm wrong.

For the records, I cannot access https://phabricator.wikimedia.org/F28150036 : "You do not have permission to view this object. @Peteforsyth can take this action."

Thank you @Aklapper ! I did not realize it was possible to create private image files here. I believe I have now updated the image to allow all to view. Please let me know if you still have problems (and I will watch out for this in the future).

@Peteforsyth: Thanks, I can see it now :)
And yes, image file permissions are a bit complicated, see https://www.mediawiki.org/wiki/Phabricator/Help#Uploading_file_attachments

@Peteforsyth, I quite often encounter this problem while uploading a new version of a multi-page file with thousands of thumbnails (generally files with >500 pages). See eg T206190 or T214759 .
I know 3 workarounds to deal with this problem in Wikisource (when this already happen):

  1. change scan with in index after upload a new version (if they are 1px wider/narrower new thumbnails must be generated
  2. use js to replace thumbnails for specific pages with thumbnails with other names; see example in pl.ws MadiaWiki:Commons.js :

https://pl.wikisource.org/w/index.php?title=MediaWiki:Common.js&diff=prev&oldid=2188790
(useful for few already existing pages)

  1. upload the file locally

From my experience, the outdated thumbnails seem to disapear after few weeks to half a year.

To avoid this problem, I often try to purge the file (few times if it fails) prior to upload a new version. If the purge fails you may be almost 100% sure to encounter the problem.

Gilles subscribed.

This seems to be an issue with the thumbnail purge. Possibly due to the large amount of thumbnails associated with a file that has hundreds of pages. Thumbor, when requested with new sizes (i.e. when the old cached versions aren't served), renders the expected pages.

I tried purging the file right now with action=purge and get a "maintenance" error, further suggesting a purge problem.

On frwikisource, we repaired dozens of files with missing pages... no purge of pages, books or files on commons works...

and the problem is only apparent for users who opened the pages before the repairing... because it is not caused by page or file cache, but by the browser cache (Firefox, IE, Chrome, Safari all give that symptom)

in each case, it was easily solved by purging the cache of the web browser (i.e. deleting the stored images of the pages).

On frwikisource, we repaired dozens of files with missing pages... no purge of pages, books or files on commons works...

and the problem is only apparent for users who opened the pages before the repairing... because it is not caused by page or file cache, but by the browser cache (Firefox, IE, Chrome, Safari all give that symptom)

in each case, it was easily solved by purging the cache of the web browser (i.e. deleting the stored images of the pages).

If we get HTTP error while attempting to purge, then it is definitely not a browser cache problem. Sometimes the problem disappears automatically after few months without any action (like in this case), and it appears only on files that already have thousands of thumbnails generated.
I never observed this problem if the new version was uploaded soon after the previous one (so not many thumbnaild appeared already) nor on files with small number os pages (below 500).
But necessity to wait few months to have a file cache fixed is definitely very annoying.

Hmm. Something is wonky here…

Rechecking now using the purgetab.js Gadget on Commons (which is just the Purge action snippet), I get a mw.notify( 'Purge failed', {type: 'error'}); which means the thenable returned from mw.api failed. But trying to reproduce in the API Sandbox I get:

{
    "batchcomplete": true,
    "purge": [
        {
            "ns": 6,
            "title": "File:Portland, Oregon, its History and Builders volume 1.djvu",
            "purged": true
        }
    ]
}

Which is a normal (successful) return.

The underlying problem is probably down in the file handling stack somewhere, but it looks like the API is maybe not handling whatever failure it's getting particularly well. And without surfacing the error from further down the stack it's impossible to pinpoint what component is actually choking here, much less why. @Legoktm Does that assessment look right? Is it worth filing this as a separate task against MediaWiki-Action-API for this?

The underlying problem is probably down in the file handling stack somewhere, but it looks like the API is maybe not handling whatever failure it's getting particularly well. And without surfacing the error from further down the stack it's impossible to pinpoint what component is actually choking here, much less why. @Legoktm Does that assessment look right? Is it worth filing this as a separate task against MediaWiki-Action-API for this?

Yes, though the problem is even deeper than the API. The API just calls WikiPage::doPurge() or WikiFilePage::doPurge(), which is documented as returning true or false. However the API doesn't check for a false return, which is a bug.

But if you look at the implementation of WikiFilePage::doPurge() (https://gerrit.wikimedia.org/g/mediawiki/core/+/df2d6fb857498e0f8e9dbb913bb9804d51a1d6bf/includes/page/WikiFilePage.php#180), it doesn't really check if any of those operations fail either. And that problem is pervasive up the tree, e.g. LocalFile::purgeThumbList() doesn't return anything despite calling FileRepo::quickPurgeBatch() which does return a Status.

So if someone wanted, fixing up the stack to get everything to return a Status, and then adding a doPurgeStatus which returns a Status, and deprecating doPurge, so the API can get a real error message. I'd be happy to review a series of changes that implemented this.

That said...the real problem is clearly that MediaWiki or Thumbor or Varnish is not able to purge all the thumbnails. If someone has an example from the past 7 days it might be possible to look through the logs to get an idea of where the failure is happening (or at least rule out some stuff).

That said...the real problem is clearly that MediaWiki or Thumbor or Varnish is not able to purge all the thumbnails. If someone has an example from the past 7 days it might be possible to look through the logs to get an idea of where the failure is happening (or at least rule out some stuff).

I don't have any files where the thumbnails are currently wrong (maybe someone else subscribed here has?), but the file mentioned in the description returned a failure when purging through the API earlier today (it succeeds now); and I just tested with the file from T206190 and got a failure. Presuming we don't have stale noise hanging around from older operations tripping up the API, whatever fails should have generated a log message somewhere for these.

That said...the real problem is clearly that MediaWiki or Thumbor or Varnish is not able to purge all the thumbnails. If someone has an example from the past 7 days it might be possible to look through the logs to get an idea of where the failure is happening (or at least rule out some stuff).

I don't have any files where the thumbnails are currently wrong (maybe someone else subscribed here has?), but the file mentioned in the description returned a failure when purging through the API earlier today (it succeeds now); and I just tested with the file from T206190 and got a failure. Presuming we don't have stale noise hanging around from older operations tripping up the API, whatever fails should have generated a log message somewhere for these.

This thumbnail is outdated:
https://upload.wikimedia.org/wikipedia/commons/thumb/a/a7/Karol_May_-_Old_Surehand_06.djvu/page94-878px-Karol_May_-_Old_Surehand_06.djvu.jpg
And purge of this file:
https://commons.wikimedia.org/wiki/File:Karol_May_-_Old_Surehand_06.djvu
does not refresh it.

Most of the logs for this file seem to be associated with request ID f093c198-321b-4a21-a88d-b2ee52512f7a.

In the FileOperation log, https://people.wikimedia.org/~legoktm/builders_volume_1_again.txt is all the entries matching that reqID. There are like 5000 log lines, so I haven't looked through it closely yet but this stuck out at me:

legoktm@mwlog1002:~$ grep -c "HTTP start: DELETE" builders_volume_1_again.txt 
2534
legoktm@mwlog1002:~$ grep -c "HTTP complete: DELETE" builders_volume_1_again.txt 
2383

first log entry is at 06:33:39, last is at 06:34:20, or ~40 seconds, so unlikely to be a timeout (in this part at least).

Then grep "HTTP complete: DELETE" builders_volume_1_again.txt | grep -v code=204 finds nothing, so none of the requests to swift obviously failed.

Hmm. No smoking gun there that I can find.

I hacked up a quick script to parse through it looking for oddities, and the only thing I see so far is that there are a significant number of thumbs for which there are more delete requests than there are responses (the 151 delta from the grep above).

output
Total number of pages: 847
Total number of thumbs: 1263
Total number of ops: 4917
Total balance: 151

   1:  110 |    2 | 53,80,85,93,103,107,120,133,160,180,200,267,320,400,512,600,640,682,683,800,1024,1200,1280,1747
   2:   19 |    1 | 180,270,360,378,1024
   3:   11 |    1 | 180,378,1024
   4:   31 |    1 | 151,180,302,378,484,645,1024,2307
   7:   10 |    2 | 180,378,1024
   9:   23 |    1 | 220,330,400,440,756,1024
  10:   15 |    1 | 180,270,360,1024
  14:    7 |    1 | 1024,1747
  17:    7 |    1 | 1024,1747
  25:    7 |    1 | 1024,1747
  30:    7 |    1 | 1024,1747
  37:    7 |    1 | 1024,1747
  50:    7 |    1 | 1024,1747
  55:    7 |    1 | 1024,1747
  58:    7 |    1 | 1024,1747
  76:    3 |    1 | 1024
  85:   19 |    1 | 75,113,151,1024,1747
  93:   11 |    1 | 360,1024,1747
 109:    7 |    1 | 1024,1747
 112:    3 |    1 | 1024
 120:    7 |    1 | 1024,1747
 129:    3 |    1 | 1024
 144:    7 |    1 | 1024,1747
 152:    3 |    1 | 1024
 153:    6 |    2 | 1024,1747
 154:    7 |    1 | 1024,1747
 163:    7 |    1 | 1024,1747
 164:    7 |    1 | 1024,1747
 165:    7 |    1 | 1024,1747
 168:    7 |    1 | 1024,1747
 174:    7 |    1 | 1024,1747
 176:    7 |    1 | 1024,1747
 183:    7 |    1 | 1024,1747
 187:    3 |    1 | 1024
 194:    7 |    1 | 1024,1747
 203:    7 |    1 | 1024,1747
 206:    3 |    1 | 1024
 222:    7 |    1 | 1024,1747
 232:    7 |    1 | 1024,1747
 234:    7 |    1 | 1024,1747
 254:    7 |    1 | 1024,1747
 255:    7 |    1 | 1024,1747
 263:    3 |    1 | 1024
 271:    7 |    1 | 1024,1747
 277:    3 |    1 | 1024
 281:    7 |    1 | 1024,1747
 288:    7 |    1 | 1024,1747
 294:   11 |    1 | 180,1024,1747
 295:    6 |    2 | 1024,1747
 306:    3 |    1 | 1024
 319:    3 |    1 | 1024
 323:    3 |    1 | 1024
 326:    3 |    1 | 1024
 330:    3 |    1 | 1024
 333:    7 |    1 | 1024,1747
 345:    2 |    2 | 1024
 346:    6 |    2 | 1024,1747
 350:    3 |    1 | 1024
 354:    7 |    1 | 1024,1747
 365:    6 |    2 | 1024,1747
 373:    7 |    1 | 1024,1747
 375:    6 |    2 | 1024,1747
 393:    2 |    2 | 1024
 395:    3 |    1 | 1024
 397:    7 |    1 | 1024,1747
 400:    3 |    1 | 1024
 411:    3 |    1 | 1024
 423:    7 |    1 | 40,1024
 425:    3 |    1 | 1024
 429:    7 |    1 | 1024,1747
 436:    7 |    1 | 1024,1747
 439:    3 |    1 | 1024
 460:    7 |    1 | 1024,1747
 466:    3 |    1 | 1024
 467:    3 |    1 | 1024
 489:    3 |    1 | 1024
 496:    7 |    1 | 1024,1747
 500:    3 |    1 | 1024
 511:    7 |    1 | 1024,1747
 516:    7 |    1 | 1024,1747
 523:    3 |    1 | 1024
 539:    3 |    1 | 1747
 554:    7 |    1 | 1024,1747
 558:    3 |    1 | 1024
 559:    3 |    1 | 1024
 562:    2 |    2 | 1024
 576:    7 |    1 | 1024,1747
 585:    4 |    4 | 1024,1747
 587:    7 |    1 | 1024,1747
 588:    7 |    1 | 1024,1747
 594:    7 |    1 | 1024,1747
 597:    3 |    1 | 1747
 612:    7 |    1 | 1024,1747
 617:   15 |    1 | 270,360,567,1024
 618:   27 |    1 | 180,270,360,695,926,1024,1747
 619:   23 |    1 | 270,360,378,567,756,1024
 628:    7 |    1 | 1024,1747
 632:   19 |    1 | 40,60,80,1024,1747
 671:    3 |    1 | 1024
 684:    7 |    1 | 1024,1747
 689:    3 |    1 | 1024
 692:    7 |    1 | 1024,1747
 705:    3 |    1 | 1024
 706:    7 |    1 | 1024,1747
 720:    6 |    2 | 1024,1747
 721:    6 |    2 | 1024,1747
 723:    7 |    1 | 1024,1747
 732:    7 |    1 | 1024,1747
 744:    3 |    1 | 1024
 777:    3 |    1 | 1024
 779:    7 |    1 | 1024,1747
 783:    7 |    1 | 1024,1747
 784:    7 |    1 | 1024,1747
 786:    7 |    1 | 1024,1747
 804:    9 |    3 | 40,60,1024
 806:    7 |    1 | 1024,1747
 819:   11 |    1 | 378,1024,1747
 834:    7 |    1 | 1024,1747
 836:    7 |    1 | 1024,1747
 860:    2 |    2 | 1024
 867:    7 |    1 | 1024,1747
 868:    3 |    1 | 1024
 881:    3 |    1 | 1024
 882:    7 |    1 | 1024,1747
 892:    7 |    1 | 1024,1747
 902:    7 |    1 | 1024,1747
 909:    7 |    1 | 1024,1747
 910:    6 |    2 | 1024,1747
 914:    7 |    1 | 1024,2134
 945:    3 |    1 | 1024
 956:    3 |    1 | 1024
 958:    3 |    1 | 1024

The logs related to one thumb (1024px for p. 958) look like this:

logs for p. 958
FileOperation.log-20211118.gz:2021-11-17 06:33:51 [f093c198-321b-4a21-a88d-b2ee52512f7a] mw1382 commonswiki 1.38.0-wmf.7 FileOperation DEBUG: HTTP start: DELETE https://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-thumb.31/3/31/Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu/page958-1024px-Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu.jpg {"method":"DELETE","url":"https://ms-fe.svc.eqiad.wmnet/v1/AUTH_mw/wikipedia-commons-local-thumb.31/3/31/Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu/page958-1024px-Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu.jpg"} 
FileOperation.log-20211118.gz:2021-11-17 06:34:17 [f093c198-321b-4a21-a88d-b2ee52512f7a] mw1382 commonswiki 1.38.0-wmf.7 FileOperation DEBUG: HTTP start: DELETE https://ms-fe.svc.codfw.wmnet/v1/AUTH_mw/wikipedia-commons-local-thumb.31/3/31/Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu/page958-1024px-Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu.jpg {"method":"DELETE","url":"https://ms-fe.svc.codfw.wmnet/v1/AUTH_mw/wikipedia-commons-local-thumb.31/3/31/Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu/page958-1024px-Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu.jpg"} 
FileOperation.log-20211118.gz:2021-11-17 06:34:18 [f093c198-321b-4a21-a88d-b2ee52512f7a] mw1382 commonswiki 1.38.0-wmf.7 FileOperation DEBUG: HTTP complete: DELETE https://ms-fe.svc.codfw.wmnet/v1/AUTH_mw/wikipedia-commons-local-thumb.31/3/31/Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu/page958-1024px-Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu.jpg code=204 size=0 total=0.126233 connect=1.8E-5 {"method":"DELETE","url":"https://ms-fe.svc.codfw.wmnet/v1/AUTH_mw/wikipedia-commons-local-thumb.31/3/31/Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu/page958-1024px-Portland%2C_Oregon%2C_its_History_and_Builders_volume_1.djvu.jpg","response_code":204,"size":0.0,"total_time":"0.126233","connect_time":"1.8E-5"}

So there's a DELETE request at 06:33:51 for which there is no response in evidence. Then there's a new request at 06:34:17 (~25s later) that is followed by a 204 response at 06:34:18 (~1s later). It is possible one of the requests failed but either there was no logging for the failure or the log was not tagged with the request ID. It is also conceivable that the first request succeeded, but was too slow, so a second request was sent and was coalesced with the first. But then there's this sequence for p. 839 (trimmed for clarity):

p. 839
06:33:49 start: DELETE page839-1024px
06:33:49 start: DELETE page839-1747px
06:33:49 start: DELETE page839-360px
06:33:49 start: DELETE page839-540px
06:33:49 start: DELETE page839-720px 
06:33:49 complete: DELETE page839-1024px
06:33:49 complete: DELETE page839-1747px
06:33:49 complete: DELETE page839-360px
06:33:49 complete: DELETE page839-540px
06:33:49 complete: DELETE page839-720px
06:34:13 start: DELETE page839-1024px
06:34:13 start: DELETE page839-1747px
06:34:13 start: DELETE page839-360px
06:34:13 start: DELETE page839-540px
06:34:13 start: DELETE page839-720px
06:34:15 complete: DELETE page839-1024px
06:34:15 complete: DELETE page839-1747px
06:34:15 complete: DELETE page839-360px
06:34:15 complete: DELETE page839-540px
06:34:15 complete: DELETE page839-720px

Why there are two requests for each thumbnail when the first set apparently gets successful responses is not obvious.

I also note that there is a shortfall in the total number of pages shown in the logs to be touched: the actual file has 978 pages, but the logs only show 847 being processed (a delta of 131). The pages not appearing in the logs are:

Unprocessed pages
42, 43, 49, 62, 63, 65, 74, 77, 87, 88, 89, 91, 94, 95, 113, 372, 374, 376, 377, 379, 380, 381, 382, 383, 384, 385, 386, 399, 404, 405, 406, 410, 437, 438, 445, 447, 448, 456, 468, 469, 481, 482, 483, 486, 488, 491, 495, 501, 505, 530, 540, 542, 547, 549, 553, 567, 589, 598, 625, 627, 629, 645, 647, 649, 650, 651, 660, 661, 663, 665, 666, 675, 676, 679, 683, 685, 687, 698, 699, 712, 713, 714, 715, 716, 717, 718, 719, 739, 740, 743, 745, 749, 750, 751, 752, 753, 761, 762, 764, 765, 780, 781, 782, 787, 793, 794, 811, 815, 816, 823, 824, 829, 837, 840, 841, 859, 861, 862, 864, 865, 866, 873, 880, 912, 913, 922, 927, 974, 976, 977, 978

I can't spot any obvious pattern to the missing pages. I also can't think of any obvious reasons why those pages would not have thumbnails at all.