
Load issue with pdfs
Closed, Resolved · Public · 5 Estimated Story Points · Bug Report

Description

Follow-up from T222932 ("Make error messages resulting from failed requests state clearly that it's a MediaWiki's fault not wsexport's"):

For regression purposes I retested some PDFs which have caused us problems in the past. Either they generated fine or they had the same load issue as https://phabricator.wikimedia.org/P8588 (e.g. this file). It might be worth investigating the latter issue as I don't believe I have seen it before.

Event Timeline

Niharika triaged this task as Medium priority. (Jun 12 2019, 11:37 PM)
Niharika created this task.
Restricted Application changed the subtype of this task from "Task" to "Bug Report". (Jun 12 2019, 11:37 PM)
Restricted Application added a subscriber: Aklapper.
Niharika renamed this task from "[BUG] Load issue with pdfs" to "Load issue with pdfs". (Jun 20 2019, 5:54 PM)
Niharika changed the point value for this task from 0 to 5.

If there are errors about temporary files, T225966 may be related.

It looks like it's this (unresolved) Guzzle bug, but we can maybe work around it by using a request Pool (which by default is limited to 25 concurrent requests). I've made an example patch: https://github.com/wsexport/tool/pull/186
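To illustrate the idea behind the workaround (capping how many requests are in flight at once), here is a minimal sketch in Python. It is not the code from the linked pull request and does not use Guzzle's API; `fetch_all`, `FakeFetcher`, and the example.org URLs are hypothetical names made up for this illustration, and the fake fetcher stands in for a real HTTP call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 25  # same figure as the Pool default mentioned above


def fetch_all(urls, fetch, max_concurrency=MAX_CONCURRENCY):
    """Call fetch(url) for every URL with at most max_concurrency in flight.

    Results are returned in the same order as the input URLs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
        return list(executor.map(fetch, urls))


class FakeFetcher:
    """Hypothetical stand-in for an HTTP request; records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, url):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.001)  # simulate a little network latency
        with self._lock:
            self._active -= 1
        return f"fetched {url}"


fetcher = FakeFetcher()
# Made-up URLs; 2,700 is only chosen to resemble a large, image-heavy book.
urls = [f"https://example.org/image/{i}" for i in range(2700)]
results = fetch_all(urls, fetcher)
print(len(results), "requests completed; peak concurrency:", fetcher.peak)
```

However many URLs are queued, the worker pool never runs more than `max_concurrency` fetches at the same time, which is the property the workaround relies on.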

The example work above now compiles to epub correctly (it's over 12,000 pages, with 2,700 images!). I suspect we'll run into other resource limits when trying to convert it to PDF or other formats.

This has been merged, and is live on the staging site ready for QA. The book Les_Merveilles_de_la_science still doesn't work though, because it times out; that's better than failing I guess.

...The book Les_Merveilles_de_la_science still doesn't work though, because it times out; that's better than failing I guess.

Same thing happens with me. I can generate the epub version of Les_Merveilles_de_la_science via the command line though.

Otherwise, retesting other ebooks which have caused problems in the past hasn't shown any regressions (though it hasn't shown any improvements either).

As this change appears to affect images, I used epub validators (epubcheck and flightcrew) to check that none had missing images.
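For anyone who wants a quick scripted spot check alongside the validators, the sketch below lists images that an EPUB's manifest declares but that are absent from the archive. It is a narrow illustration only, not how epubcheck or flightcrew work, and it covers far less than they do. The function name `missing_images` and the in-memory demo book are assumptions made up for this example.

```python
import io
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def missing_images(epub_file):
    """Return manifest image paths that are not present in the EPUB archive."""
    with zipfile.ZipFile(epub_file) as archive:
        names = set(archive.namelist())
        # container.xml points at the package (OPF) document.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", CONTAINER_NS).get("full-path")
        package = ET.fromstring(archive.read(opf_path))
        base = posixpath.dirname(opf_path)
        missing = []
        for item in package.findall(".//opf:manifest/opf:item", OPF_NS):
            if item.get("media-type", "").startswith("image/"):
                # Manifest hrefs are relative to the OPF file's directory.
                href = unquote(item.get("href"))
                path = posixpath.normpath(posixpath.join(base, href))
                if path not in names:
                    missing.append(path)
        return missing


# Hypothetical demo book: one image present, one declared but absent.
CONTAINER_XML = """<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

OPF_XML = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <manifest>
    <item id="a" href="images/present.png" media-type="image/png"/>
    <item id="b" href="images/absent.png" media-type="image/png"/>
    <item id="c" href="chapter.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
</package>"""

demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("mimetype", "application/epub+zip")
    archive.writestr("META-INF/container.xml", CONTAINER_XML)
    archive.writestr("OEBPS/content.opf", OPF_XML)
    archive.writestr("OEBPS/images/present.png", b"")
demo.seek(0)

demo_missing = missing_images(demo)
print(demo_missing)
```

To check a real file, pass its path instead of the in-memory buffer, e.g. `missing_images("book.epub")`; an empty list means every image listed in the manifest is in the archive.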

I also looked at a few in epub readers, to check if there were any obvious problems with the generated files.