
How long are rendered PDFs cached?
Closed, Resolved · Public

Description

Currently, when using the ElectronPdf-Service for rendering the Berlin article from the German Wikipedia I'm getting an old version of the PDF, without e.g. the style fixes for not expanding URLs applied (see the expanded GeoHack URL in the example PDF).
What does the caching strategy for the service look like, e.g. how long are rendered PDFs stored, and when and how do they get purged or invalidated?

Event Timeline

Restricted Application added a subscriber: Aklapper. · Feb 8 2017, 10:28 AM
Tobi_WMDE_SW moved this task from Incoming to Tables in pdfs on the TCB-Team board.

As a guess these pdfs are purged when a new page revision is created.
Since we changed the CSS that is served to electron, new PDFs have the updated CSS but the old cached ones don't.
An edit to the page would probably purge the PDF.

Doing an action=purge on the page doesn't make its way to restbase.
Should it in this case?
Is there another way we can purge all pdfs held by the service?

The PDF files are not stored in RESTBase and only cached in Varnish for 1 minute ("cache-control": "s-maxage=600, max-age=600") and never purged.

Not sure if the electron service does any internal caching.

In that case it sounds like there must be caching within the electron service itself, as the old pdfs are lasting much longer than 1 min!

Tobi_WMDE_SW added a comment. · Edited · Feb 8 2017, 5:24 PM

Hm.. after looking at this again, I think it's not an issue with caching.
At least as far as I can tell, the PDF created for https://de.wikipedia.org/wiki/Berlin is a pretty recent version. So I was most probably wrong to blame caching for the expanded Geohack URLs.
Checking this again reveals that suppressing the expansion is not working for Berlin, and also not for other pages such as Hamburg. Strangely, everything is fine for Leipzig.
Something is wrong but it is most probably not caching, so this task might be invalid.

@Addshore, are the style fixes delivered as part of the normal printable view? If they are only delivered with a specific parameter, then we'll need to change the URL passed to the electron render service to pull them in.

The PDF files are not stored in RESTBase and only cached in Varnish for 1 minute ("cache-control": "s-maxage=600, max-age=600") and never purged.

Ouch, not 1 minute, 10 minutes. 600 seconds is 10 minutes :) Just tested and everything works fine.
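
For anyone who wants to double-check the arithmetic, here is a minimal sketch that converts the numeric directives of a Cache-Control value into minutes. The function name is made up for illustration; only the header value comes from this thread.

```python
def cache_lifetimes_in_minutes(header):
    """Return {directive: minutes} for every numeric Cache-Control directive."""
    lifetimes = {}
    for directive in header.split(","):
        # Split "s-maxage=600" into name and value; flags like "public" have no "=".
        name, sep, value = directive.strip().partition("=")
        if sep and value.strip().isdigit():
            lifetimes[name.strip().lower()] = int(value) / 60
    return lifetimes


# Header value quoted in the comment above.
print(cache_lifetimes_in_minutes("s-maxage=600, max-age=600"))
# prints {'s-maxage': 10.0, 'max-age': 10.0}
```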

@Addshore, are the style fixes delivered as part of the normal printable view? If they are only delivered with a specific parameter, then we'll need to change the URL passed to the electron render service to pull them in.

They are delivered via the normal printable view based on the useragent.
https://phabricator.wikimedia.org/diffusion/EEPS/browse/master/src/ElectronPdfServiceHooks.php;941096bedfd23ff3c10e5ccb98490e905d474cd9$69

Hm.. after looking at this again, I think it's not an issue with caching.

We currently have these 2 examples, one that appears to have a cached copy of the Berlin PDF:
https://de.wikipedia.org/w/index.php?title=Spezial:ElectronPdf&page=Berlin&action=show-selection-screen&coll-download-url=%2Fw%2Findex.php%3Ftitle%3DSpezial%3ABuch%26bookcmd%3Drender_article%26arttitle%3DBerlin%26returnto%3DBerlin%26oldid%3D162285623%26writer%3Drdf2latex
And the "United Kingdom" page, which has the same geohack element but was cached after the style change and thus does not contain the long link:
https://de.wikipedia.org/w/index.php?title=Spezial:ElectronPdf&page=Vereinigtes+K%C3%B6nigreich&action=show-selection-screen&coll-download-url=%2Fw%2Findex.php%3Ftitle%3DSpezial%3ABuch%26bookcmd%3Drender_article%26arttitle%3DVereinigtes%2BK%25C3%25B6nigreich%26returnto%3DVereinigtes%2BK%25C3%25B6nigreich%26oldid%3D162221420%26writer%3Drdf2latex

I can also confirm when I set my user agent to the electron agent the geohack URL does not print for me for the Berlin article.
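
A rough sketch of how that check could be reproduced from a script, using only the Python standard library. The user agent string below is a placeholder, not the real electron agent; the actual value would have to be taken from the extension code linked above.

```python
from urllib.request import Request

# Placeholder only: the real electron user agent string is not given in this thread.
HYPOTHETICAL_ELECTRON_UA = "example-electron-render-agent"


def build_request(url, user_agent=HYPOTHETICAL_ELECTRON_UA):
    """Build (but do not send) a request that identifies itself with the given user agent."""
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("https://de.wikipedia.org/wiki/Berlin")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen would fetch the page as that agent, so the returned HTML can be compared with what a normal browser gets.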

We currently use the regular page URL (without ?printable=true), which typically returns a cached response from Varnish: https://github.com/wikimedia/restbase/blob/master/v1/pdf.yaml#L63

We can change this, but it will slow things down slightly.
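
To illustrate the difference between the two URLs, here is a small sketch that appends printable=true to a page URL. This is not how RESTBase builds the URL (that lives in the pdf.yaml linked above); the helper name is made up and the code only shows the shape of the change.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_printable(url):
    """Return the URL with printable=true set, replacing any existing printable parameter."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "printable"]
    query.append(("printable", "true"))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_printable("https://de.wikipedia.org/wiki/Berlin"))
# prints https://de.wikipedia.org/wiki/Berlin?printable=true
```

Because the query string changes, such a request is less likely to be answered from the Varnish cache entry for the plain page URL, which is consistent with the expected slowdown mentioned above.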

PR merged. To be deployed early next week.

mobrovac edited projects, added Services (done); removed Services. · Feb 9 2017, 8:53 PM

@Tobi_WMDE_SW we can probably close this one now?

Tobi_WMDE_SW closed this task as Resolved. · Feb 10 2017, 11:01 AM
Tobi_WMDE_SW claimed this task.
Tobi_WMDE_SW moved this task from Watching/Blocked/External to Demoed on the WMDE-QWERTY-Team board.
Tobi_WMDE_SW moved this task from Demoed to Done on the WMDE-QWERTY-Team board.
Tobi_WMDE_SW moved this task from Done to Demoed on the WMDE-QWERTY-Team board. · Feb 14 2017, 3:36 PM