
Cache not deleted on Special:DownloadAsPdf (current version of page not generated after generating old version of page)
Closed, Declined | Public | BUG REPORT

Description

Alternate long title: Cannot generate the current version of a page as PDF after generating an older version of the page using Download As PDF.

Steps to replicate the issue (include links if applicable):

image.png (154×620 px, 50 KB)

What happens?:

The old version of the PDF is cached at https://commons.wikimedia.org/api/rest_v1/page/pdf/Commons%3ASimple_media_reuse_guide%2Fid, and I don't know how long the cached copy lasts. It could be hours; it could be years. There is no indication or acknowledgement that the PDF may come from an old revision, so users could be downloading a version that differs from what they saw on the page.

In the worst case, a vandal could insert vandalism and then generate the PDF (as the first person ever to generate a PDF for that article). That PDF would contain the vandalism and could stay in PDF form forever.

Users have no control over how long the old version is cached, and cannot purge the cached PDF to generate the newest version.
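One way to estimate how long a cached copy might live is to look at the `Cache-Control` header of the response. The following is only a minimal sketch: it assumes the endpoint returns a standard `Cache-Control` header, which this report does not confirm, and `parse_max_age` is a hypothetical helper name.

```python
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the cache lifetime in seconds, or None if none is given.

    s-maxage (which applies to shared caches such as a CDN) takes
    precedence over max-age when both are present.
    """
    directives = {}
    for part in cache_control.split(","):
        # Split "name=value"; directives without a value get an empty string.
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    for key in ("s-maxage", "max-age"):
        value = directives.get(key, "")
        if value.isdigit():
            return int(value)
    return None
```

The header value itself could be fetched with a HEAD request, for example via `urllib.request.Request(url, method="HEAD")`, and then passed to this function.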

What should have happened instead?:

There should be a button to clear the cached PDF.

Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Probably related to T46186, a ten-year-old bug.

Event Timeline

Bennylin updated the task description.
Aklapper renamed this task from Cannot delete cache on Special:DownloadAsPdf to Cache not deleted on Special:DownloadAsPdf (current version of page not generated after generating old version of page). Feb 28 2023, 8:35 PM
Aklapper added a project: Proton.

I tried downloading again today and was able to download the latest version. So I believe the cache lasts somewhere between a couple of hours and one or two days.

I have no problem with this ticket being closed. But I think there's still a case for adding a "delete/purge cache" button to the special page, just like how "&action=purge" forces a page to be re-cached.

The output of the PDF renderer is cached only for 10 minutes to prevent mass rendering of the same article/regenerating the same content. PDFs are generated by a standalone service called Proton, and the implementation from the beginning didn't listen to purge events.
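To illustrate the behaviour described above, here is a minimal sketch of a time-to-live cache. It is not Proton's actual code: the `TTLCache` class, its `purge` method, and the injected clock are all hypothetical, and only the 10-minute TTL comes from the comment above.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache rendered output per key; re-render once an entry reaches ttl."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated
        self._entries: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str, render: Callable[[], bytes]) -> bytes:
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: serve the cached copy
        value = render()  # missing or expired: render again
        self._entries[key] = (now, value)
        return value

    def purge(self, key: str) -> None:
        """Hypothetical explicit purge, the feature requested in this task."""
        self._entries.pop(key, None)


# Simulated clock: a stale PDF is replaced once 600 seconds have passed.
now = [0.0]
cache = TTLCache(ttl_seconds=600, clock=lambda: now[0])
first = cache.get("Some_page", lambda: b"old revision")
now[0] = 599.0
still_cached = cache.get("Some_page", lambda: b"new revision")
now[0] = 600.0
refreshed = cache.get("Some_page", lambda: b"new revision")
```

With a TTL this short, a stale entry expires on its own within ten minutes, which is why an explicit purge adds little beyond the extra infrastructure complexity.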

@Bennylin do you think we still need a way to purge the cache, even though its TTL is so low? This is doable, but I'm not sure it is worth the extra complexity in the infrastructure layer.

After what you've explained, I don't think it is needed. Thanks!

Thank you for your cooperation in this matter. Ticket closed.