Write maintenance script to purge specific files and their old versions
Closed, Resolved (Public)

Description

purgeList.php looks very outdated and purgeChangedFiles only picks up lists from specific tables.

Event Timeline

Gilles created this task. Jun 29 2017, 10:31 AM
TheDJ added a subscriber: TheDJ. Jun 29 2017, 11:48 AM

Be careful to exempt audio/video files etc. by default. Purging them can have serious performance effects, and they are generally treated a bit more carefully.

Change 362941 had a related patch set uploaded (by Gilles; owner: Gilles):
[mediawiki/core@master] Maintenance script to purge specific page

https://gerrit.wikimedia.org/r/362941

Krinkle closed this task as Resolved. Jul 13 2017, 2:08 AM
Krinkle added a subscriber: Krinkle.

Be careful to exempt audio/video files etc. by default. Purging them can have serious performance effects, and they are generally treated a bit more carefully.

I believe stored transcodes are already excluded by default from action=purge. This script uses the same WikiPage::doPurge() method.

Change 362941 merged by jenkins-bot:
[mediawiki/core@master] Maintenance script to purge specific page

https://gerrit.wikimedia.org/r/362941

purgeList.php looks very outdated

Have you considered improving it instead of adding just another purge maintenance script? It's hard to tell the difference between both, and the lack of documentation doesn't help. It's confusing to have 2 scripts to purge pages.

purgeList.php looks very outdated

Have you considered improving it instead of adding just another purge maintenance script? It's hard to tell the difference between both, and the lack of documentation doesn't help. It's confusing to have 2 scripts to purge pages.

I brought this up in code review. In the end, I believe having two scripts is justified; however, the naming and documentation could definitely be improved.

Difference:

  • purgeList.php: This purges external proxies only (e.g. Varnish, Squid). It is primarily URL-based, but also accepts a title or namespace, in which case it builds the URLs for all those pages and purges them.
    • This is a relatively soft purge and has no impact on the MediaWiki application.
    • This script can also be used for URLs that do not relate to a wiki page: for example, a single thumbnail URL, a /w/load.php URL, a /w/api.php URL, or a static file from a skin or interface such as /w/resources/assets/poweredby_mediawiki_88x31.png or /w/skins/Vector/images/user-icon.png.
  • purgePage.php: This is the equivalent of action=purge and does a lot. For example:
    • It re-parses the wikitext and saves the results to the database (e.g. the current expansion of magic words and templates, and link records for images, links-here, categories, etc.).
    • It also writes the current time to page.page_touched, which can have cascading effects on other areas of the application.
    • It also purges the page's canonical URLs from the file cache (wgUseFileCache) and/or from the external Squid/Varnish proxy. (This is the part that purgeList.php does.)
    • In addition, individual page types (such as FilePage) and extensions may register additional actions. For example, when purging a File page, we also delete thumbnails from Swift storage and purge the URLs of all thumbnail sizes and variations (page1, page2, 120px, 320px, etc.).
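
To make the comparison above easier to scan, here is a rough Python sketch that encodes the side effects listed for each script and picks one for a given situation. It is a toy summary of this comment, not MediaWiki code: the Effect names, SCRIPT_EFFECTS, touches_database() and pick_script() are all hypothetical, and the file-cache effect is listed for purgePage.php even though it only applies when wgUseFileCache is enabled.

```python
from enum import Flag, auto


class Effect(Flag):
    """Hypothetical labels for the side effects described in the list above."""
    NONE = 0
    PROXY_PURGE = auto()   # purge URLs from external proxies (Varnish, Squid)
    FILE_CACHE = auto()    # purge the file cache (only if wgUseFileCache is on)
    REPARSE = auto()       # re-parse wikitext and save link data to the database
    PAGE_TOUCHED = auto()  # write the current time to page.page_touched
    TYPE_ACTIONS = auto()  # page-type/extension actions, e.g. deleting thumbnails


# Assumption: this table only restates the comparison in this comment.
SCRIPT_EFFECTS = {
    "purgeList.php": Effect.PROXY_PURGE,
    "purgePage.php": (
        Effect.PROXY_PURGE
        | Effect.FILE_CACHE
        | Effect.REPARSE
        | Effect.PAGE_TOUCHED
        | Effect.TYPE_ACTIONS
    ),
}


def touches_database(script: str) -> bool:
    """Return True if the script writes to the MediaWiki database."""
    db_effects = Effect.REPARSE | Effect.PAGE_TOUCHED
    return bool(SCRIPT_EFFECTS[script] & db_effects)


def pick_script(target_is_wiki_page: bool, needs_application_refresh: bool) -> str:
    """Suggest a script for a purge target.

    Non-page URLs (thumbnails, load.php, static files) can only be handled
    by the URL-based script; wiki pages use the heavier script only when
    the application state itself must be refreshed.
    """
    if target_is_wiki_page and needs_application_refresh:
        return "purgePage.php"
    return "purgeList.php"


if __name__ == "__main__":
    for name in SCRIPT_EFFECTS:
        print(name, "writes to the database:", touches_database(name))
```

In this model, a stale static asset such as a skin image maps to purgeList.php, while a File page whose thumbnails need regenerating maps to purgePage.php.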