
Re-evaluate caching and purging of language variants (e.g. "/zh-hans/Page_name")
Open, Low, Public

Description

This is a follow-up to T250261 and T250205. As part of the unit testing I added in https://gerrit.wikimedia.org/r/589661, I noticed that the purges MediaWiki currently sends for each language variant are completely broken:

For example:

krinkle@mwmaint1002$ mwscript eval.php --wiki zhwiki
var_dump( Title::newMainPage()->getCdnUrls() );
array(
 "https://zh.wikipedia.org/w/index.php?title=Wikipedia:…&zh"
 "https://zh.wikipedia.org/w/index.php?title=Wikipedia:…&zh-hans"
 …
)

Notice the dangling language code as a value-less query parameter. That's not something MediaWiki ever uses, and it doesn't actually work either. These URLs are as meaningless as &rand123 would be and do not contain a variant of the page. Most importantly, even if they did work, they are not the canonical URL for that variant, which means we aren't purging the URLs that users would actually be on when they use a particular language variant.

The correct URLs follow $wgVariantArticlePath, e.g. /$2/$1 (where $1 is the page name and $2 the variant code), which gives https://zh.wikipedia.org/zh-hans/Wikipedia:首页 as an example.
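For illustration, here is a minimal PHP sketch of what a purge list could contain if it followed a /$2/$1 variant path instead of appending the variant as a query parameter. The helpers expandVariantPath() and variantCdnUrls() are hypothetical (they are not MediaWiki functions), and the title encoding is simplified: spaces become underscores and everything except ':' and '/' is percent-encoded.

```php
<?php
// Hypothetical helper (not a MediaWiki function): expands a variant article
// path pattern such as "/$2/$1", where $1 is the page title and $2 the
// variant code, in the style of $wgVariantArticlePath.
function expandVariantPath(string $server, string $pattern, string $title, string $variant): string
{
    // Simplified title encoding: spaces become underscores, then everything is
    // percent-encoded except ':' and '/', which are kept readable.
    $encoded = str_replace(
        ['%3A', '%2F'],
        [':', '/'],
        rawurlencode(str_replace(' ', '_', $title))
    );
    // strtr() substitutes both placeholders in a single pass.
    return $server . strtr($pattern, ['$1' => $encoded, '$2' => $variant]);
}

// Hypothetical helper: one canonical variant URL per variant code.
function variantCdnUrls(string $server, string $pattern, string $title, array $variants): array
{
    $urls = [];
    foreach ($variants as $variant) {
        $urls[] = expandVariantPath($server, $pattern, $title, $variant);
    }
    return $urls;
}

$urls = variantCdnUrls('https://zh.wikipedia.org', '/$2/$1', 'Wikipedia:首页', ['zh-hans', 'zh-sg']);
foreach ($urls as $url) {
    echo $url, "\n";
}
// https://zh.wikipedia.org/zh-hans/Wikipedia:%E9%A6%96%E9%A1%B5
// https://zh.wikipedia.org/zh-sg/Wikipedia:%E9%A6%96%E9%A1%B5
```

Unlike the "&zh-hans" form, each of these URLs is a path a reader can actually land on, so purging it would evict something real once such pages are cached.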

This brings me to my next point:

$ curl -i 'https://zh.wikipedia.org/zh-sg/Wikipedia:%E9%A6%96%E9%A1%B5' | head -n30
content-language: zh
vary: Accept-Encoding,Cookie,Authorization
expires: Fri, 17 Apr 2020 17:50:36 GMT
cache-control: private, must-revalidate, max-age=0
age: 0
x-cache: cp3064 miss, cp3054 pass
x-cache-status: pass

... page views with a language variant don't have public/CDN caching enabled right now. At least that means the broken purge logic isn't causing stale content.
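That conclusion rests on the Cache-Control header: under RFC 9111, a response marked private (or no-store) must not be stored by a shared cache such as a CDN, which is consistent with the pass in x-cache-status. The sketch below is a hypothetical check, not Wikimedia's CDN configuration, and it only detects those two blocking directives; their absence is necessary but not sufficient for a response to be cached.

```php
<?php
// Minimal sketch, not Wikimedia's actual CDN logic: reports whether a
// Cache-Control header value forbids shared caches (such as a CDN) from
// storing the response, via the "private" or "no-store" directives (RFC 9111).
// Qualified forms like private="Set-Cookie" are treated as plain "private",
// which errs on the side of not caching.
function forbidsSharedCaching(string $cacheControl): bool
{
    $names = [];
    foreach (explode(',', $cacheControl) as $directive) {
        // Keep only the lower-cased directive name, e.g. "max-age" from "max-age=0".
        $names[] = strtolower(trim(explode('=', $directive, 2)[0]));
    }
    return in_array('private', $names, true) || in_array('no-store', $names, true);
}

// Header from the zh-sg response above.
var_dump(forbidsSharedCaching('private, must-revalidate, max-age=0')); // bool(true)
var_dump(forbidsSharedCaching('public, max-age=300'));                 // bool(false)
```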

But it also suggests that the code responsible for caching and the code responsible for purging aren't aware of each other. I suspect the purge code broke at some point, and then, a decade later, someone noticed that these URLs were never purged and decided it was better to disable caching for them?

For this task:

  • Remove the dead code.
  • Decide whether to cache, and whether to do it differently from normal /wiki/Pagename URLs.
  • Decide whether to purge.
  • Update Cache-Control logic (where?) to enable caching, if we decide to.
  • Update HtmlCacheUpdater logic to restore language variant handling, if we decide to (and this time make it actually work).

Event Timeline

Title.php (master)
if ( $this->getPageLanguageConverter()->hasVariants() ) {
    $variants = $this->getPageLanguageConverter()->getVariants();
    foreach ( $variants as $vCode ) {
        $urls[] = $this->getInternalURL( $vCode );
    }
}

This logic was broken in 2016 by commit f684d17b0e66c3, which changed the call from $this->getInternalURL( '', $vCode ); to $this->getInternalURL( $vCode ); because the $query2 parameters of getInternalURL() and fixUrlQueryArgs() were deprecated. But that change also dropped the legacy behaviour in which $query2, when given a single string, was shorthand for ['variant' => $query2].
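The effect of that change can be modelled with a deliberately simplified stand-in. sketchInternalUrl() below is hypothetical and is not MediaWiki's getInternalURL() or fixUrlQueryArgs(); it only assumes what is described above, namely that a bare string in $query2 used to expand to variant=<code>, while a string passed as $query is appended to the URL verbatim.

```php
<?php
// Hypothetical, simplified model of getInternalURL( $query, $query2 ).
// NOT MediaWiki's real code; it only mimics the legacy $query2 shortcut.
function sketchInternalUrl(string $title, string $query = '', $query2 = false): string
{
    if (is_string($query2) && $query2 !== '') {
        // Legacy shortcut: a bare string in $query2 meant [ 'variant' => $query2 ].
        $variant = 'variant=' . urlencode($query2);
        $query = ($query === '') ? $variant : $query . '&' . $variant;
    }
    $url = 'https://zh.wikipedia.org/w/index.php?title=' . urlencode($title);
    // $query is appended as-is, so a bare "zh-hans" becomes a value-less parameter.
    return ($query === '') ? $url : $url . '&' . $query;
}

// Pre-2016 call shape: getInternalURL( '', $vCode )
echo sketchInternalUrl('Main_Page', '', 'zh-hans'), "\n";
// https://zh.wikipedia.org/w/index.php?title=Main_Page&variant=zh-hans

// Post-2016 call shape: getInternalURL( $vCode )
echo sketchInternalUrl('Main_Page', 'zh-hans'), "\n";
// https://zh.wikipedia.org/w/index.php?title=Main_Page&zh-hans
```

The second form is exactly the dangling-parameter shape seen in the getCdnUrls() output at the top of this task.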

This would be an easy fix, but as mentioned above, these URLs don't warrant purging right now, as they aren't being cached.

After this is removed and after T250261 is fixed, we can think about what we want to do here, and then deal with the fun of injecting LanguageConverter into HtmlCacheUpdater.

Wow. Added in 45a3b669cc7 on 2006-10-12(!).

Change 589665 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] title: Remove broken handling of language variant in getCdnUrls()

https://gerrit.wikimedia.org/r/589665

Change 589665 merged by jenkins-bot:
[mediawiki/core@master] title: Remove broken handling of language variant in getCdnUrls()

https://gerrit.wikimedia.org/r/589665