Disable caching on the main page for anonymous users
Open, High, Public

Description

Anonymous users cannot view the latest version of the main page unless they purge the cache or have their browsers do it for them. Is there any chance the caching mechanism can be altered?

Superyetkin updated the task description.
Superyetkin raised the priority of this task to High.
Superyetkin added a subscriber: Superyetkin.
Restricted Application added a subscriber: Aklapper. · Nov 23 2015, 9:37 AM

Today, November 28th, anonymous users are still shown the November 10th version of the Estonian Wikipedia main page.

Ijon raised the priority of this task from High to Unbreak Now!. Feb 12 2017, 7:44 PM
Ijon added a subscriber: Ijon.
Restricted Application added subscribers: Jay8g, TerraCodes. · Feb 12 2017, 7:44 PM
Ijon added a comment. Feb 12 2017, 7:45 PM

I have changed this to Unbreak Now! -- it's not okay that our readers are getting stale versions of the main page, on any wiki. It should have been handled long ago.

bd808 edited projects, added Traffic; removed MediaWiki-Cache. Feb 12 2017, 10:27 PM
bd808 added a subscriber: bd808.

Is the problem primarily that the Main Page uses [[{{LOCALDAY}}. {{LOCALMONTHNAME}}]] [[{{LOCALYEAR}}]]? These are all magic words that are fundamentally incompatible with content caching. If they were changed daily using a bot instead MediaWiki would send out the proper purge events.

Ijon added a comment. Feb 12 2017, 10:32 PM

Thanks, @bd808! That's helpful insight. I guess the Estonian main page needs to be fixed not to use those.

trwiki is using {{#time:Y-m-d}} which again is incompatible with any type of caching.

bd808 added a comment. Edited Feb 12 2017, 10:35 PM

I would suggest lowering this from UBN! back to High and changing the task summary so that the goal is educating the various wiki communities about parse-time magic words that should not be used to provide content or link targets.

trwiki is using {{#time:Y-m-d}} which again is incompatible with any type of caching.

Huh? We don't really allow features that are "incompatible with any type of caching." It sounds like the opposite is happening here, no? If we're showing stale content, then caching is obviously taking place.

I believe the day magic words/parser functions have some special logic that reduces the parser cache time.

Example: https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/MagicWord.php;dffa61be3e9d245871d04980ed584cfbdaef05e3$178.

There are many cache layers. It sounds like we're possibly not doing a great job purging/invalidating some HTML cache layer?

Is the problem primarily that the Main Page uses [[{{LOCALDAY}}. {{LOCALMONTHNAME}}]] [[{{LOCALYEAR}}]]? These are all magic words that are fundamentally incompatible with content caching. If they were changed daily using a bot instead MediaWiki would send out the proper purge events.

Let us take Spanish wiki for example: https://es.wikipedia.org/wiki/Wikipedia:Portada

I don't see any problem there. It works. I mean I can assume that LOCALYEAR isn't ok and CURRENTYEAR is ok, but is this really the case here?

I believe the day magic words/parser functions have some special logic that reduces the parser cache time.

Example: https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/MagicWord.php;dffa61be3e9d245871d04980ed584cfbdaef05e3$178.

There are many cache layers. It sounds like we're possibly not doing a great job purging/invalidating some HTML cache layer?

I'm not sure that those ParserCache TTL hints actually make it all the way out to OutputPage::sendCacheControl(), which sets the headers that Varnish would respond to. I've never tunneled all the way down to the parser function level before to see how this might work in practice. It looks to me like direct calls to OutputPage::setCdnMaxage() and/or OutputPage::lowerCdnMaxage() are needed to change the Cache-Control: s-maxage=... value that OutputPage::output() adds to the response. I'm not seeing anywhere that Parser would trigger those calls.
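
To make the suspected gap concrete, here is a minimal Python model (not MediaWiki code) of what would have to happen for a parser cache TTL to reach the header Varnish sees. The `Response` class, the `render` function, the default of 14 days and the exact header string are all assumptions for illustration; only the idea of lowering `s-maxage` corresponds to OutputPage::lowerCdnMaxage().

```python
from dataclasses import dataclass

# Assumed default shared-cache lifetime in seconds (illustrative value only).
DEFAULT_CDN_MAXAGE = 14 * 24 * 3600


@dataclass
class Response:
    """Hypothetical stand-in for the part of OutputPage that builds cache headers."""

    cdn_maxage: int = DEFAULT_CDN_MAXAGE

    def lower_cdn_maxage(self, ttl: int) -> None:
        # Only ever shorten the shared-cache lifetime, never extend it.
        self.cdn_maxage = min(self.cdn_maxage, max(0, ttl))

    def cache_control(self) -> str:
        # Header layout is an assumption; the relevant part is s-maxage.
        return f"s-maxage={self.cdn_maxage}, must-revalidate, max-age=0"


def render(parser_cache_ttl: int, propagate: bool) -> str:
    """Return the Cache-Control value for a page with the given parser cache TTL."""
    response = Response()
    if propagate:
        # The step that appears to be missing between Parser and OutputPage.
        response.lower_cdn_maxage(parser_cache_ttl)
    return response.cache_control()
```

With `propagate=False` the one-hour hint never reaches the header, which would match the stale pages reported for anonymous users; with `propagate=True` the edge cache would expire the page after an hour.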

Even English Wikipedia's main page uses Wikipedia:Today's featured article/{{#time:F j, Y}}; it's a very common trick. Using time-dependent magic words reduces the parser cache TTL (to either 24h or 1h, depending on the magic word). The #time parser function also sets the TTL, but it sets it on the preprocessor frame, which doesn't appear to propagate to the ParserCache. This would suggest that pages only using {{#time:}} but no magic words like {{CURRENT...}}/{{LOCAL...}} wouldn't get the right parser cache TTL set.

This would explain why the trwiki main page is reported to be stale (it only uses {{#time:}}, nothing else) and why enwiki's isn't (it uses {{CURRENTDAYNAME}}), but it doesn't explain why etwiki's is reported to be stale, since it uses {{LOCALDAY}}.
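
The behaviour described above can be modelled roughly as follows. This is a Python sketch, not the actual Parser code: the `Frame` class, the `parser_cache_expiry` function, the 30-day default and the per-word TTL table are assumptions (the thread only says the TTL is "either 24h or 1h, depending on the magic word").

```python
from dataclasses import dataclass
from typing import Iterable, Optional

HOUR = 3600
DAY = 24 * HOUR
DEFAULT_EXPIRY = 30 * DAY  # assumed default parser cache lifetime

# Assumed mapping; which word gets which TTL is illustrative.
MAGIC_WORD_TTL = {
    "CURRENTDAYNAME": HOUR,
    "LOCALDAY": HOUR,
    "CURRENTYEAR": DAY,
    "LOCALYEAR": DAY,
}


@dataclass
class Frame:
    """Hypothetical preprocessor frame that records a TTL of its own."""

    ttl: Optional[int] = None

    def set_ttl(self, ttl: int) -> None:
        self.ttl = ttl if self.ttl is None else min(self.ttl, ttl)


def parser_cache_expiry(tokens: Iterable[str]) -> int:
    """Return the parser cache expiry for a page made of the given tokens."""
    frame = Frame()
    expiry = DEFAULT_EXPIRY
    for token in tokens:
        if token in MAGIC_WORD_TTL:
            # Magic words lower the parser cache expiry directly.
            expiry = min(expiry, MAGIC_WORD_TTL[token])
        elif token.startswith("#time:"):
            # #time records its TTL on the frame only; nothing copies it
            # into the parser cache expiry.
            frame.set_ttl(DAY)
    return expiry
```

In this model a page that only uses `#time:` keeps the default expiry, while a page that also uses `LOCALDAY` drops to one hour.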

I went and viewed a few of these in incognito (around 10:50 UTC on Feb 13):

trwiki:

NewPP limit report
Parsed by mw1174
Cached time: 20170209213001
Cache expiry: 3600
Dynamic content: true

etwiki:

NewPP limit report
Parsed by mw1274
Cached time: 20170212201347
Cache expiry: 3600
Dynamic content: true

enwiki:

NewPP limit report
Parsed by mw1199
Cached time: 20170213104450
Cache expiry: 3600

eswiki:

NewPP limit report
Parsed by mw1182
Cached time: 20170213094253
Cache expiry: 3600
Dynamic content: true

enwiki looks fine, eswiki looks a little bit weird (just over an hour old while the cache expiry claims to be an hour), but etwiki is showing me a 14-hour-old page and trwiki's is almost 4 days old. Maybe cache turnover or other strange effects are making this work for large wikis but not small ones?
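
For anyone who wants to redo the arithmetic, here is a small Python sketch that computes each page's age from the "Cached time" values above. The observation time is taken as exactly 10:50 UTC, which is an assumption ("around 10:50 UTC" in the comment), and the names `REPORTS`, `age_seconds` and `is_stale` are made up for this example.

```python
from datetime import datetime, timezone

# Assumed exact observation time; the comment says "around 10:50 UTC on Feb 13".
OBSERVED = datetime(2017, 2, 13, 10, 50, tzinfo=timezone.utc)

# wiki -> (Cached time, Cache expiry in seconds), copied from the limit reports.
REPORTS = {
    "trwiki": ("20170209213001", 3600),
    "etwiki": ("20170212201347", 3600),
    "enwiki": ("20170213104450", 3600),
    "eswiki": ("20170213094253", 3600),
}


def age_seconds(cached_time: str, observed: datetime = OBSERVED) -> int:
    """Seconds between the report's 'Cached time' (UTC) and the observation."""
    cached = datetime.strptime(cached_time, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return int((observed - cached).total_seconds())


def is_stale(cached_time: str, expiry: int) -> bool:
    """True if the served copy is older than its own stated cache expiry."""
    return age_seconds(cached_time) > expiry


if __name__ == "__main__":
    for wiki, (cached_time, expiry) in REPORTS.items():
        hours = age_seconds(cached_time) / 3600
        state = "stale" if is_stale(cached_time, expiry) else "fresh"
        print(f"{wiki}: {hours:.1f} h old, expiry {expiry / 3600:.0f} h, {state}")
```

Under that assumption only enwiki is within its stated expiry; the other three are older than the hour their own limit report claims.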

(Also, trwiki's main page expiry is listed as 3600, so that suggests that {{#time:}} does set the TTL correctly, or else maybe a template used on that page uses a time-based magic word.)

Anomie added a subscriber: Anomie. Feb 13 2017, 2:39 PM

Some comments:

  • This seems like it may be a duplicate of T51803: Calculated Age of persons can become outdated until next cache purge.
  • {{#time:}}'s TTL doesn't seem to be propagated, so that 3600 must be coming from somewhere else. In trwiki, it seems to be coming from Vikipedi:Anasayfa yeni başlık.
  • The documentation on PPFrame::getTTL() specifically states that it isn't propagated to the parser cache expiry. There's no indication in I412febf3 as to why not. Note that if we do propagate it, we may want to enforce some minimum, since {{#time:}} will correctly indicate a 1-second expiry if it's outputting the seconds.
  • It doesn't look like the parser cache expiry is propagated to the OutputPage expiry either.
  • A patch from 2013 to make #time set the parser cache TTL to 12 hours was eventually abandoned, with statements that magic words like {{CURRENTDAY}} should have their TTLs raised.
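
The "enforce some minimum" idea from the list above could look something like this. It is a Python sketch under stated assumptions: `propagated_expiry` is a hypothetical helper, and the 300-second floor is an arbitrary value picked for illustration, not anything proposed in the thread.

```python
from typing import Optional

# Hypothetical floor in seconds; the thread does not propose a specific value.
MIN_TTL = 300


def propagated_expiry(current_expiry: int, frame_ttl: Optional[int],
                      min_ttl: int = MIN_TTL) -> int:
    """Lower the parser cache expiry to the frame TTL, but not below min_ttl.

    A frame TTL of None means nothing time-dependent was seen, so the
    current expiry is returned unchanged.
    """
    if frame_ttl is None:
        return current_expiry
    # Clamp very short TTLs (e.g. 1 second when #time outputs seconds) to the
    # floor, then make sure the expiry is only ever lowered, never raised.
    return min(current_expiry, max(frame_ttl, min_ttl))
```

So a 1-second TTL from a format that prints seconds would become 300 seconds here instead of making the page effectively uncacheable.
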
jrbs added a subscriber: jrbs. Feb 13 2017, 5:51 PM
greg lowered the priority of this task from Unbreak Now! to High. Edited Feb 13 2017, 6:24 PM
greg added a subscriber: greg.

This is not an emergency, lowering to High.

Ijon added a comment. Feb 13 2017, 9:48 PM

Agreed on lowering to High. But it does seem like a bug, not a case of etwiki using inappropriate magic words.

Tgr added a subscriber: Tgr. Feb 13 2017, 10:08 PM

#time and co. are used on many pages and usually they do not require cache invalidation. For example {{update after}} compares its arguments to the current date to decide whether the page should be in some "needs update" category, which is absolutely no reason to limit HTML cache expiry to one day.

Maybe we should prevent CURRENT* from changing parser cache expiry as well, and introduce a dedicated magic word which does that instead (and correctly propagates to Varnish) so that it is easier to differentiate between maintenance-related date logic and actual volatile content.

#time and co. are used on many pages and usually they do not require cache invalidation. For example {{update after}} compares its arguments to the current date to decide whether the page should be in some "needs update" category, which is absolutely no reason to limit HTML cache expiry to one day.

Well, ideally it would limit cache expiry to "however much time is left until the comparison changes state". People do want the tag and the maintenance category to start showing up as close to the target date as possible. Lacking that knowledge of how its output is going to be used, though, {{CURRENTDAY}} limiting expiry to "tomorrow" isn't at all unreasonable from a behavior perspective (even though it probably sucks from a performance perspective).
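
As an illustration of "however much time is left until the comparison changes state", here is a minimal Python sketch. The function `ttl_until` is hypothetical, and it assumes the comparison flips at midnight UTC on the target date; a wiki using local time would need its own timezone.

```python
from datetime import date, datetime, time, timezone
from typing import Optional


def ttl_until(target: date, now: datetime) -> Optional[int]:
    """Seconds until a date comparison against `target` changes its result.

    Returns None once the target has passed, meaning the output no longer
    depends on the current date and needs no time-based expiry.
    Assumes the flip happens at 00:00 UTC on the target date.
    """
    flip = datetime.combine(target, time.min, tzinfo=timezone.utc)
    remaining = (flip - now).total_seconds()
    if remaining <= 0:
        return None
    return int(remaining)
```

For a template like {{update after}}, a page whose target date is months away could then be cached for months rather than being re-parsed daily, and one whose date has passed would not need a date-based expiry at all.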

Tgr added a comment. Feb 14 2017, 5:44 PM

Well, ideally it would limit cache expiry to "however much time is left until the comparison changes state".

Parser cache expiry, yes. Browser/varnish cache expiry, no. Logged-in users are already uncached and anonymous users don't care about maintenance categories (which are probably hidden anyway).

I removed the date info from the main page of Estonian Wikipedia, but that only hides the issue rather than solving it (the weekly changing content is still affected). The same problem applies to MediaWiki:Sitenotice and other announcements. For users who are not logged in, they are sometimes visible and sometimes not, and I've heard from some people that on a few occasions there were significant delays (some notices were shown a month later, when they were no longer relevant).

ema moved this task from Triage to Caching on the Traffic board. Mar 2 2017, 10:05 AM
This comment was removed by StevenJ81.
Umar added a comment. Apr 1 2018, 7:55 PM

What should I do to fix the problem? Sorry for the trouble!