
Disable caching on the main page for anonymous users
Closed, Declined (Public)

Description

Anonymous users cannot view the latest version of the main page unless they purge the cache or have their browsers do it for them. Is there any chance the caching mechanism can be altered?

Event Timeline

Ijon raised the priority of this task from High to Unbreak Now!. Feb 12 2017, 7:44 PM

I have changed this to Unbreak Now! -- it's not okay that our readers are getting stale versions of the main page, on any wiki. It should have been handled long ago.

bd808 subscribed.

Is the problem primarily that the Main Page uses [[{{LOCALDAY}}. {{LOCALMONTHNAME}}]] [[{{LOCALYEAR}}]]? These are all magic words that are fundamentally incompatible with content caching. If they were changed daily using a bot instead MediaWiki would send out the proper purge events.

Thanks, @bd808! That's helpful insight. I guess the Estonian main page needs to be fixed not to use those.

trwiki is using {{#time:Y-m-d}} which again is incompatible with any type of caching.

I would suggest lowering this from UBN! back to High and changing the task summary so that the goal is educating the various wiki communities about parse-time magic words that should not be used to provide content or link targets.

trwiki is using {{#time:Y-m-d}} which again is incompatible with any type of caching.

Huh? We don't really allow features that are "incompatible with any type of caching." It sounds like the opposite is happening here, no? If we're showing stale content, then caching is obviously taking place.

I believe the day magic words/parser functions have some special logic that reduces the parser cache time.

I believe the day magic words/parser functions have some special logic that reduces the parser cache time.

Example: https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/MagicWord.php;dffa61be3e9d245871d04980ed584cfbdaef05e3$178.

There are many cache layers. It sounds like we're possibly not doing a great job purging/invalidating some HTML cache layer?

Is the problem primarily that the Main Page uses [[{{LOCALDAY}}. {{LOCALMONTHNAME}}]] [[{{LOCALYEAR}}]]? These are all magic words that are fundamentally incompatible with content caching. If they were changed daily using a bot instead MediaWiki would send out the proper purge events.

Let us take Spanish wiki for example: https://es.wikipedia.org/wiki/Wikipedia:Portada

I don't see any problem there. It works. I mean I can assume that LOCALYEAR isn't ok and CURRENTYEAR is ok, but is this really the case here?

I believe the day magic words/parser functions have some special logic that reduces the parser cache time.

Example: https://phabricator.wikimedia.org/source/mediawiki/browse/master/includes/MagicWord.php;dffa61be3e9d245871d04980ed584cfbdaef05e3$178.

There are many cache layers. It sounds like we're possibly not doing a great job purging/invalidating some HTML cache layer?

I'm not sure that those ParserCache TTL hints actually make it all the way out to OutputPage::sendCacheControl(), which sets the headers that Varnish would respond to. I've never tunneled all the way down to the parser function level before to see how this might work in practice. It looks to me like direct calls to OutputPage::setCdnMaxage() and/or OutputPage::lowerCdnMaxage() are needed to change the Cache-Control: s-maxage=... value that OutputPage::output() adds to the response. I'm not seeing anywhere that Parser would trigger those calls.

Even English Wikipedia's main page uses Wikipedia:Today's featured article/{{#time:F j, Y}}, it's a very common trick. Using time-dependent magic words reduces the parser cache TTL (to either 24h or 1h, depending on the magic word). The #time parser function also sets the TTL, but it sets it on the preprocessor frame, which doesn't appear to propagate to the ParserCache. This would suggest that pages only using {{#time:}} but no magic words like {{CURRENT...}}/{{LOCAL...}} wouldn't get the right parser cache TTL set.

This would explain why the trwiki main page is reported to be stale (it only uses {{#time:}}, nothing else) and why enwiki's isn't (it uses {{CURRENTDAYNAME}}), but it doesn't explain why etwiki's is reported to be stale, since it uses {{LOCALDAY}}.

I went and viewed a few of these in incognito (around 10:50 UTC on Feb 13):

trwiki:

NewPP limit report
Parsed by mw1174
Cached time: 20170209213001
Cache expiry: 3600
Dynamic content: true

etwiki:

NewPP limit report
Parsed by mw1274
Cached time: 20170212201347
Cache expiry: 3600
Dynamic content: true

enwiki:

NewPP limit report
Parsed by mw1199
Cached time: 20170213104450
Cache expiry: 3600

eswiki:

NewPP limit report
Parsed by mw1182
Cached time: 20170213094253
Cache expiry: 3600
Dynamic content: true

enwiki looks fine, eswiki looks a little bit weird (just over an hour old while the cache expiry claims to be an hour), but etwiki is showing me a 14-hour-old page and trwiki's is almost 4 days old. Maybe cache turnover or other strange effects are making this work for large wikis but not small ones?

(Also, trwiki's main page expiry is listed as 3600, so that suggests that {{#time:}} does set the TTL correctly, or else maybe a template used on that page uses a time-based magic word.)
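The comparison above can be reproduced with a small sketch. It takes the "Cached time" and "Cache expiry" values quoted in the four reports and the stated observation time (around 10:50 UTC on Feb 13), and checks whether each copy is older than its own expiry. The function and variable names are illustrative only, not part of MediaWiki.

```python
from datetime import datetime, timezone

def cached_age_seconds(cached_time, observed_at):
    """Age in seconds of a page, given its 'Cached time' stamp (YYYYMMDDHHMMSS, UTC)."""
    cached = datetime.strptime(cached_time, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return (observed_at - cached).total_seconds()

# Observation time taken from the comment: around 10:50 UTC on Feb 13, 2017.
OBSERVED = datetime(2017, 2, 13, 10, 50, tzinfo=timezone.utc)

# (Cached time, Cache expiry in seconds) as quoted in the limit reports.
REPORTS = {
    "trwiki": ("20170209213001", 3600),
    "etwiki": ("20170212201347", 3600),
    "enwiki": ("20170213104450", 3600),
    "eswiki": ("20170213094253", 3600),
}

def older_than_expiry(reports, observed_at):
    """Map each wiki to True if the served copy is older than its reported expiry."""
    return {
        wiki: cached_age_seconds(stamp, observed_at) > expiry
        for wiki, (stamp, expiry) in reports.items()
    }

if __name__ == "__main__":
    for wiki, (stamp, expiry) in REPORTS.items():
        hours = cached_age_seconds(stamp, OBSERVED) / 3600
        print(f"{wiki}: {hours:.1f} h old, expiry {expiry} s")
```

Under these inputs only enwiki is within its reported expiry; the other three copies are older than the 3600 seconds they claim.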

Some comments:

  • This seems like it may be a duplicate of T51803: Calculated Age of persons can become outdated until next cache purge.
  • {{#time:}}'s TTL doesn't seem to be propagated, so that 3600 must be coming from somewhere else. In trwiki, it seems to be coming from Vikipedi:Anasayfa yeni başlık.
  • The documentation on PPFrame::getTTL() specifically states that it isn't propagated to the parser cache expiry. There's no indication in I412febf3 as to why not. Note that if we do propagate it, we may want to enforce some minimum, since {{#time:}} will correctly indicate a 1-second expiry if it's outputting the seconds.
  • It doesn't look like the parser cache expiry is propagated to the OutputPage expiry either.
  • A patch from 2013 to make #time set the parser cache TTL to 12 hours was eventually abandoned, with statements that magic words like {{CURRENTDAY}} should have their TTLs raised.
greg lowered the priority of this task from Unbreak Now! to High. Feb 13 2017, 6:24 PM
greg subscribed.

This is not an emergency, lowering to High.

Agree re lowering to High. But it does seem like a bug, rather than etwiki using inappropriate magic words.

#time and co. are used on many pages and usually they do not require cache invalidation. For example {{update after}} compares its arguments to the current date to decide whether the page should be in some "needs update" category, which is absolutely no reason to limit HTML cache expiry to one day.

Maybe we should prevent CURRENT* from changing parser cache expiry as well, and introduce a dedicated magic word which does that instead (and correctly propagates to Varnish) so that it is easier to differentiate between maintenance-related date logic and actual volatile content.

#time and co. are used on many pages and usually they do not require cache invalidation. For example {{update after}} compares its arguments to the current date to decide whether the page should be in some "needs update" category, which is absolutely no reason to limit HTML cache expiry to one day.

Well, ideally it would limit cache expiry to "however much time is left until the comparison changes state". People do want the tag and the maintenance category to start showing up as close to the target date as possible. Lacking that knowledge of how its output is going to be used, though, {{CURRENTDAY}} limiting expiry to "tomorrow" isn't at all unreasonable from a behavior perspective (even though it probably sucks from a performance perspective).

Well, ideally it would limit cache expiry to "however much time is left until the comparison changes state".

Parser cache expiry, yes. Browser/varnish cache expiry, no. Logged-in users are already uncached and anonymous users don't care about maintenance categories (which are probably hidden anyway).
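The idea of limiting parser cache expiry to "however much time is left until the comparison changes state" can be sketched as follows. This is a hypothetical helper, not MediaWiki code: the function name, the 30-day ceiling, and the example dates are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Assumed ceiling for illustration: a 30-day default expiry.
MAX_TTL = 30 * 24 * 3600

def ttl_until_state_change(now, target, max_ttl=MAX_TTL):
    """Seconds a rendering stays valid when its output only depends on `now < target`.

    Before the target date, the output changes exactly when the target is reached,
    so the TTL is the time remaining (capped at max_ttl). After the target date the
    comparison can no longer flip, so the full max_ttl applies.
    """
    remaining = int((target - now).total_seconds())
    if remaining <= 0:
        return max_ttl
    return min(remaining, max_ttl)

if __name__ == "__main__":
    now = datetime(2017, 2, 13, 0, 0, tzinfo=timezone.utc)
    target = datetime(2017, 2, 14, 12, 0, tzinfo=timezone.utc)
    print(ttl_until_state_change(now, target))
```

A real implementation would probably also want a minimum TTL, for the same reason a floor was suggested for {{#time:}} when it outputs seconds.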

I removed the date info from the main page of Estonian Wikipedia, but that only hides the issue rather than solving it (the weekly changing content is still affected). The same problem applies to MediaWiki:Sitenotice and other announcements. For not-logged-in users they are sometimes visible and sometimes not, and I've heard from some people that on a few occasions there were significant delays (some notices were shown a month later, when they were no longer relevant).

What should I do to fix the problem? Sorry for the trouble!

For me, it seems that the issue has grown even bigger over time. The delay with Estonian Wikipedia is often around 3 weeks (!!!), which means not-logged-in users hardly ever see up-to-date info when they visit the main page. Could it please be fixed somehow?

Are you saying content (other than {{#time or {{LOCALDAY etc) is not updating?

FWIW, I'm of the opinion that date magic words should reduce the Varnish cache time to at most 24 hours, maybe six hours. I'm doubtful that super-long cache times for all pages in Varnish are really worth it...

FWIW, I'm of the opinion that date magic words should reduce the Varnish cache time to at most 24 hours, maybe six hours.

The Varnish caches already self-limit to 24 hours per layer, aside from honoring MW's CC/Max-age claims, but there are some deeper edge-case issues here:

  1. It's possible, especially for something like a hot Main_page, for the 24 hours to stretch to 48-72 hours due to imperfect refresh timing between the up to 2-3 layers (depending on the edge).
  2. But, this is limited by the CC:maxage/Age values sent by MW (which are currently over-long)
  3. But also, Varnishes can potentially keep very hot objects alive much longer if MW lies about 304 Not Modified, see the long threads at T124954#2399694 . Things have moved on since even that thread was last updated, and we're probably overdue for turning MW's $SquidMaxAge down even further from its current value, but that's a bit broader in scope than this ticket. It's probably still 14 days, and we could test stepping it down in stages to 7 days and then even ~3 days at this point, before we have to re-consider edge cache issues too hard (about how long our "keep" timers are).
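The arithmetic behind points 1 and 2 can be written out as a rough sketch. It assumes the worst case is simply the per-layer limit multiplied by the number of layers, capped by whatever maximum age MediaWiki sends. The 304 Not Modified effect from point 3 is not modelled, and the names and the 14-day default are assumptions taken from the "probably still 14 days" remark.

```python
DAY = 24 * 3600

def worst_case_age(layers, per_layer_ttl=DAY, mw_max_age=14 * DAY):
    """Upper bound in seconds on how stale an object can get across stacked cache layers.

    Each layer may hold the object for up to per_layer_ttl after fetching it from
    the layer behind it, so badly timed refreshes add up; the max-age sent by
    MediaWiki caps the total.
    """
    return min(layers * per_layer_ttl, mw_max_age)

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "layer(s):", worst_case_age(n) // 3600, "hours")
```

With two or three layers this gives the 48 to 72 hours mentioned above; lowering the MediaWiki-side maximum to, say, 3 days would only just start to bite at three layers.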

I'm doubtful that super-long cache times for all pages in Varnish are really worth it...

They're not, for hitrate, but they are important for other operational concerns about taking servers/DCs in and out of traffic flow without major disruptions. For that reason, we really try to avoid hot content having <1d TTLs for now.

For me, it seems that the issue has grown even bigger over time. The delay with Estonian Wikipedia is often around 3 weeks (!!!), which means not-logged-in users hardly ever see up-to-date info when they visit the main page. Could it please be fixed somehow?

Are you saying content (other than {{#time or {{LOCALDAY etc) is not updating?

Yes, that has always been the case. The date thing was just very easily detectable, and people often asked: "why is that thing wrong on the front page?" (Wikipedians, on the other hand, never notice it, as we are constantly logged in.) As we don't change the front page often (the content comes via sub-pages, which are themselves selected via the HETKENÄDAL (CURRENTWEEK) magic word), the system seems to treat that high-traffic page as something that needs a cache update only once a month.

Any progress on that?

Just a proposal: might it be possible to give project front pages some special status in the cache updater, so that their cache is always updated at least once a day? That should fix the current issue for most cases anyway.

Aklapper lowered the priority of this task from High to Medium. May 24 2019, 9:06 AM

Lowering priority to reflect the unfortunate reality.

  • Could something like mw:Manual:PurgeList.php offer a possible solution here? Maybe combined with some MediaWiki:Purge-list message that can be edited by the project's admins, containing a list like [[Main page]], [[Portal:...]], ... (the pages to be purged on a 24-hour interval; a fixed time would be better).

or:

  • Should we just let a maintenance bot do a daily purge action? (This may be easier to execute at the same time every day, also taking DST into account.)

Would such purge solutions work well, and are there any downsides?

MediaWiki requires a minimum cache time for all pages, and this includes, and especially applies to, the Main Page. I do not think an exception should be made for that.

However there is no need for such exception, as the Parser is quite capable of tuning the cache time as-needed and generally does so.

Each of the parser functions and magic words that deal with time, such as {{#time:}} and {{CURRENTDAY}}, has code that shortens the expiry from the default 30 days to something that will ensure they are updated in due time, generally within one or two lower-order intervals. This means that {{currentmonth}} is cached for 24 hours and {{localday}} is cached for 1 hour.
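A minimal sketch of that behaviour, assuming the page expiry is simply the smallest TTL hinted by any time-related word used on the page. The table below contains only the two values stated in this comment plus the 30-day default; it is not a copy of MediaWiki's actual table, and the function name is made up.

```python
# Default expiry and the two per-word values mentioned in the comment (seconds).
DEFAULT_EXPIRY = 30 * 24 * 3600
TTL_HINTS = {
    "CURRENTMONTH": 24 * 3600,
    "LOCALDAY": 3600,
}

def page_expiry(words_used):
    """Expiry for a page: the default, lowered to the smallest hint among the words used."""
    hints = [TTL_HINTS[w.upper()] for w in words_used if w.upper() in TTL_HINTS]
    return min([DEFAULT_EXPIRY, *hints])

if __name__ == "__main__":
    print(page_expiry(["currentmonth"]))              # one-day hint applies
    print(page_expiry(["currentmonth", "localday"]))  # the shorter hint wins
    print(page_expiry([]))                            # no time-related words
```

A "Cache expiry: 3600" line in a limit report would then be consistent with at least one hour-level word being used somewhere on the page or in its templates.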

This logic already exists, is deployed and enabled in production for Wikipedia, and has existed for many years. We do not need major decisions to be made or new features to be developed, as this is already supported and working on most wikis.

This task represents a yet-to-be-determined edge case causing this to sometimes not work correctly. When this task is prioritised, the bug will be understood and fixed accordingly.

It would help if together we can determine which general area is at fault:

  • Parser cache: This is the internal cache for the "content" part of a page, and is used by both logged-in and logged-out users (assuming you have the default skin and interface language etc).
    • When you "purge" an article, this is cleared. When you clear your browser cache or use a private browsing mode, you still get the same parser cache. When you add a ?123 query string to a URL, you are still using the same parser cache.
  • HTTP cache: This is the operations traffic layer (sometimes known as "Varnish and ATS").
    • When you "purge" an article, this is also cleared. When you clear your browser cache or use a private browsing mode, you still get the same HTTP cache. When you add a ?123 query string to a URL, you are skipping the HTTP cache.
  • Browser cache: This is on your own device. When you add a ?123 query string to a URL, you are skipping the browser cache. When you clear your browser cache or use a private browsing mode, you skip the browser cache.

If we find that the parser cache is at fault, then it would likely mean that it is consistently a problem for every version of every page that uses these wikitext features, making it easy to reproduce. Based on how rare the reports are, I believe this is likely not the case, also because this logic exists and seems to work when tested ad hoc.

If we find that the HTTP cache is at fault, this could mean that the parser cache has the correct expiry but that the HTTP cache is not copying the expiry correctly. Or perhaps it is (sometimes) forgetting to clear it after it expires, or renewing it in a way that is not valid.

If we find that it only affects the browser cache, then it is possible the issue is not related to our caching and not specific to any magic words, but rather that there is a bug in how we negotiate with the user's browser on whether to download a new copy or not.
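The elimination logic described in this comment can be sketched as a small lookup. It encodes which layers each action skips or clears, as listed above, and narrows down the suspect layer from which actions did or did not make the stale page go away. It assumes exactly one layer is at fault; the action names and the function are illustrative.

```python
# Layers skipped or cleared by each action, per the list in this comment.
BYPASSES = {
    "purge": {"parser", "http"},
    "clear_browser_cache": {"browser"},   # also covers private browsing
    "query_string": {"http", "browser"},  # e.g. appending ?123 to the URL
}
LAYERS = {"parser", "http", "browser"}

def suspect_layers(fixed_by):
    """Narrow down the faulty layer from observations.

    fixed_by maps an action name to True if the stale content disappeared after
    that action, False if it did not. Assumes a single faulty layer.
    """
    suspects = set(LAYERS)
    for action, fixed in fixed_by.items():
        if fixed:
            suspects &= BYPASSES[action]   # culprit is among the layers skipped
        else:
            suspects -= BYPASSES[action]   # skipping these layers did not help
    return suspects

if __name__ == "__main__":
    print(suspect_layers({"purge": True, "query_string": False,
                          "clear_browser_cache": False}))
```

For example, if purging helps but neither ?123 nor a private window does, the only layer left is the parser cache.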

Krinkle changed the task status from Open to Stalled.Sep 24 2020, 7:54 PM

Pending feedback or confirmation from trwiki editors.

There is a consistent difference between logged-in and logged-out sessions. See https://ba.wikipedia.org/wiki/Баш_бит :

  1. logged in

image.png (236×651 px, 23 KB)

  2. logged out

image.png (152×399 px, 13 KB)

This behavior occurs in some small language editions of Wikipedia (ba, tyv, ce), but not on the largest sites such as ru-wiki.

The swap of Traffic for Traffic-Icebox in this ticket's set of tags was based on a bulk action for all tickets that are neither part of our current planned work nor clearly a recent, higher-priority emergent issue. This is simply one step in a larger task cleanup effort. Further triage of these tickets (and especially, organizing future potential project ideas from them into a new medium) will occur afterwards! For more detail, have a look at the extended explanation on the main page of Traffic-Icebox . Thank you!

Seksen changed the task status from Stalled to Open. Dec 12 2022, 12:21 AM
Seksen raised the priority of this task from Medium to High.
Seksen subscribed.

Trwiki admin here, admittedly with little-to-no technical knowledge regarding this, just dropping by to note that this is still very much an issue. I'm changing the status accordingly. We've had to set up a bot to edit the main page to purge the cache for the last two years (it has to be manually triggered when the bot fails to edit), so that's why no one's come chasing after this task, but a patch-up is still just that, a patch-up. It should be an interim solution only. We've seen the downsides of this approach in the last few days when the bot failed to purge the cache...

This sort of bug wouldn't be tolerated on bigger wikis. Our editors put a lot of effort into maintaining our main page, day-in and day-out, so it's a bit disheartening for them to continually run into this problem (and it does tell a thing or two about the systemic imbalances within the movement...). It's now been seven years, and this does need to be sorted. Raising the priority accordingly.

As per Krinkle's question above - I've unfortunately edited the mainpage today before this comment so unable to answer today. Will try to check back.

Aklapper lowered the priority of this task from High to Medium. Dec 12 2022, 4:33 AM

So based purely on Krinkle's comment above it does appear that this would likely be a problem with the parser cache.

Adding ?123 query string into the URL does not resolve the problem, nor does clearing the browser cache/using a private browsing mode. The caveat of course is that logging in resolves the problem.

@Seksen When browsing with a login session, you still enjoy the performance benefit of the ParserCache; it is applied to all backend traffic regardless. We don't re-parse pages on every page view when you're logged in. The only cache that is bypassed, albeit quite a large one, is the HTTP cache (sometimes known as "CDN cache" or "URL cache"). It is bypassed for logged-in users in order to personalise the page (for preferences, notifications, etc.), and it can also be bypassed with a unique query string, such as ?123, the first time you use it on a given day.

If the issue presents itself when logged out with ?123, but not when logged in, that would seem impossible at first glance. I can imagine a few edge cases that might cause that.

One edge case is pages under FlaggedRevs: in that case your account likely sees the "latest" revision instead of the "stable" revision. However, this seems unlikely given reports that "purging" the page immediately solves the issue for logged-out users (and purging has no effect on which revision has been marked as reviewed).

Another edge case might be that your account happens to have a preference set that makes you different from most users (including most logged-out and most logged-in users) and that gives you a different parser cache. For example, you might have a non-default thumbnail size or a non-default user interface language. In that case the page may appear different to you, and thus not share the same cache as other users. If you want to rule this out, you could go to Special:Preferences and reset your preferences (possibly via a sock puppet account, or after carefully noting down what your preferences are so as not to lose them).

Actually, there is an easier way to determine whether account preferences play a role: you can view-source and take note of the parser cache key, which identifies the parser cache entry that was used to render this page.

For me, at view-source:https://tr.wikipedia.org/wiki/Anasayfa there is, toward the bottom of the source, the following comment:

Parsed by mw1374
Cached time: 2022-12-14 00:12:15 UTC
Cache expiry: 3600
Reduced expiry: true
…
<!-- Saved in parser cache with key trwiki:pcache:idhash:2740662-0!canonical and … revision id 28930102. -->

Could you take a look at this when the issue is happening, and take note of the values both where you see the issue and where you don't (e.g. both logged in and logged out around the same time)?
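When comparing keys from two sessions, it may help to split them into parts. The sketch below assumes the key layout visible in the comment above (wiki, page id with a numeric suffix, then an options part after the "!") and assumes that an options part of "canonical" means default rendering options. The function names are made up, and the `thumbsize=2` key is only an illustration of what a non-default thumbnail size preference might produce.

```python
def parse_pcache_key(key):
    """Split a key such as 'trwiki:pcache:idhash:2740662-0!canonical' into its parts."""
    prefix, _, options = key.partition("!")
    wiki, _, rest = prefix.partition(":pcache:idhash:")
    page_id, _, suffix = rest.partition("-")
    return {"wiki": wiki, "page_id": int(page_id), "suffix": suffix, "options": options}

def shares_default_entry(key):
    """True if the key appears to point at the entry shared by users with default options."""
    return parse_pcache_key(key)["options"] == "canonical"

if __name__ == "__main__":
    default_key = "trwiki:pcache:idhash:2740662-0!canonical"
    custom_key = "trwiki:pcache:idhash:2740662-0!thumbsize=2"  # illustrative
    print(parse_pcache_key(default_key))
    print(shares_default_entry(default_key), shares_default_entry(custom_key))
```

If the logged-in key differs from the one served to logged-out readers, the two sessions are not looking at the same parser cache entry, and account preferences cannot be ruled out.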

@Krinkle So this is today at around 14:30 UTC. When logged in (with the admin account) and not experiencing the issue, the values are as below:

"cachereport":{"origin":"mw1370","timestamp":"20221218140222","ttl":3600,"transientcontent":true}}});});</script>

Parsed by mw1370
Cached time: 20221218140222
Cache expiry: 3600
Reduced expiry: true

When not logged in, the cache issue came back - oddly there was no such comment but I noted the following bit of the code:

"cachereport":{"origin":"mw1401","timestamp":"20221217055852","ttl":3600,"transientcontent":true}}});});</script>

When I logged in with my alternative account, which has no specialised settings, once again I experienced no issue. One of our admins edited the main page shortly afterwards, so I could not get values for this. That being said, these are the values when NOT logged in and after the said edit:

"cachereport":{"origin":"mw1456","timestamp":"20221218144218","ttl":3600,"transientcontent":true}}});});</script>

Other samples:

From a non-admin account on 15 December around 06:00 UTC:
without login :
"cachereport":{"origin":"mw1353","timestamp":"20221214001230","ttl":3600,"transientcontent":true}
("Parsed by..." part doesn't exist)

with login:
"cachereport":{"origin":"mw1419","timestamp":"20221215052928","ttl":3600,"transientcontent":true}
Parsed by mw1419
Cached time: 20221215052928
Cache expiry: 3600
Reduced expiry: true
<!-- Saved in parser cache with key trwiki:pcache:idhash:2740662-0!canonical and timestamp 20221215052928 and revision id 28930102.
-->


From an admin account on 20 December around 08:00 UTC:
without login :
"cachereport":{"origin":"mw1369","timestamp":"20221219092603","ttl":3600,"transientcontent":true}
("Parsed by..." part doesn't exist)

with login:
"cachereport":{"origin":"mw1432","timestamp":"20221220072442","ttl":3600,"transientcontent":true}
NewPP limit report
Parsed by mw1432
Cached time: 20221220072442
Cache expiry: 3600
Reduced expiry: true
<!-- Saved in parser cache with key trwiki:pcache:idhash:2740662-0!thumbsize=2 and timestamp 20221220072441 and revision id 28958225.
-->

The situation is the same for both users as stated in previous comment.
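To put a number on these samples, the "cachereport" fragments can be parsed and compared. The sketch below extracts the JSON object from a page-source snippet and computes how far the logged-out copy lags behind the logged-in one. It assumes the object contains no nested braces, as in the samples above; the helper names are made up.

```python
import json
import re
from datetime import datetime

def cache_report(page_source):
    """Extract the "cachereport" JSON object from a page-source fragment."""
    match = re.search(r'"cachereport":(\{.*?\})', page_source)
    if match is None:
        raise ValueError("no cachereport found")
    return json.loads(match.group(1))

def lag_seconds(logged_out_source, logged_in_source):
    """Seconds by which the logged-out rendering is older than the logged-in one."""
    fmt = "%Y%m%d%H%M%S"
    out_ts = datetime.strptime(cache_report(logged_out_source)["timestamp"], fmt)
    in_ts = datetime.strptime(cache_report(logged_in_source)["timestamp"], fmt)
    return (in_ts - out_ts).total_seconds()

if __name__ == "__main__":
    # The 20 December sample quoted above.
    logged_out = '"cachereport":{"origin":"mw1369","timestamp":"20221219092603","ttl":3600,"transientcontent":true}'
    logged_in = '"cachereport":{"origin":"mw1432","timestamp":"20221220072442","ttl":3600,"transientcontent":true}'
    lag = lag_seconds(logged_out, logged_in)
    print(f"logged-out copy is {lag / 3600:.1f} h older; ttl is {cache_report(logged_out)['ttl']} s")
```

For the 20 December sample the logged-out rendering is roughly 22 hours older than the logged-in one, while both report a ttl of 3600 seconds.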

It's now been a month since the last messages - any updates on this? As this is a problem that the entire trwiki community seems to be experiencing, surely the devs would be able to reproduce this on their own devices?

It shouldn't require me to say that expecting a mid-sized community to manually purge their main page every single day is not an acceptable solution to this. Like, I can't see projects like enwiki or dewiki ever being expected to live with this sort of problem. Do we really need to make a fuss about this as a community for this to be taken seriously?

So everything reported on this bug is consistent with how caching is expected to work (except maybe the part where it was reported that adding ?123 to the end of the url when logged out didn't help things. I would be curious what the cached report says for that specific case). Thus what you're experiencing is the same as what every wiki is experiencing, including enwiki, although they may use bots to work around it.

It shouldn't require me to say that expecting a mid-sized community to manually purge their main page every single day is not an acceptable solution to this. Like, I can't see projects like enwiki or dewiki ever being expected to live with this sort of problem. Do we really need to make a fuss about this as a community for this to be taken seriously?

The English Wikipedia doesn't rely solely on time-based parser functions for cache invalidation: the "Did you know" section is updated by a bot at midnight and the "In the news" section tends to be updated more frequently, both triggering cache invalidation.

Previously a bot was used to purge the main page on a regular interval. That would be my recommendation here, just set up a bot to automatically purge the main page however frequently you need it (every hour, every day, whatever). I am happy to help a trwiki bot operator set something up.
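As a rough sketch of what such a bot could send, the snippet below builds a POST request for the MediaWiki Action API's `action=purge` module. The API URL, the page title, and the user agent string are assumptions for illustration; a real bot would likely authenticate, handle errors and rate limits, and be scheduled externally (for example from cron).

```python
import urllib.parse
import urllib.request

def build_purge_request(api_url, titles, user_agent):
    """Build (but do not send) a POST request asking the Action API to purge the given pages."""
    data = urllib.parse.urlencode({
        "action": "purge",
        "titles": "|".join(titles),  # multiple titles are separated by "|"
        "format": "json",
    }).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=data,
        headers={"User-Agent": user_agent},
        method="POST",
    )

if __name__ == "__main__":
    # Assumed values: trwiki's API endpoint and its main page title.
    request = build_purge_request(
        "https://tr.wikipedia.org/w/api.php",
        ["Anasayfa"],
        "main-page-purge-sketch/0.1 (example contact)",
    )
    print(request.get_method(), request.full_url, request.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(request)
```

Separating request construction from sending keeps the sketch testable offline; how often to run it (every hour, every day) is up to the wiki.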

Thanks @Legoktm, I think we may try the null-bot approach.

@Legoktm could you help me with this at euwiki? Thanks!