I just noticed while browsing some of our staff's user pages on wikimediafoundation.org that the search icon in the top left is not appearing on several user pages. This is because we removed the wmfbranch directories on the server before the last cache expired.
For example, the search magnifier icon is referenced in the HTML via the static-{wmfbranch} path on the "bits" server:
https://bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4
Reproduce
- log out
- clear cookies (just logging out apparently still leaves 3 cookies, which cause part of the cluster to serve a new version instead; a bug in itself?)
- page last modified before December 1, 2012
- current date after January 30, 2013
- visit an affected page, such as the one in the request below:
Request
Request URL: https://wikimediafoundation.org/wiki/User:Gyoung
Request Method: GET
Status Code: 200 OK
Request Headers
GET /wiki/User:Gyoung HTTP/1.1
Host: wikimediafoundation.org
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Response Headers
HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Fri, 01 Feb 2013 04:47:40 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 7118
Connection: keep-alive
X-Content-Type-Options: nosniff
Content-Language: en
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=foundationwikiToken;string-contains=foundationwikiLoggedOut;string-contains=foundationwiki_session;string-contains=mf_useformat
Last-Modified: Wed, 19 Sep 2012 20:02:07 GMT
Content-Encoding: gzip
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: private, must-revalidate, max-age=0
Vary: Accept-Encoding,Cookie
X-Cache: HIT from sq72.wikimedia.org
X-Cache-Lookup: HIT from sq72.wikimedia.org:3128
X-Cache: MISS from sq64.wikimedia.org
X-Cache-Lookup: HIT from sq64.wikimedia.org:80
Via: 1.1 sq72.wikimedia.org:3128 (squid/2.7.STABLE9), 1.0 sq64.wikimedia.org:80 (squid/2.7.STABLE9)
Response
<!DOCTYPE html>
<html>
<meta name="generator" content="MediaWiki 1.21wmf1">
..
<div id="mw-content-text" ..>
..
<!--
NewPP limit report
Preprocessor visited node count: 62/1000000
Preprocessor generated node count: 349/1000000
Post-expand include size: 4937/2048000 bytes
Template argument size: 1975/2048000 bytes
Highest expansion depth: 4/40
Expensive parser function count: 0/500
-->
<!-- Saved in parser cache with key foundationwiki:pcache:idhash:21087-0!*!0!!*!4!* and timestamp 20120919200207 -->
..
</div>
..
<img src="//bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4" alt="Search" width="12" height="13">
..
<!-- Served by srv231 in 0.140 secs. -->
..
</html>
Errors
404 (Not Found)
GET https://bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4
404 (Not Found)
GET https://bits.wikimedia.org/static-1.21wmf1/skins/common/images/poweredby_mediawiki_88x31.png
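Stale references like the two above can be found mechanically: scan the served HTML for asset URLs pinned to a static-{wmfbranch} directory. A minimal sketch of the idea (the regex and the function name are illustrative, not existing tooling):

```python
import re

# Matches asset URLs pinned to a wmf branch directory on bits, e.g.
# //bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png
STATIC_ASSET_RE = re.compile(r'//bits\.wikimedia\.org/(static-[0-9]+\.[0-9]+wmf[0-9]+)/')

def branch_pinned_dirs(html):
    """Return the set of static-{wmfbranch} directories a cached page references."""
    return set(STATIC_ASSET_RE.findall(html))

html = '<img src="//bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4">'
print(branch_pinned_dirs(html))  # {'static-1.21wmf1'}
```

Each directory found this way could then be probed, or cross-checked against the branches scheduled for teardown, before anything is deleted.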
So, to conclude: these paths can live in the database, memcached, Squid, Varnish, or elsewhere. If some component (modules, files, wiki pages, configuration, the epoch, whatever it is) is not modified, it may sit in one of the caches somewhere, which means we must never remove publicly exposed paths before the longest cache has expired.
Marking as a regression, as this was introduced by the het deploy process.
We just need to make sure that we don't perform the teardown of an iteration until the longest cache has expired.
We could document this and hope everyone remembers, but while it is only a small image this time, it could cause more significant and visible damage next time. The principle is the same either way, so let's not find out the hard way; let's be smart about it.
If I recall correctly, there is a maintenance script in multiversion that removes the paths and symlinks (essentially the teardown counterpart of bin/checkoutMediaWiki: bin/deleteMediaWiki [1]). I propose we add logic there that determines how old a branch is (the commit date of the first commit where the branch diverges from master) and ensures that it is
older than (current time) - (CACHE_MAX_MAX_AGE + CACHE_HERE_BE_DRAGONS_MARGIN)
These constants can be hardcoded in the script, since there is no realistically feasible way to determine the maximum max-age across all the caching layers we have. As a guess, I'd say a maximum max-age of 31 days and a margin of 7 days.
If the condition is false, the shell user is NOT allowed to execute the script further.
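A minimal sketch of such a guard in Python. The constant values are the 31 + 7 day guesses above, and branch_safe_to_delete is a hypothetical helper, not anything that exists in multiversion today:

```python
from datetime import datetime, timedelta, timezone

# Guessed values from above; neither is derived from actual cache configs.
CACHE_MAX_MAX_AGE = timedelta(days=31)
CACHE_HERE_BE_DRAGONS_MARGIN = timedelta(days=7)

def branch_safe_to_delete(first_commit_time, now=None):
    """True only if the branch's first commit predates the longest
    plausible cache lifetime plus the safety margin."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - (CACHE_MAX_MAX_AGE + CACHE_HERE_BE_DRAGONS_MARGIN)
    return first_commit_time < cutoff
```

deleteMediaWiki could call something like this before removing anything and abort with a non-zero exit status when it returns False.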
[1]
https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=tree
https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=blob;f=checkoutMediaWiki;h=677f17d0121743ed4b94bfc259d4b46255edc0ce;hb=HEAD
https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=blob;f=deleteMediaWiki;h=b90bf0c0a7b4687a880d077dcfab360e3add5949;hb=HEAD
Version: unspecified
Severity: major