I just noticed while browsing some of our staff's user pages on wikimediafoundation.org that the search icon in the top left is not appearing on several user pages. This is because we removed the wmfbranch directories on the server before the last cache expired.
For example, the search magnifier icon is referenced in the HTML via the static-{wmfbranch} path on the "bits" server:
https://bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4
Reproduce
- log out
- clear cookies (just logging out apparently still leaves 3 cookies, which cause part of the cluster to serve a new version instead; a bug in itself?)
- page last modified before December 1, 2012
- current date after January 30, 2013
- visit an affected page, such as the one in the request below:
Request
Request URL: https://wikimediafoundation.org/wiki/User:Gyoung
Request Method: GET
Status Code: 200 OK
Request Headers
GET /wiki/User:Gyoung HTTP/1.1
Host: wikimediafoundation.org
Connection: keep-alive
Cache-Control: no-cache
Pragma: no-cache
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Response Headers
HTTP/1.1 200 OK
Server: nginx/1.1.19
Date: Fri, 01 Feb 2013 04:47:40 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 7118
Connection: keep-alive
X-Content-Type-Options: nosniff
Content-Language: en
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=foundationwikiToken;string-contains=foundationwikiLoggedOut;string-contains=foundationwiki_session;string-contains=mf_useformat
Last-Modified: Wed, 19 Sep 2012 20:02:07 GMT
Content-Encoding: gzip
Expires: Thu, 01 Jan 1970 00:00:00 GMT
Cache-Control: private, must-revalidate, max-age=0
Vary: Accept-Encoding,Cookie
X-Cache: HIT from sq72.wikimedia.org
X-Cache-Lookup: HIT from sq72.wikimedia.org:3128
X-Cache: MISS from sq64.wikimedia.org
X-Cache-Lookup: HIT from sq64.wikimedia.org:80
Via: 1.1 sq72.wikimedia.org:3128 (squid/2.7.STABLE9), 1.0 sq64.wikimedia.org:80 (squid/2.7.STABLE9)
Response
<!DOCTYPE html>
<html>
<meta name="generator" content="MediaWiki 1.21wmf1">
..
<div id="mw-content-text" ..>
..
<!--
NewPP limit report
Preprocessor visited node count: 62/1000000
Preprocessor generated node count: 349/1000000
Post-expand include size: 4937/2048000 bytes
Template argument size: 1975/2048000 bytes
Highest expansion depth: 4/40
Expensive parser function count: 0/500
-->
<!-- Saved in parser cache with key foundationwiki:pcache:idhash:21087-0!*!0!!*!4!* and timestamp 20120919200207 -->
..
</div>
..
<img src="//bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4" alt="Search" width="12" height="13">
..
<!-- Served by srv231 in 0.140 secs. -->
..
</html>
Errors
404 (Not Found)
GET https://bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4
404 (Not Found)
GET https://bits.wikimedia.org/static-1.21wmf1/skins/common/images/poweredby_mediawiki_88x31.png
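Stale references like the two above can be found mechanically: scan the served HTML for asset URLs pinned to a static-{wmfbranch} directory. A minimal sketch of the idea (the regex and the function name are illustrative, not existing tooling):

```python
import re

# Matches asset URLs pinned to a wmf branch directory on bits, e.g.
# //bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png
STATIC_ASSET_RE = re.compile(r'//bits\.wikimedia\.org/(static-[0-9]+\.[0-9]+wmf[0-9]+)/')

def branch_pinned_dirs(html):
    """Return the set of static-{wmfbranch} directories a cached page references."""
    return set(STATIC_ASSET_RE.findall(html))

html = '<img src="//bits.wikimedia.org/static-1.21wmf1/skins/vector/images/search-ltr.png?303-4">'
print(branch_pinned_dirs(html))  # {'static-1.21wmf1'}
```

Each directory found this way could then be probed, or cross-checked against the branches scheduled for teardown, before anything is deleted.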
So, to conclude: these paths can live in the database, memcached, Squid, Varnish, or elsewhere. If some component (modules, files, wiki pages, configuration, the epoch, whatever it is) is not modified, it may sit in one of the caches somewhere, which means we must never remove publicly exposed paths before the longest cache has expired.
Marking as a regression, as this was introduced by the het deploy process.
We just need to make sure that we don't perform the teardown of an iteration until the longest cache has expired.
We could document this and hope everyone remembers, but while it is only a small image this time, it could cause more significant and visible damage next time. The principle is the same either way, so let's not find out the hard way; let's be smart about it.
If I recall correctly, there is a maintenance script in multiversion that removes the paths and symlinks (essentially the teardown counterpart of bin/checkoutMediaWiki: bin/deleteMediaWiki [1]). I propose we add logic there that determines how old a branch is (the commit date of the first commit where the branch diverges from master) and ensures that it is
older than (current time) - (CACHE_MAX_MAX_AGE + CACHE_HERE_BE_DRAGONS_MARGIN)
These constants can be hardcoded in the script, since there is no realistically feasible way to determine the maximum max-age across all the caching layers we have. As a guess, I'd say a maximum max-age of 31 days and a margin of 7 days.
If the condition is false, the shell user is NOT allowed to execute the script further.
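A minimal sketch of such a guard in Python. The constant values are the 31 + 7 day guesses above, and branch_safe_to_delete is a hypothetical helper, not anything that exists in multiversion today:

```python
from datetime import datetime, timedelta, timezone

# Guessed values from above; neither is derived from actual cache configs.
CACHE_MAX_MAX_AGE = timedelta(days=31)
CACHE_HERE_BE_DRAGONS_MARGIN = timedelta(days=7)

def branch_safe_to_delete(first_commit_time, now=None):
    """True only if the branch's first commit predates the longest
    plausible cache lifetime plus the safety margin."""
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - (CACHE_MAX_MAX_AGE + CACHE_HERE_BE_DRAGONS_MARGIN)
    return first_commit_time < cutoff
```

deleteMediaWiki could call something like this before removing anything and abort with a non-zero exit status when it returns False.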
[1]
https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=tree
https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=blob;f=checkoutMediaWiki;h=677f17d0121743ed4b94bfc259d4b46255edc0ce;hb=HEAD
https://gerrit.wikimedia.org/r/gitweb?p=operations/mediawiki-multiversion.git;a=blob;f=deleteMediaWiki;h=b90bf0c0a7b4687a880d077dcfab360e3add5949;hb=HEAD
Version: unspecified
Severity: major