@Spage wrote:
E3 deployed several extensions with changed modules today. GettingStarted had a new RL module 'ext.gettingstarted.openTask' for a new JS file resources/ext.gettingstarted.openTask.js. scap finished 4:19PDT [00:19:44]
50 minutes after scap finished I visited a page that should have had the new code, but didn't. My hypothesis: during scap, some user requested a page that ran the new PHP code that adds the new module, so her browser made the request to bits.wikimedia.org for modules including the new module. At that time the bits server didn't yet have the JS file, so the server response reports "missing". But then RL happily caches that response for X minutes, where X is a long time, well over 50 minutes.
This might be a dupe of T39812, "Module cache should be invalidated (newer timestamp) when lesser old items are removed or added (scripts, style, messages)"
It turned out the original report above was indeed T39812. However, there is a larger issue here.
During deployment, any change that introduces a new URL that is deterministically versioned and has a large max-age risks polluting the cache indefinitely.
The timeline is as follows:
- Start deployment (scap, sync-file, sync-dir).
- Server A receives new code deployment.
- Client 1 requests a page.
- Server A responds to client 1 with the page, built with the new code.
- Client 1 receives the page from server A and requests any secondary resources (e.g. referenced script, style or image). These resources have URIs like /static/file-name?{hash} or /load.php?module=name&{hash}.
- Server B, which still runs the old code, responds to client 1 and provides the requested resource based on its name alone, ignoring the hash.
- Server B receives new code deployment.
Because Varnish sits in front of the web servers, the outdated resource from server B is now cached under the URL containing the new hash. Even after the deployment is over, it will persist in the cache indefinitely.
When adding new static files, server B would respond with 404 and it will repair itself eventually when the file exists. However, for an update to static files or a change anywhere in JS/CSS modules, this can cause the Varnish cache to effectively renew old content under new versioned URIs.
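The timeline above can be sketched as a small simulation. This is only an illustration under assumed names: the `Server` and `Cache` classes, the `version_hash` helper and the URL pattern are hypothetical stand-ins, not actual MediaWiki, ResourceLoader or Varnish code. It shows how a cache keyed purely on the URL ends up holding old content under the new hash when the page and the resource are served by servers at different stages of the deployment.

```python
import hashlib


def version_hash(content: str) -> str:
    # Hypothetical deterministic version: a short hash of the content.
    return hashlib.sha1(content.encode()).hexdigest()[:8]


class Server:
    """Hypothetical web server holding one version of a module."""

    def __init__(self, content: str):
        self.content = content

    def deploy(self, content: str) -> None:
        self.content = content

    def render_page(self) -> str:
        # The page references the resource by name plus a version hash.
        return f"/load.php?module=name&{version_hash(self.content)}"

    def serve_resource(self, url: str) -> str:
        # The resource is looked up by name only; the hash in the URL
        # is not checked against what this server actually has.
        return self.content


class Cache:
    """Hypothetical stand-in for a caching proxy keyed on the full URL."""

    def __init__(self):
        self.store = {}

    def fetch(self, url: str, backend: Server) -> str:
        if url not in self.store:
            self.store[url] = backend.serve_resource(url)
        return self.store[url]


def simulate():
    old, new = "old code", "new code"
    server_a, server_b = Server(old), Server(old)
    cache = Cache()

    server_a.deploy(new)                     # Server A receives the new code.
    url = server_a.render_page()             # Client 1 gets a page with the new hash.
    first = cache.fetch(url, server_b)       # Server B still has the old code.
    server_b.deploy(new)                     # Server B receives the new code.
    later = cache.fetch(url, server_b)       # Cache hit: old content persists.
    return url, first, later


if __name__ == "__main__":
    url, first, later = simulate()
    print(url, "->", first, "/", later)
```

In this sketch both fetches return the old content even though the URL carries the hash of the new content, and nothing in the URL changes afterwards that would cause the cached entry to be bypassed.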
Related incident: https://wikitech.wikimedia.org/wiki/Incident_documentation/20160707-Echo