During the UTC late deployment window on 2022-06-30, a config patch was synced:
[thcipriani@deploy1002 mediawiki-staging (master u=)]$ scap sync-file wmf-config/InitialiseSettings.php 'Config: [[gerrit:809165|Enable Wikistories on idwiki (T311143)]]'
...
20:55:35 Started sync-masters
sync-masters: 100% (in-flight: 0; ok: 1; fail: 0; left: 0)
sync-pull-masters: 100% (in-flight: 0; ok: 1; fail: 0; left: 0)
sync-testservers: 100% (in-flight: 0; ok: 4; fail: 0; left: 0)
sync-canaries: 100% (in-flight: 0; ok: 9; fail: 0; left: 0)
20:55:46 Running '/usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807' on 9 host(s) [canaries]
sync-proxies: 100% (in-flight: 0; ok: 8; fail: 0; left: 0)
sync-apaches: 100% (in-flight: 0; ok: 348; fail: 0; left: 0)
20:56:36 Running '/usr/local/sbin/check-and-restart-php php7.2-fpm 9223372036854775807' on 307 host(s)
20:59:05 Finished php-fpm-restarts (duration: 02m 29s)
20:59:05 Synchronized wmf-config/InitialiseSettings.php: Config: [[gerrit:809165|Enable Wikistories on idwiki (T311143)]] (duration: 03m 31s)
(full transcript is in P30698)
However, about 13 minutes after the sync finished, it was observed that the change was not fully live:
21:12 <stephanebisson> thcipriani somehow my patch doesn't seem to be sync'd everywhere. When I refresh, sometimes the code is there sometimes it isn't. Is there a long replication delay or could there be a problem?
This affected at least mw1414 and mw1369:
21:16 <urbanecm> stephanebisson: would you mind sharing which mw server works correctly and which one doesn't? should be available in the `server` header in your devtools
21:18 <stephanebisson> urbanecm mw1414.eqiad.wmnet (not updated)
21:18 <stephanebisson> urbanecm mw1413.eqiad.wmnet (updated)
21:20 <stephanebisson> urbanecm mw1369 also not up to date
| State | Hosts |
| good | mw1413 |
| bad | mw1414, mw1369 |
For the record, mw1414 is an appserver canary, while mw1369 and mw1413 are not canaries.
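The per-host check described in the chat (read the `server` response header, note whether the change is visible) can be repeated automatically. The following is a minimal sketch, not the procedure actually used here: the URL and the `marker` string that indicates the change is live are hypothetical placeholders, and only the `server` header name comes from the conversation above.

```python
from __future__ import annotations

import urllib.request
from typing import Iterable


def probe(url: str, marker: str, timeout: float = 10.0) -> tuple[str, bool]:
    """Fetch `url` once and return (serving host, whether `marker` is in the body).

    `url` and `marker` are hypothetical: pick a page and a string that only
    appears once the config change is live.
    """
    request = urllib.request.Request(url, headers={"User-Agent": "sync-check-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        server = response.headers.get("server", "unknown")
        body = response.read().decode("utf-8", errors="replace")
    return server, marker in body


def classify(observations: Iterable[tuple[str, bool]]) -> dict[str, str]:
    """Summarise repeated probes as 'updated', 'not updated' or 'mixed' per host."""
    seen: dict[str, set[bool]] = {}
    for server, has_change in observations:
        seen.setdefault(server, set()).add(has_change)

    result: dict[str, str] = {}
    for server, states in seen.items():
        if states == {True}:
            result[server] = "updated"
        elif states == {False}:
            result[server] = "not updated"
        else:
            result[server] = "mixed"
    return result
```

Calling `probe` a few dozen times and passing the collected pairs to `classify` would yield a good/bad host list like the one above. A "mixed" host would suggest that the stale state is per worker rather than per host.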
The code on all of those servers is the same:
urbanecm@notebook ~ $ ssh mw1413.eqiad.wmnet md5sum /srv/mediawiki/wmf-config/InitialiseSettings.php
cab73e4a083586b3f97260f3634a1414  /srv/mediawiki/wmf-config/InitialiseSettings.php
urbanecm@notebook ~ $ ssh mw1414.eqiad.wmnet md5sum /srv/mediawiki/wmf-config/InitialiseSettings.php
cab73e4a083586b3f97260f3634a1414  /srv/mediawiki/wmf-config/InitialiseSettings.php
urbanecm@notebook ~ $ ssh mw1369.eqiad.wmnet md5sum /srv/mediawiki/wmf-config/InitialiseSettings.php
cab73e4a083586b3f97260f3634a1414  /srv/mediawiki/wmf-config/InitialiseSettings.php
urbanecm@notebook ~ $
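When more than a handful of hosts need comparing, the same checksum comparison can be done by grouping hosts by digest. This is a rough sketch that assumes the `md5sum` output line for each host has already been collected (for example over ssh, as above). The function names are illustrative and not part of any existing tool.

```python
from __future__ import annotations

from collections import defaultdict


def parse_md5sum_line(line: str) -> tuple[str, str]:
    """Split one line of `md5sum` output into (digest, path)."""
    digest, path = line.strip().split(maxsplit=1)
    # In binary mode md5sum prefixes the path with '*'; drop it if present.
    return digest, path.lstrip("*")


def group_hosts_by_digest(outputs: dict[str, str]) -> dict[str, list[str]]:
    """Map each digest to the sorted list of hosts that reported it.

    `outputs` maps a host name to the single md5sum output line from that host.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for host, line in outputs.items():
        digest, _path = parse_md5sum_line(line)
        groups[digest].append(host)
    return {digest: sorted(hosts) for digest, hosts in groups.items()}
```

A result with exactly one key means every host has the same file on disk, which is the situation observed here. More than one key would point at the file sync itself rather than at the restarts.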
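Since the file on disk is identical on the good and the bad hosts, the file sync itself appears to have worked. The stale behaviour is therefore likely caused by an issue in the php-fpm restarts: either the restart command did not reach all necessary hosts, or the restart script does not work under certain circumstances.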
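One way to test this hypothesis would be to compare when php-fpm last started on each host with the time the file was synced. The sketch below rests on several assumptions: that php-fpm runs as a systemd unit on these hosts, that its start time is read with something like `systemctl show php7.2-fpm --property=ActiveEnterTimestamp`, and that the timestamps are reported in UTC. The helper names are hypothetical, and nothing here is taken from scap or from `check-and-restart-php`.

```python
from __future__ import annotations

from datetime import datetime, timezone


def parse_active_enter_timestamp(line: str) -> datetime:
    """Parse a line such as 'ActiveEnterTimestamp=Thu 2022-06-30 20:56:40 UTC'.

    Assumes systemd's default timestamp format and a UTC host clock.
    """
    value = line.strip().split("=", 1)[1]
    _weekday, date, time, tz = value.split()
    if tz != "UTC":
        raise ValueError(f"expected a UTC timestamp, got {tz!r}")
    parsed = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return parsed.replace(tzinfo=timezone.utc)


def hosts_not_restarted(
    started_at: dict[str, datetime], synced_at: datetime
) -> list[str]:
    """Return hosts whose php-fpm started before the file was synced.

    Such hosts were not restarted after the sync and may still be serving
    the old configuration.
    """
    return sorted(host for host, start in started_at.items() if start < synced_at)
```

If the bad hosts (mw1414, mw1369) appeared in the result and the good host (mw1413) did not, that would support the first explanation, that the restart never took place there. If no host appeared, the restart script itself would be the more likely suspect.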
The same issue also happened on 2022-06-29 (see logs from https://wm-bot.wmcloud.org/browser/index.php?start=06%2F29%2F2022&end=06%2F29%2F2022&display=%23wikimedia-operations, search for dancy and MatmaRex after 14:35 on that day).