Wed, Sep 11
I’ve been on vacation since yesterday, so I won’t be able to answer everything,
but: errors began appearing intermittently on Saturday. The source IP is the A
record that icinga.wikimedia.org resolves to.
Mon, Sep 9
Yeah, all fair points. We don't seem to be experiencing too many of these
404s (a handful per day), and other mitigations are available, so it's
probably not worth it.
@Varnent Can you pass this report on to Automattic?
Leaving some notes here before I'm gone for two weeks.
Sun, Sep 8
I wound up implementing a simpler UI for this, as the code was a lot less annoying than adding a --set-note flag to each subcommand.
Sat, Sep 7
Fri, Sep 6
Wed, Sep 4
Tue, Sep 3
Mon, Sep 2
My thought with wanting to build templates into dbctl was mostly that it would mean dbctl's output would make it obvious what was happening, without needing to also look at the associated MediaWiki configs. Also I didn't want to add more PHP glue on the mediawiki-config side, because the status quo is already a bit gross.
Fri, Aug 30
The remaining work on this task is now part of T231642: Empty db-eqiad.php, db-codfw.php s1-s8+wikitech lines
Seems likely this is the same as T231504: Unexpectedly received mobile version of an article while logged out ?
In db-eqiad/codfw.php we currently provide IP addresses as the values of the keys in externalLoads instead of using hostsByName to translate hostnames to IP, e.g.:
    'externalLoads' => [
        # es2
        'cluster24' => [
            '10.64.32.184' => 0, # es1015, C2 11TB 128GB, master
            '10.64.0.6' => 1, # es1011, A2 11TB 128GB
            '10.64.16.186' => 1, # es1013, B1 11TB 128GB
        ],
    ]
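For illustration only, here is a rough Python sketch of what translating hostnames to IPs through a hostsByName-style map could look like. The function name `resolve_external_loads` and the dictionary layout are assumptions made for this example, not the actual MediaWiki or dbctl code; the hostnames, IPs, and weights are taken from the snippet above.

```python
# Hypothetical sketch: none of these names are real MediaWiki or dbctl APIs.

# Hostname -> IP map, in the spirit of hostsByName (values from the snippet above).
hosts_by_name = {
    "es1015": "10.64.32.184",
    "es1011": "10.64.0.6",
    "es1013": "10.64.16.186",
}

# The same cluster24 weights, keyed by hostname instead of IP.
external_loads_by_name = {
    "cluster24": {"es1015": 0, "es1011": 1, "es1013": 1},
}


def resolve_external_loads(loads_by_name, hosts):
    """Return a copy of loads_by_name with each hostname replaced by its IP.

    Raises KeyError if a hostname has no entry in hosts, so a typo in the
    config fails loudly instead of silently dropping a replica.
    """
    resolved = {}
    for cluster, weights in loads_by_name.items():
        resolved[cluster] = {hosts[name]: weight for name, weight in weights.items()}
    return resolved


if __name__ == "__main__":
    print(resolve_external_loads(external_loads_by_name, hosts_by_name))
```

Keying by hostname would keep the human-readable name in the config itself rather than only in trailing comments.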
There's a --batch flag, intended for use from scripts, that I think should work for this?
There's no blocker here I know of aside from needing to be careful about it -- I haven't found the time yet to do this.
Wed, Aug 28
Fortunately this occurrence seems to be quite rare.
Seeing this again in production on enwiki as of about 15:15 UTC today https://logstash.wikimedia.org/goto/971e0dc9c8f3d9cfe4d76c30a6446a9b
Tue, Aug 27
Mon, Aug 26
Per @Anomie's comment this isn't caused by any change in MediaWiki code, and thus IMO shouldn't be a train blocker?
That's right -- this just enforces total number of replicas pooled at the top level of a section.
Sat, Aug 24
Thanks for this report!
Almost certainly due to codfw network troubles at the time https://wikitech.wikimedia.org/wiki/Incident_documentation/20190823-network_codfw
Aug 23 2019
@Der_Keks yes, that is the purpose of the aforementioned swiftrepl daemon.
BTW for posterity, here's how I looked for logs:
I did some digging in the swift logs around the time the file was uploaded; there's no record of swift in codfw ever receiving a PUT for this file.
Indeed, the object exists on eqiad, but never made it to codfw swift:
Looks like we are indeed serving 404s for this object from codfw and all points west:
Judging from logstash, this seems to only occur on HHVM. Likely, PHP7 is writing some things to memcache that HHVM can't unserialize.
Aug 22 2019
17:44:35 <volans> cdanis: that was the switcdc repo that become the basis to write spicerack, now lives as cookbooks in the cookbooks repo
17:44:44 <volans> but
17:44:58 <volans> it doesn't use anymore the readonlybysection IIRC and uses the global RO instead
Aug 20 2019
https://noc.wikimedia.org/db.php will now stay up-to-date with dbctl changes.
Ahhh, thanks! Those lines ending in GET explain the T; that makes it seem very likely that icinga-wm is splitting the lines internally, but has a different/wrong idea of the max line length than the IRC server does.
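To make the suspected failure mode concrete, here is a minimal Python sketch of a client that splits long messages by byte length, and of what happens when the server enforces a shorter limit than the client assumed. This is not icinga-wm's actual code; `split_for_irc`, `server_truncate`, and the tiny limits used in the demo are all assumptions for illustration. (The IRC protocol caps a full message at 512 bytes including the trailing CR-LF, and the server adds a sender prefix when relaying, so the usable payload is smaller than 512.)

```python
# Hypothetical sketch: not icinga-wm's implementation, just the general idea.

def split_for_irc(text, max_bytes):
    """Split text into chunks whose UTF-8 encoding is at most max_bytes each."""
    if max_bytes < 4:
        # A single UTF-8 character can take up to 4 bytes.
        raise ValueError("max_bytes must be at least 4")
    chunks = []
    current = ""
    for ch in text:
        if len((current + ch).encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = ch
        else:
            current += ch
    if current:
        chunks.append(current)
    return chunks


def server_truncate(chunk, server_max):
    """Model a server that silently cuts anything past server_max bytes."""
    return chunk.encode("utf-8")[:server_max].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    message = "x" * 50
    client_limit = 20   # what the bot believes the limit is (assumed value)
    server_limit = 16   # what the server actually enforces (assumed value)
    for chunk in split_for_irc(message, client_limit):
        delivered = server_truncate(chunk, server_limit)
        lost = len(chunk.encode("utf-8")) - len(delivered.encode("utf-8"))
        print(f"sent {len(chunk)} bytes, delivered {len(delivered)}, lost {lost}")
```

If the client's limit is larger than the server's, the tail of every full-length chunk is lost, which would be consistent with lines being cut at unexpected points.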
Aug 19 2019
Aug 16 2019
Oh, also, please note that list administrators are not automatically subscribed to the list -- subscribe yourselves if you want to receive posts.
List created! @Manik87 you should have received an email with your administrator password for the mailing list.