During the week of March 21, Wikimedia engineering plans to fail over to codfw for 48 hours. EventLogging needs to remain available, and Wikitech needs instructions documenting any manual steps required to shift EventLogging to codfw.
We need to verify that UDP traffic can get from Dallas (codfw) to eqiad, OR migrate the EventLogging server to use Kafka (since eqiad will be up while this exercise takes place).
In general, UDP traffic can get from codfw to eqiad, because the two DCs are interconnected. Keep in mind, though, that the fibers may be wiretapped, so there should be no expectation of privacy.
ACLs are something we should check. Let me know which specific flows you're referring to so I can check them. Also, is this multicast? If it is, it /should/ work, but I'd like to test it, as it's a little more complicated (PIM is involved).
As for "eqiad being up while this exercise takes place"… we have tentative plans to bring it down in parts (e.g. row by row) for network maintenance such as switch/router upgrades, so ideally we shouldn't count on that.
Traffic is not multicast; it goes directly from the app servers to eventlog1001.
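Because the flow is plain unicast UDP, one datagram sent from a codfw app server to a listener in eqiad is enough to check reachability. The sketch below shows that check under a few assumptions. The listener port and probe payload are hypothetical. The self-test runs on loopback rather than on eventlog1001. UDP gives the sender no delivery acknowledgement, so the result is read on the receiving side.

```python
import socket


def open_listener(bind_host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Bind a UDP socket on the receiving host (port 0 lets the OS pick a free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((bind_host, port))
    return sock


def send_probe(host: str, port: int, payload: bytes) -> None:
    """Fire one unicast datagram from the sending host; UDP gives no delivery ack."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def wait_for_probe(listener: socket.socket, payload: bytes, timeout: float = 2.0) -> bool:
    """Return True if the expected payload arrives at the listener before the timeout."""
    listener.settimeout(timeout)
    try:
        data, _addr = listener.recvfrom(65535)
    except socket.timeout:
        # Nothing arrived: an ACL drop and plain packet loss look the same here.
        return False
    return data == payload


if __name__ == "__main__":
    # Loopback self-test. In a real check the listener would run on the eqiad
    # receiver and send_probe would run on a codfw app server (hypothetical port).
    listener = open_listener()
    port = listener.getsockname()[1]
    send_probe("127.0.0.1", port, b"eventlogging-probe")
    print(wait_for_probe(listener, b"eventlogging-probe"))
    listener.close()
```

A single lost probe cannot distinguish an ACL drop from ordinary loss. In practice you would send several probes before concluding that the flow is blocked.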
Hitting the beacon/event.gif endpoint from app servers is not a bad idea, and would simplify configuration and processes on the eventlog server side. It doesn't help with the codfw failover, but it does eliminate one more special case.
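Sending server-side events to the beacon endpoint means building one HTTP request per event. The sketch below builds such a request without sending it, and it rests on several assumptions. It assumes that the beacon accepts the event as URL-encoded JSON in the query string, terminated by `;`, mirroring the client-side beacon format. It also assumes that the verb is POST, as in the "Submit server-side events via HTTP POST to the beacon endpoint" patch. The base URI and the event fields are hypothetical.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_beacon_request(base_uri: str, event: dict) -> Request:
    """Wrap one event in a POST request to the beacon endpoint.

    Assumed wire format (hypothetical here): '<base_uri>?<url-encoded JSON>;'.
    """
    # Compact, deterministic JSON so the encoded URL is stable.
    payload = json.dumps(event, separators=(",", ":"), sort_keys=True)
    url = f"{base_uri}?{quote(payload, safe='')};"
    # Empty body: the event travels in the query string; only the verb is POST.
    return Request(url, data=b"", method="POST")


if __name__ == "__main__":
    req = build_beacon_request(
        "https://meta.example.org/beacon/event",  # hypothetical base URI
        {"schema": "ExampleSchema", "revision": 1, "event": {"action": "save"}},
    )
    print(req.get_method(), req.full_url)
```

To actually submit the event, pass the request to `urllib.request.urlopen`. One design consequence is that the app servers then reach the eventlog host through the same HTTP path that clients use, instead of through a separate UDP special case.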
All varnishkafkas produce to the analytics-eqiad Kafka cluster. If that is not reachable, then we will lose all eventlogging and webrequest messages.
Change 273006 had a related patch set uploaded (by Ori.livneh):
Fully-qualify EventLoggingBaseUri
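A plausible reason to fully qualify the base URI is that a protocol-relative URI such as `//host/path` only works inside a browser, which resolves it against the page's scheme. A server-side HTTP client has no page to resolve against. The sketch below shows that expansion, assuming `https` as the default scheme. The function name and the example host are hypothetical, and this is not necessarily how the patch implements it.

```python
from urllib.parse import urlsplit


def fully_qualify(uri: str, default_scheme: str = "https") -> str:
    """Turn a protocol-relative URI ('//host/path') into an absolute one.

    Absolute URIs are returned unchanged; anything else is rejected because a
    server-side client cannot resolve it without a base URL.
    """
    if urlsplit(uri).scheme:
        return uri  # already absolute
    if uri.startswith("//"):
        return f"{default_scheme}:{uri}"
    raise ValueError(f"not an absolute or protocol-relative URI: {uri!r}")


if __name__ == "__main__":
    print(fully_qualify("//meta.example.org/beacon/event"))  # hypothetical host
```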
Change 273008 had a related patch set uploaded (by Ori.livneh):
Submit server-side events via HTTP POST to the beacon endpoint
Change 273513 had a related patch set uploaded (by Ottomata):
Add running eventlogging-devserver to role::eventlogging
Change 273527 had a related patch set uploaded (by Ottomata):
Update eventlogging-devserver log parsing to behave the same as eventlogging-processor
Change 273527 merged by Ottomata:
Update eventlogging-devserver log parsing to behave the same as eventlogging-processor
Change 273008 merged by jenkins-bot:
Submit server-side events via HTTP POST to the beacon endpoint
Hm, @ori, I was just testing this again on MW vagrant with https://gerrit.wikimedia.org/r/#/c/273513/, and with role wikimediaevents and role eventlogging (from that patch) enabled, I get:
[error] [76308532] /w/index.php?title=Main_Page&action=submit ErrorException from line 63 of /vagrant/mediawiki/extensions/EventLogging/includes/EventLogging.php: PHP Notice: Undefined variable: wgDBname
In mediawiki-wiki-debug.log
Ah, I see it was removed in https://gerrit.wikimedia.org/r/#/c/273008/1/includes/EventLogging.php and then re-added in response to a comment on a later patch set, but without the global declaration.
Fixing...
Change 274986 had a related patch set uploaded (by Ottomata):
Fix missing global var declaration for $wgDBname
Change 274986 merged by jenkins-bot:
Fix missing global var declaration for $wgDBname
Change 273513 merged by Ottomata:
Add eventlogging-devserver service to role::eventlogging
Change 452877 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[operations/mediawiki-config@master] Remove deprecated $wgEventLoggingFile config
Change 452877 merged by jenkins-bot:
[operations/mediawiki-config@master] Remove deprecated $wgEventLoggingFile config