During the morning SWAT, @Ottomata deployed https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/492770/. @Ottomata thought that this patch did not depend on wmf.19 being deployed, since last week we had already deployed support for multiple EventBus instances via the `EventServices` config. That config was already live, so the EventBus extension should no longer have been using the legacy `wgEventServiceUrl` config. Apparently, somehow, it still was. Between the morning SWAT and the final push of wmf.19 to group2 wikis, many app servers (probably all of those in group2) failed to produce to EventBus because of this misconfiguration:
https://logstash.wikimedia.org/goto/a1781b78f692e030d2e775d08572ca9f
https://grafana.wikimedia.org/d/000000102/production-logging?orgId=1&from=1551378582537&to=1551388309667&panelId=13&fullscreen&edit
https://grafana.wikimedia.org/d/000000201/eventbus?from=1551369945820&to=1551391545820&orgId=1&var-site=eqiad&var-rule=All
I was able to capture the failed events from the Logstash error data in Kafka:
```
lang=bash
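# Consume the Logstash error topic from offset 12688179 (-e exits at the end of the partition instead of waiting for new messages) and keep only EventBus log records.
$ kafkacat -C -b localhost:9092 -t udp_localhost-err -o 12688179 -e | grep '"channel": "EventBus"' > eventbus-outage-logstash.2019-02-28.json
# Take the first entry of each record's .events array and write it as one compact JSON event per line.
$ jq -rc '.events[0]' eventbus-outage-logstash.2019-02-28.json > eventbus-outage-events.2019-02-28.json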
$ wc -l eventbus-outage-events.2019-02-28.json
269995 eventbus-outage-events.2019-02-28.json
```
Counting by topic:
```
147405 "mediawiki.job.wikibase-addUsagesForPage"
37177 "mediawiki.job.RecordLintJob"
17266 "mediawiki.job.cirrusSearchLinksUpdatePrioritized"
12189 "mediawiki.job.cirrusSearchLinksUpdate"
8831 "resource_change"
5848 "mediawiki.job.cdnPurge"
5420 "mediawiki.job.refreshLinksPrioritized"
5224 "mediawiki.job.htmlCacheUpdate"
4897 "mediawiki.job.recentChangesUpdate"
4419 "mediawiki.job.categoryMembershipChange"
4277 "mediawiki.revision-create"
2958 "mediawiki.page-links-change"
2444 "mediawiki.job.ORESFetchScoreJob"
2351 "mediawiki.job.EchoNotificationDeleteJob"
2146 null
1288 "mediawiki.revision-tags-change"
1285 "mediawiki.job.refreshLinks"
1281 "mediawiki.job.enotifNotify"
882 "mediawiki.job.flaggedrevs_CacheUpdate"
637 "mediawiki.job.wikibase-InjectRCRecords"
535 "mediawiki.page-properties-change"
325 "mediawiki.job.cirrusSearchIncomingLinkCount"
264 "mediawiki.page-create"
164 "mediawiki.job.CentralAuthCreateLocalAccountJob"
135 "mediawiki.job.LoginNotifyChecks"
126 "mediawiki.user-blocks-change"
51 "mediawiki.job.cirrusSearchDeletePages"
43 "mediawiki.page-delete"
33 "mediawiki.page-move"
32 "mediawiki.job.UpdateRepoOnMove"
31 "mediawiki.job.updateBetaFeaturesUserCounts"
6 "mediawiki.job.UpdateRepoOnDelete"
5 "mediawiki.job.ThumbnailRender"
5 "mediawiki.job.cirrusSearchCheckerJob"
4 "mediawiki.page-restrictions-change"
4 "mediawiki.job.compileArticleMetadata"
3 "mediawiki.revision-visibility-change"
2 "mediawiki.job.userGroupExpiry"
1 "mediawiki.job.cirrusSearchOtherIndex"
1 "mediawiki.job.cirrusSearchDeleteArchive"
```
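For reference, a tally like the one above can presumably be reproduced with a short `jq`/`sort`/`uniq` pipeline. This is a sketch, not necessarily the command that produced these counts: it assumes each event stores its topic at `.meta.topic` (events without one are counted under `null`), and `count_by_topic` is a hypothetical helper name.

```
lang=bash
# Hypothetical helper: tally captured events per topic, largest first.
# ASSUMPTION: each event stores its topic at .meta.topic; events without
# one are printed (and counted) as a bare `null`.
count_by_topic() {
  # jq emits each topic as a JSON value, one per line; uniq -c only merges
  # adjacent duplicates, so sort first, then sort numerically by count.
  jq '.meta.topic' "$1" | sort | uniq -c | sort -rn
}

# Usage: sh count_by_topic.sh eventbus-outage-events.2019-02-28.json
if [ "$#" -gt 0 ]; then
  count_by_topic "$1"
fi
```

Because `jq` is run without `-r`, topics come out JSON-quoted and missing ones as `null`, which matches the shape of the list above.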
I could replay these events back to EventBus... but I'm not sure I should!
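If a replay does happen, it may be safer to do it selectively and in small batches. Below is a minimal sketch under loud assumptions: `EVENTBUS_URL` is a placeholder for an EventBus HTTP endpoint assumed to accept a JSON array of events via POST; `replay_events`, `BATCH_SIZE`, `EXCLUDE` and `DRY_RUN` are hypothetical names; and topics are again assumed to live at `.meta.topic`. By default it only prints the batches it would send.

```
lang=bash
# Hypothetical replay helper (a sketch, not tested against production).
# ASSUMPTIONS: EVENTBUS_URL is a placeholder for an EventBus endpoint that
# accepts a JSON array of events via POST; topics live at .meta.topic.
# Knobs (all hypothetical): BATCH_SIZE (default 100), EXCLUDE (jq regex of
# topics to skip, default none), DRY_RUN (default 1 = print, don't send).
# The body is a ( subshell ) so `exit` only aborts this function.
replay_events() (
  file=$1
  # Drop excluded topics (events with no topic are kept), then slurp the
  # survivors and emit them as compact JSON arrays of at most BATCH_SIZE.
  jq -c --arg re "${EXCLUDE:-}" \
    'select($re == "" or ((.meta.topic // "") | test($re) | not))' "$file" |
    jq -cs --argjson n "${BATCH_SIZE:-100}" \
      '. as $all | range(0; length; $n) as $i | $all[$i:$i + $n]' |
    while IFS= read -r batch; do
      if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$batch"
      else
        # --fail turns HTTP errors into a non-zero exit so we stop early.
        printf '%s' "$batch" |
          curl -sS --fail -X POST -H 'Content-Type: application/json' \
            --data-binary @- "${EVENTBUS_URL:?EVENTBUS_URL must be set}" ||
          exit 1
      fi
    done
)
```

For example, `EXCLUDE='^mediawiki\.job\.' replay_events eventbus-outage-events.2019-02-28.json` would preview a replay that skips the `mediawiki.job.*` topics, which would otherwise presumably be executed hours after the edits that triggered them. Setting `DRY_RUN=0` and `EVENTBUS_URL` switches it to actually POSTing.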