Pool eventgate-main in both datacenters (active/active)
Closed, Resolved · Public

Description

Follow-up from https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-25_eventgate-main_outage

eventgate-main has only been pooled in the active DC because WDQS lag detection required all topics to start with the DC's name; see T285710: WDQS lag detection required manual adjustment during DC switchover for the details.

However, I believe the WDQS updater no longer relies on that topic naming, so it should be safe to repool eventgate-main as active/active. We should give people advance notice in case someone else has started to depend on the topic name.

Event Timeline

Restricted Application added a subscriber: Aklapper.

@dcausse the $kafka_reporting_topic variable is still in puppet (https://gerrit.wikimedia.org/g/operations/puppet/+/97421d5ac1f1a5082cc8f42b02bf80dc017e7497/modules/profile/manifests/query_service/updater.pp#14) - can you confirm that it's no longer used?

Indeed, it is no longer used on production wdqs machines. We kept wdqs1010 (a test machine) running the old updater in case we had to roll back. I think you can consider it no longer used and move forward with eventgate-main being active/active.
I prepared a patch so that wdqs1010 no longer relies on this puppet code (https://gerrit.wikimedia.org/r/c/operations/puppet/+/742670) in case it annoys us with lag alerts, but I think the wdqs test machines have the lag alert disabled.

Ack, thanks!

I sent a heads-up to the ops list about this in case some other application has started assuming that it is pooled in only a single DC. If no one says anything, I think we can do this in a few days.

herron triaged this task as Medium priority. Dec 3 2021, 7:08 PM

Mentioned in SAL (#wikimedia-operations) [2021-12-14T18:25:31Z] <ottomata> repooling eventgate-main discovery to include codfw - T296699 - confctl --object-type discovery select 'dnsdisc=eventgate-main,name=codfw' set/pooled=true

Ran

root@puppetmaster1001:~# confctl --object-type discovery select 'dnsdisc=eventgate-main,name=codfw' set/pooled=true
eventgate-main/codfw: pooled changed False => True

root@puppetmaster1001:~# confctl --object-type discovery select 'dnsdisc=eventgate-main' get
{"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=eventgate-main"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=eventgate-main"}
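For anyone who wants to check the result without reading the output by eye, here is a minimal sketch that parses the `get` output above and confirms that both DCs are pooled. It assumes the output is one JSON object per line, exactly as pasted in this task. The helper names `pooled_states` and `is_active_active` are made up for illustration and are not part of conftool.

```python
import json

# Output of `confctl ... get` as pasted in this task: one JSON object per line.
SAMPLE = """\
{"codfw": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=eventgate-main"}
{"eqiad": {"pooled": true, "references": [], "ttl": 300}, "tags": "dnsdisc=eventgate-main"}
"""


def pooled_states(confctl_output):
    """Return a dict mapping each datacenter name to its pooled flag.

    Hypothetical helper: assumes every non-empty line is a JSON object with
    one datacenter key plus a "tags" key, as in the output shown above.
    """
    states = {}
    for line in confctl_output.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for key, value in record.items():
            if key == "tags":
                continue  # metadata, not a datacenter entry
            states[key] = bool(value["pooled"])
    return states


def is_active_active(states, expected=("codfw", "eqiad")):
    """True only if every expected datacenter is present and pooled."""
    return all(states.get(dc, False) for dc in expected)


if __name__ == "__main__":
    states = pooled_states(SAMPLE)
    print(states)
    print("active/active:", is_active_active(states))
```

A datacenter that is missing from the output is treated as not pooled, so a partial listing does not pass the check by accident.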
Ottomata claimed this task.

Yup should be!