DNS Discovery operations diffs inconsistent (post DC-Switch)
Closed, Resolved, Public

Description

The Nagios check for "DNS Discovery operations diffs" has been alerting since the DC switchover. The fix is likely simple; however, it is also worth investigating whether the switchover process needs to be amended to include this change.

Event Timeline

While making the change here, I noticed that some services which would normally be active-active seem to be running only in codfw, e.g.:

  • eventgate-analytics-external
  • shellbox
  • api-gateway
  • push-notifications

Further, I noticed that swift-rw is still active in eqiad.

Change 703390 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] P:configmaster: update disc state to match post dc switch over state

https://gerrit.wikimedia.org/r/703390

Change 703390 merged by Jbond:

[operations/puppet@production] P:configmaster: update disc state to match post dc switch over state

https://gerrit.wikimedia.org/r/703390

I have fixed the check; however, we should add something to the playbook/cookbooks to catch this change in the future. @RLazarus/@Legoktm is that something for you?
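For illustration, a cookbook step could compare the expected discovery state with the live one and list any mismatches before the switchover is declared complete. This is only a minimal sketch: the function name `diff_discovery_state`, the data shape (service name mapped to the set of datacenters where it is pooled), and the sample values are all assumptions, not the actual check or the real production state.

```python
from typing import Dict, List, Set

# Hypothetical data shape: service name -> set of datacenters where it is pooled.
DiscoveryState = Dict[str, Set[str]]


def diff_discovery_state(expected: DiscoveryState, actual: DiscoveryState) -> List[str]:
    """Return one human-readable line per service whose pooled sites differ."""
    problems = []
    for service in sorted(set(expected) | set(actual)):
        want = expected.get(service)
        have = actual.get(service)
        if want is None:
            # Service is live but nobody recorded an expected state for it.
            problems.append(f"{service}: not in expected state, actual={sorted(have)}")
        elif have is None:
            # Service has an expected state but was not found live.
            problems.append(f"{service}: missing from actual state, expected={sorted(want)}")
        elif want != have:
            problems.append(f"{service}: expected={sorted(want)} actual={sorted(have)}")
    return problems


if __name__ == "__main__":
    # Illustrative values only, not the real production state.
    expected = {"shellbox": {"codfw", "eqiad"}, "swift-rw": {"codfw"}}
    actual = {"shellbox": {"codfw"}, "swift-rw": {"eqiad"}}
    for line in diff_discovery_state(expected, actual):
        print(line)
```

An empty result would mean the recorded state matches what is live. Any line in the output points at either a service that needs repooling or an expected-state entry that needs updating.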

Change 703396 had a related patch set uploaded (by Jbond; author: John Bond):

[operations/puppet@production] P:configmaster: update expected status for eventgate-external

https://gerrit.wikimedia.org/r/703396

Change 703396 merged by Jbond:

[operations/puppet@production] P:configmaster: update expected status for eventgate-external

https://gerrit.wikimedia.org/r/703396

Legoktm assigned this task to jbond.

Not sure how I missed the Icinga alert, but thanks for the ping. I have documented this as a manual step that needs to be taken: https://wikitech.wikimedia.org/w/index.php?title=Switch_Datacenter&type=revision&diff=1917853&oldid=1917612