Services without a service IP cannot automatically be switched by the switchdc cookbook
Open, Medium, Public

Description

During the DC switchover today, the helm-charts service failed verification because it doesn't have a service IP, which caused the cookbook to abort.

Either:

  • the cookbook/spicerack should support services without a service IP
  • all services must have a service IP, whether it's otherwise technically needed or not

Event Timeline

helm-charts not having a service IP was a design decision because we only have (and need) one replica per DC. IIRC there are already other "services" following this approach (@Dzahn, maybe) and I think we should support it.

In order to do that, we'd need to change the logic of the dns module in spicerack quite a bit. I think for the switchback we can just add it to the excluded services.
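As a rough illustration of the exclusion approach, assuming the cookbook works from a plain list of service names (the names `EXCLUDED_SERVICES` and `services_to_switch` are hypothetical and not taken from the actual cookbook code):

```python
# Hypothetical sketch: the constant and function names are assumptions,
# not the real sre.switchdc.services implementation.
EXCLUDED_SERVICES = frozenset({"helm-charts"})  # lacks a service IP


def services_to_switch(all_services):
    """Return the services the cookbook should handle, skipping excluded ones."""
    return [name for name in all_services if name not in EXCLUDED_SERVICES]
```

Excluded services would then have to be switched manually, which is why this only works as a stopgap.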

I guess the important part here is to make an explicit decision on whether the setup for helm-charts (a discovery record without svc records) is just an exception or an additional standard way to set things up that we want to support.
Based on that we should either fix helm-charts to not be an exception or support the new use case properly.

It would also be nice if the cookbook could check all services, and then fail if at least one didn't verify properly. Right now it just aborts at the first failure, which gives you no indication about the services that came after it in the list.
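A minimal sketch of that "check everything, then fail" behaviour, assuming each service can be verified by a callable that raises on failure (`verify_all`, `VerificationError` and the `verify` parameter are hypothetical names, not spicerack or cookbook APIs):

```python
from typing import Callable, Iterable


class VerificationError(Exception):
    """Raised when one or more services fail verification."""


def verify_all(services: Iterable[str], verify: Callable[[str], None]) -> None:
    """Verify every service, then raise once with all failures collected."""
    failures = {}
    for name in services:
        try:
            verify(name)  # assumed to raise on a failed verification
        except Exception as exc:  # keep going so later services are still checked
            failures[name] = exc
    if failures:
        details = "; ".join(f"{name}: {exc}" for name, exc in failures.items())
        raise VerificationError(
            f"{len(failures)} service(s) failed verification: {details}"
        )
```

With this shape the operator sees every failing service in a single error, instead of only the first one in the list.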

Legoktm triaged this task as High priority. Jul 27 2021, 6:09 PM

Change 710235 had a related patch set uploaded (by Legoktm; author: Legoktm):

[operations/cookbooks@master] sre.switchdc.services: Exclude helm-charts, lacking a service IP

https://gerrit.wikimedia.org/r/710235

Change 710235 merged by jenkins-bot:

[operations/cookbooks@master] sre.switchdc.services: Exclude helm-charts, lacking a service IP

https://gerrit.wikimedia.org/r/710235

Legoktm renamed this task from "During DC switch, helm-charts failed verification because it doesn't have a service IP" to "Services without a service IP cannot automatically be switched by the switchdc cookbook". Aug 10 2021, 6:54 PM
Legoktm lowered the priority of this task from High to Medium.
Legoktm updated the task description.