restrouter.svc.{eqiad,codfw}.wmnet in a failed state
Closed, Resolved (Public)

Description

Dropping the legacy Parsoid/JS tables in Cassandra (T242344) has put restrouter.svc.{eqiad,codfw}.wmnet in a failed state.

Since consensus about undeploying this was reached, actions required are below:

  • Remove icinga configuration
  • Remove LVS configuration
  • Undeploy the helm releases
  • Delete kubernetes namespaces
  • Delete kubernetes tokens
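
As a rough illustration of the checklist above, the sketch below expands the `restrouter.svc.{eqiad,codfw}.wmnet` shorthand into concrete hostnames and lays the steps out in the order they are listed. The function names are hypothetical. The per-datacenter split of the helm and namespace steps assumes one Kubernetes cluster per datacenter, and treating the icinga, LVS, and token steps as single changes is likewise an assumption. Nothing here runs against real infrastructure; it only builds a list of step descriptions.

```python
SERVICE = "restrouter"
DATACENTERS = ("eqiad", "codfw")  # taken from the task title


def service_hostnames(service=SERVICE, datacenters=DATACENTERS):
    """Expand the {eqiad,codfw} shorthand into one hostname per datacenter."""
    return [f"{service}.svc.{dc}.wmnet" for dc in datacenters]


def teardown_plan(service=SERVICE, datacenters=DATACENTERS):
    """Return the checklist as plain-text steps, in the order the task lists them.

    Assumption: helm releases and namespaces exist once per datacenter,
    while icinga, LVS, and token changes are each made once.
    """
    plan = [
        f"remove icinga configuration for {service}",
        f"remove LVS configuration for {service}",
    ]
    plan += [f"undeploy the {service} helm releases in {dc}" for dc in datacenters]
    plan += [f"delete the {service} kubernetes namespace in {dc}" for dc in datacenters]
    plan.append(f"delete kubernetes tokens for {service}")
    return plan


if __name__ == "__main__":
    for host in service_hostnames():
        print("affected:", host)
    for number, step in enumerate(teardown_plan(), start=1):
        print(f"{number}. {step}")
```

Keeping the plan as data rather than as executed commands makes it easy to review or tick off against the Gerrit changes in the timeline.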

Event Timeline

Eevans triaged this task as Medium priority. Jan 10 2020, 8:38 PM
Eevans added a project: serviceops.
Eevans added subscribers: Pchelolo, akosiaris, WDoranWMF.

It's not clear to me what the status of this is. Do we need to deploy the latest code here? Since (long-term) we aim to replace all of this, is abandoning it entirely an option?

Since (long-term) we aim to replace all of this, is abandoning it entirely an option?

Is it possible to take it out for now until we either prioritize it again or drop it entirely?

We're running CI for RESTBase in both RESTBase and RESTRouter modes, so it will be in a mostly deployable state if we want to put it back online; however, maintaining an unused production deployment seems like a waste.

Since (long-term) we aim to replace all of this, is abandoning it entirely an option?

Is it possible to take it out for now until we either prioritize it again or drop it entirely?

You mean undeploy? Sure, we can undeploy it. The only caveat is that redeploying it will take some time, as we will need to create the necessary resources again (LVS entries, DNS, kubernetes namespaces, etc.).

We're running CI for RESTBase in both RESTBase and RESTRouter modes, so it will be in a mostly deployable state if we want to put it back online; however, maintaining an unused production deployment seems like a waste.

Indeed.

A lot has changed since we began this migration, including https://www.mediawiki.org/wiki/Core_Platform_Team/Decisions_Architecture_Research_Documentation/Services_Architecture_Recommendations_(2019), which is expected to be a lengthy process but will ultimately result in a REST{Router,Base}-less world. I guess the question we should be asking is: is this still something we should do in the meantime (and schedule and resource to complete), or should we cut bait, undeploy from k8s, and leave things as they are?

@WDoranWMF ?

FWIW, this is generating some log noise as well.

I just acked two LVS alerts for restrouter in icinga; please let me know if they were something different :)

I believe we have consensus around undeploying restrouter from k8s. @WDoranWMF, can you confirm?

@Eevans Sorry this got lost in my inbox, yep, I agree.

Change 573248 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: undeploy

https://gerrit.wikimedia.org/r/573248

Change 573249 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] restrouter: Fully remove the helmfile stanzas

https://gerrit.wikimedia.org/r/573249

Change 573250 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] admin: Remove calico restrouter rules

https://gerrit.wikimedia.org/r/573250

Change 573253 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] restrouter: Remove restrouter LVS icinga config

https://gerrit.wikimedia.org/r/573253

Change 573254 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] restrouter: Remove LVS configuration

https://gerrit.wikimedia.org/r/573254

Change 573255 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] restrouter: Remove from conftool

https://gerrit.wikimedia.org/r/573255

Change 573256 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] restrouter: Remove LVS IP from kubernetes

https://gerrit.wikimedia.org/r/573256

Change 573257 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] restrouter: Remove k8s tokens

https://gerrit.wikimedia.org/r/573257

Change 573253 merged by Alexandros Kosiaris:
[operations/puppet@production] restrouter: Remove restrouter LVS icinga config

https://gerrit.wikimedia.org/r/573253

Change 573254 merged by Alexandros Kosiaris:
[operations/puppet@production] restrouter: Remove LVS configuration

https://gerrit.wikimedia.org/r/573254

Change 573283 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] restrouter: Remove all records

https://gerrit.wikimedia.org/r/573283

Change 573283 merged by Alexandros Kosiaris:
[operations/dns@master] restrouter: Remove all records

https://gerrit.wikimedia.org/r/573283

Change 573256 merged by Alexandros Kosiaris:
[operations/puppet@production] restrouter: Remove LVS IP from kubernetes

https://gerrit.wikimedia.org/r/573256

Change 573255 merged by Alexandros Kosiaris:
[operations/puppet@production] restrouter: Remove from conftool

https://gerrit.wikimedia.org/r/573255

Change 573248 merged by jenkins-bot:
[operations/deployment-charts@master] restrouter: undeploy

https://gerrit.wikimedia.org/r/573248

Mentioned in SAL (#wikimedia-operations) [2020-03-17T11:16:32Z] <akosiaris> T242461 undeploy restrouter. Unused service and per task to not be used after all

Change 573249 merged by jenkins-bot:
[operations/deployment-charts@master] restrouter: Fully remove the helmfile stanzas

https://gerrit.wikimedia.org/r/573249

Change 573250 merged by jenkins-bot:
[operations/deployment-charts@master] admin: Remove calico restrouter rules

https://gerrit.wikimedia.org/r/573250

Change 596141 had a related patch set uploaded (by JMeybohm; owner: JMeybohm):
[operations/deployment-charts@master] restrouter: Remove chart and namespace

https://gerrit.wikimedia.org/r/596141

Change 596141 merged by jenkins-bot:
[operations/deployment-charts@master] restrouter: Remove chart and namespace

https://gerrit.wikimedia.org/r/596141

Change 598047 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] restrouter: Cleanup some leftover hiera entries

https://gerrit.wikimedia.org/r/598047

Change 598047 merged by Alexandros Kosiaris:
[operations/puppet@production] restrouter: Cleanup some leftover hiera entries

https://gerrit.wikimedia.org/r/598047

Change 573257 abandoned by JMeybohm:
restrouter: Remove k8s tokens

Reason:
Nothing left here. Has been merged with I42cc007340914f4969ad2d369b77b01aca3f371e

https://gerrit.wikimedia.org/r/573257

JMeybohm updated the task description.