recreate codfw cluster state from code stored in deployment-charts with helmfile [MIGHT CAUSE DOWNTIME]
Open, High, Public

Description

If we want the cluster to be managed via code and helmfile, we will need to recreate namespaces and deployments. The following services and namespaces will be affected.

The current plan is to do it all at once: delete the current namespaces and apply helmfile cluster-wide. The checkboxes below are for double-checking each service after recreation.

  • blubberoid
  • citoid
  • cxserver
  • eventgate-analytics
  • eventgate-main
  • mathoid
  • sessionstore
  • termbox
  • zotero
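
The plan above (delete the namespaces, apply helmfile cluster-wide, then check each service) could be sketched roughly as follows. This is only an illustration, not the procedure that was actually run: the function name `recreation_plan`, the helmfile environment name `codfw`, and the use of `kubectl get pods` as the per-service check are assumptions, and the script only prints the commands instead of executing them.

```python
from typing import List

# Namespaces listed in the task description.
SERVICES = [
    "blubberoid", "citoid", "cxserver", "eventgate-analytics",
    "eventgate-main", "mathoid", "sessionstore", "termbox", "zotero",
]


def recreation_plan(services: List[str], environment: str = "codfw") -> List[List[str]]:
    """Return the commands for the plan as argument lists, without running them."""
    # Step 1: delete every affected namespace.
    plan = [["kubectl", "delete", "namespace", name] for name in services]
    # Step 2: apply helmfile once for the whole cluster.
    # Assumption: a helmfile environment named after the cluster exists.
    plan.append(["helmfile", "-e", environment, "apply"])
    # Step 3: one check per service, mirroring the checklist above.
    plan += [["kubectl", "get", "pods", "-n", name] for name in services]
    return plan


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed first.
    for command in recreation_plan(SERVICES):
        print(" ".join(command))
```

Printing the commands rather than running them keeps the sketch safe to try, which matters here because the task title warns that the recreation might cause downtime.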

Event Timeline

fsero created this task. Wed, Jul 24, 8:22 AM
fsero updated the task description. Wed, Jul 24, 8:36 AM
fsero triaged this task as High priority. Thu, Jul 25, 9:25 AM
fsero moved this task from Backlog to Doing on the serviceops board. Mon, Aug 5, 9:11 AM

Change 528078 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/deployment-charts@master] k8s: deploy calico, rbac, psp, coredns and ns via helmfile in codfw.

https://gerrit.wikimedia.org/r/528078

Change 528078 merged by Fsero:
[operations/deployment-charts@master] k8s: deploy calico, rbac, psp, coredns and ns via helmfile in codfw.

https://gerrit.wikimedia.org/r/528078

Mentioned in SAL (#wikimedia-operations) [2019-08-05T13:42:06Z] <fsero> deploying tiller in kube-system for helmfile changes - T228837

Mentioned in SAL (#wikimedia-operations) [2019-08-05T13:56:55Z] <fsero> deploying calico controller in codfw via helmfile - T228837

Change 528164 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/puppet@production] k8s, cache: disabling codfw services for k8s cluster recreation

https://gerrit.wikimedia.org/r/528164

Mentioned in SAL (#wikimedia-operations) [2019-08-05T15:26:09Z] <fsero> recreating zotero and termbox from helmfile codfw - T228837

Mentioned in SAL (#wikimedia-operations) [2019-08-05T15:27:10Z] <fsero> recreating zotero and termbox namespaces and services from helmfile codfw - T228837

Change 528164 merged by Fsero:
[operations/puppet@production] k8s, cache: disabling codfw services for k8s cluster recreation

https://gerrit.wikimedia.org/r/528164

Mentioned in SAL (#wikimedia-operations) [2019-08-05T16:10:46Z] <fsero> recreating citoid eventgate-analytics eventgate-main mathoid sessionstore namespaces and redeploying from helmfile T228837

Change 528196 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Increase mathoid resourcequotas

https://gerrit.wikimedia.org/r/528196

Change 528196 merged by Fsero:
[operations/deployment-charts@master] Increase mathoid resourcequotas

https://gerrit.wikimedia.org/r/528196

Change 528202 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/deployment-charts@master] k8s, codfw: disabling quotas on eventgate, cxserver and mathoid as they need more work

https://gerrit.wikimedia.org/r/528202

Change 528202 merged by Fsero:
[operations/deployment-charts@master] k8s, codfw: disabling quotas on some namespaces.

https://gerrit.wikimedia.org/r/528202

Change 528390 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Add resources stanza to prometheus-metrics-exporter

https://gerrit.wikimedia.org/r/528390

Change 528390 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Add resources stanza to prometheus-metrics-exporter

https://gerrit.wikimedia.org/r/528390

Change 528401 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] mathoid: Align limitranges/resourcequotas in staging

https://gerrit.wikimedia.org/r/528401

Change 528401 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] mathoid: Align limitranges/resourcequotas in staging

https://gerrit.wikimedia.org/r/528401

Change 528404 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] mathoid: Partialy revert 269abb124130e0f

https://gerrit.wikimedia.org/r/528404

Change 528409 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Revert "k8s, cache: disabling codfw services for k8s cluster recreation"

https://gerrit.wikimedia.org/r/528409

Change 528404 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] mathoid: Partialy revert 269abb124130e0f

https://gerrit.wikimedia.org/r/528404

Change 528494 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] staging: Bump all LimitRanges and ResourceQuotas

https://gerrit.wikimedia.org/r/528494

Change 528495 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] codfw: Bump all LimitRanges and ResourceQuotas

https://gerrit.wikimedia.org/r/528495

Change 528494 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] staging: Bump all LimitRanges and ResourceQuotas

https://gerrit.wikimedia.org/r/528494

Change 528495 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] codfw: Bump all LimitRanges and ResourceQuotas

https://gerrit.wikimedia.org/r/528495

Change 528508 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Fixup limitranges for citoid,cxserver

https://gerrit.wikimedia.org/r/528508

Change 528508 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Fixup limitranges for citoid,cxserver

https://gerrit.wikimedia.org/r/528508

Change 528513 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] blubberoid/sessionstore: Bump requests/limits

https://gerrit.wikimedia.org/r/528513

Change 528515 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Revert Add resources stanza to prometheus-metrics-exporter

https://gerrit.wikimedia.org/r/528515

Change 528513 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] blubberoid/sessionstore: Bump requests/limits

https://gerrit.wikimedia.org/r/528513

Change 528515 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Revert Add resources stanza to prometheus-metrics-exporter

https://gerrit.wikimedia.org/r/528515

Change 528409 merged by Fsero:
[operations/puppet@production] Revert "k8s, cache: disabling codfw services for k8s cluster recreation"

https://gerrit.wikimedia.org/r/528409