recreate codfw cluster state from code stored in deployment-charts with helmfile [MIGHT CAUSE DOWNTIME]
Closed, Resolved, Public

Description

If we want the cluster to be managed via code and helmfile, we will need to recreate the namespaces and deployments. The following services and namespaces will be affected.

The current plan is to do it all at once: delete the current namespaces and apply helmfile cluster-wide. The checkboxes below are for double-checking each service after recreation.

  • blubberoid
  • citoid
  • cxserver
  • eventgate-analytics
  • eventgate-main
  • mathoid
  • sessionstore
  • termbox
  • zotero
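
As a rough illustration of the plan above, the following Python sketch only builds and prints the commands that such a recreation might involve; it does not execute anything. The helmfile path, the use of `helmfile apply` rather than `helmfile sync`, and the `kubectl get pods` check are assumptions made for illustration and are not taken from this task or from deployment-charts.

```python
import shlex

# Service namespaces listed in the task description.
NAMESPACES = [
    "blubberoid", "citoid", "cxserver", "eventgate-analytics",
    "eventgate-main", "mathoid", "sessionstore", "termbox", "zotero",
]


def plan_commands(namespaces, helmfile_path="helmfile.yaml"):
    """Return the commands for a recreation as argument lists (dry run only).

    helmfile_path is a hypothetical placeholder; the real layout of
    deployment-charts is not described in this task.
    """
    # 1. Delete every existing namespace (this is the step that may cause downtime).
    cmds = [["kubectl", "delete", "namespace", ns] for ns in namespaces]
    # 2. Recreate the cluster state from code in a single helmfile run.
    cmds.append(["helmfile", "-f", helmfile_path, "apply"])
    # 3. Double-check each service afterwards (assumed check: list its pods).
    cmds += [["kubectl", "get", "pods", "-n", ns] for ns in namespaces]
    return cmds


if __name__ == "__main__":
    for cmd in plan_commands(NAMESPACES):
        print(shlex.join(cmd))
```

Printing the commands instead of running them keeps the sketch safe to try anywhere and makes the ordering (delete, apply, verify) easy to review before touching a real cluster.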

Details

Related Gerrit Patches:
operations/puppet : production - Revert "k8s, cache: disabling codfw services for k8s cluster recreation"
operations/deployment-charts : master - Revert Add resources stanza to prometheus-metrics-exporter
operations/deployment-charts : master - blubberoid/sessionstore: Bump requests/limits
operations/deployment-charts : master - Fixup limitranges for citoid,cxserver
operations/deployment-charts : master - codfw: Bump all LimitRanges and ResourceQuotas
operations/deployment-charts : master - staging: Bump all LimitRanges and ResourceQuotas
operations/deployment-charts : master - mathoid: Partialy revert 269abb124130e0f
operations/puppet : production - k8s, cache: disabling codfw services for k8s cluster recreation
operations/deployment-charts : master - mathoid: Align limitranges/resourcequotas in staging
operations/deployment-charts : master - Add resources stanza to prometheus-metrics-exporter
operations/deployment-charts : master - k8s, codfw: disabling quotas on some namespaces.
operations/deployment-charts : master - Increase mathoid resourcequotas
operations/deployment-charts : master - k8s: deploy calico, rbac, psp, coredns and ns via helmfile in codfw.

Event Timeline

fsero created this task. Jul 24 2019, 8:22 AM
fsero updated the task description. Jul 24 2019, 8:36 AM
fsero triaged this task as High priority. Jul 25 2019, 9:25 AM
fsero moved this task from Backlog to Doing on the serviceops board. Aug 5 2019, 9:11 AM

Change 528078 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/deployment-charts@master] k8s: deploy calico, rbac, psp, coredns and ns via helmfile in codfw.

https://gerrit.wikimedia.org/r/528078

Change 528078 merged by Fsero:
[operations/deployment-charts@master] k8s: deploy calico, rbac, psp, coredns and ns via helmfile in codfw.

https://gerrit.wikimedia.org/r/528078

Mentioned in SAL (#wikimedia-operations) [2019-08-05T13:42:06Z] <fsero> deploying tiller in kube-system for helmfile changes - T228837

Mentioned in SAL (#wikimedia-operations) [2019-08-05T13:56:55Z] <fsero> deploying calico controller in codfw via helmfile - T228837

Change 528164 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/puppet@production] k8s, cache: disabling codfw services for k8s cluster recreation

https://gerrit.wikimedia.org/r/528164

Mentioned in SAL (#wikimedia-operations) [2019-08-05T15:26:09Z] <fsero> recreating zotero and termbox from helmfile codfw - T228837

Mentioned in SAL (#wikimedia-operations) [2019-08-05T15:27:10Z] <fsero> recreating zotero and termbox namespaces and services from helmfile codfw - T228837

Change 528164 merged by Fsero:
[operations/puppet@production] k8s, cache: disabling codfw services for k8s cluster recreation

https://gerrit.wikimedia.org/r/528164

Mentioned in SAL (#wikimedia-operations) [2019-08-05T16:10:46Z] <fsero> recreating citoid eventgate-analytics eventgate-main mathoid sessionstore namespaces and redeploying from helmfile T228837

Change 528196 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Increase mathoid resourcequotas

https://gerrit.wikimedia.org/r/528196

Change 528196 merged by Fsero:
[operations/deployment-charts@master] Increase mathoid resourcequotas

https://gerrit.wikimedia.org/r/528196

Change 528202 had a related patch set uploaded (by Fsero; owner: Fsero):
[operations/deployment-charts@master] k8s, codfw: disabling quotas on eventgate, cxserver and mathoid as they need more work

https://gerrit.wikimedia.org/r/528202

Change 528202 merged by Fsero:
[operations/deployment-charts@master] k8s, codfw: disabling quotas on some namespaces.

https://gerrit.wikimedia.org/r/528202

Change 528390 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Add resources stanza to prometheus-metrics-exporter

https://gerrit.wikimedia.org/r/528390

Change 528390 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Add resources stanza to prometheus-metrics-exporter

https://gerrit.wikimedia.org/r/528390

Change 528401 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] mathoid: Align limitranges/resourcequotas in staging

https://gerrit.wikimedia.org/r/528401

Change 528401 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] mathoid: Align limitranges/resourcequotas in staging

https://gerrit.wikimedia.org/r/528401

Change 528404 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] mathoid: Partialy revert 269abb124130e0f

https://gerrit.wikimedia.org/r/528404

Change 528409 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Revert "k8s, cache: disabling codfw services for k8s cluster recreation"

https://gerrit.wikimedia.org/r/528409

Change 528404 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] mathoid: Partialy revert 269abb124130e0f

https://gerrit.wikimedia.org/r/528404

Change 528494 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] staging: Bump all LimitRanges and ResourceQuotas

https://gerrit.wikimedia.org/r/528494

Change 528495 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] codfw: Bump all LimitRanges and ResourceQuotas

https://gerrit.wikimedia.org/r/528495

Change 528494 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] staging: Bump all LimitRanges and ResourceQuotas

https://gerrit.wikimedia.org/r/528494

Change 528495 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] codfw: Bump all LimitRanges and ResourceQuotas

https://gerrit.wikimedia.org/r/528495

Change 528508 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Fixup limitranges for citoid,cxserver

https://gerrit.wikimedia.org/r/528508

Change 528508 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Fixup limitranges for citoid,cxserver

https://gerrit.wikimedia.org/r/528508

Change 528513 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] blubberoid/sessionstore: Bump requests/limits

https://gerrit.wikimedia.org/r/528513

Change 528515 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/deployment-charts@master] Revert Add resources stanza to prometheus-metrics-exporter

https://gerrit.wikimedia.org/r/528515

Change 528513 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] blubberoid/sessionstore: Bump requests/limits

https://gerrit.wikimedia.org/r/528513

Change 528515 merged by Alexandros Kosiaris:
[operations/deployment-charts@master] Revert Add resources stanza to prometheus-metrics-exporter

https://gerrit.wikimedia.org/r/528515

Change 528409 merged by Fsero:
[operations/puppet@production] Revert "k8s, cache: disabling codfw services for k8s cluster recreation"

https://gerrit.wikimedia.org/r/528409

Joe closed this task as Resolved. Sep 11 2019, 7:13 AM