= eqiad row C switches upgrade =
For reasons detailed in {T327248} we're going to upgrade eqiad row C switches during the scheduled DC switchover.
**Scheduled on April 4th, 13:00-15:00 UTC**. Please let us know if there is any issue with the scheduled time.
This means a !!30min hard downtime!! for the whole row if everything goes well (in practice, probably closer to 15min). It is also a good opportunity to test the host depool mechanisms and the row redundancy of services.
The impacted servers for this row are listed below, grouped by team.
The "action needed" fields are free-form. For example:
* `NONE` if no action is needed
* the cookbook/command to run, if it can be done by a third party
* who will be around to take care of the depool
* a link to the relevant doc
* etc.
The two main types of action needed are depool and monitoring downtime.
NOTE: If the servers can handle a longer depool, it's preferred to depool them many hours or the day before (and mark `NONE` in the table) so there are fewer moving parts closer to the maintenance window.
All servers will be downtimed with `sudo cookbook sre.hosts.downtime --hours 2 -r "eqiad row C upgrade" -t XXX 'P{P:netbox::host%location ~ "C.*eqiad"}'`, but specific services might need specific downtimes.
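The server lists in the tables below use a compact bracket notation (for example `wdqs[1010,1013-1014]`). As a convenience for anyone who needs the individual host names, here is a minimal Python sketch that expands that notation. The helper name `expand_hosts` is hypothetical and not part of any existing tooling, and it assumes at most one bracket group per entry, which holds for every row in this task.

```python
import re


def expand_hosts(expr: str) -> list[str]:
    """Expand an entry such as 'wdqs[1010,1013-1014]' into single host names.

    Hypothetical helper for illustration only. Assumes at most one
    bracket group per entry; entries without brackets are returned as-is.
    """
    expr = expr.strip()
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", expr)
    if match is None:
        return [expr]
    prefix, body, suffix = match.groups()
    hosts = []
    for part in body.split(","):
        # Each part is either a single number or an inclusive 'start-end' range.
        start, _, end = part.strip().partition("-")
        end = end or start
        width = len(start)  # keep any zero padding of the original number
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


if __name__ == "__main__":
    for entry in ("logstash[1025,1028,1034]", "wdqs[1010,1013-1014]", "alert1001"):
        print(entry, "->", ", ".join(expand_hosts(entry)))
```

The hyphen is only treated as a range separator inside the brackets, so names such as `acmechief-test1001` pass through unchanged.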
== Observability ==
#sre_observability
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|alert1001| fail services over to alert2001 | fail services back to alert1001 | incomplete |
|kafka-logging1002| schedule downtime | | incomplete |
|logstash[1025,1028,1034]| drain shards from 1028 and 1034, depool 1025, and set downtime | | incomplete |
|mwlog1002| schedule downtime, deploy [[ https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/901322 | MW patch ]] | revert [[ https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/901322 | MW patch ]] | incomplete |
|webperf1003| none | none | |
== Observability and Data Persistence ==
#sre_observability #data-persistence
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|thanos-fe1003| | | |
== Core Platform ==
#core-platform-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|dumpsdata[1003,1005]| | | |
|maps1009| | | |
|sessionstore1002| | | |
|snapshot1014| | | |
|thumbor1006| | | |
== ServiceOps-Collab ==
#serviceops-collab
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|doc1002| | | |
|etherpad1003| | | |
|gitlab-runner1003| | | |
|miscweb1002| | | |
== Search Platform ==
#discovery-search
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|cloudelastic1003| | | |
|elastic[1057-1059,1080-1083,1087-1088]| | | |
|wcqs1003| | | |
|wdqs[1010,1013-1014]| | | |
== Data Engineering ==
#data-engineering
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|an-conf1002| | | |
|an-coord1002| | | |
|an-db1002| | | |
|an-druid1002| | | |
|an-test-master1002| | | |
|an-test-worker1002| | | |
|an-tool[1005,1007,1010]| | | |
|an-worker[1088-1091,1099-1100,1104-1111,1131-1133]| | | |
|analytics[1064-1066,1074-1075]| | | |
|aqs[1012-1013,1018]| | | |
|datahubsearch1003| | | |
|db1108| | | |
|dbstore1005| | | |
|kafka-jumbo[1004-1005,1007]| | | |
|matomo1002| | | |
== Data Engineering and Machine Learning ==
#data-engineering #machine-learning-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|dse-k8s-etcd1003| none | none | none |
|dse-k8s-worker1003| none | none | none |
== Infrastructure Foundations ==
#infrastructure-foundations
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|cumin1001|NONE |NONE | |
|ganeti[1009-1012,1024,1027-1028]|NONE |NONE | |
|idp-test1002|NONE |NONE | |
|install1004|NONE |NONE | |
|mx1001|NONE |NONE | |
|puppetdb1002| | | |
|puppetmaster1005| | | |
|rpki1001| | | |
|seaborgium|NONE |NONE | |
|urldownloader[1002-1003]| | | |
== Traffic ==
#traffic
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|acmechief1001| | | |
|acmechief-test1001| | | |
|cp[1083-1086]| | | |
|doh1001| | | |
|lvs[1015,1019]| | | |
|ncredir1001| | | |
== Machine Learning ==
#machine-learning-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|ml-cache1002| none | none | none |
|ml-etcd1002| none | none | none |
|ml-serve1003| none | none | none |
|ores[1005-1006]| sudo -i depool | sudo -i pool | |
|orespoolcounter1004| sudo -i depool| sudo -i pool | |
== Data Persistence ==
#data-persistence
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|backup[1002,1006]| | | |
|db[1100-1101,1110,1120-1121,1131,1133-1135,1145-1147,1150,1166-1171,1180-1181,1189]| | |db1101 will need to be failed over (T333123), as it is going to become m1 master as part of T331510 to allow row B maintenance |
|dbprov1003| | | |
|dbproxy[1020-1021]| | |Nothing to be done, they are not active at the moment |
|es[1022,1031-1032]| | |Nothing to be done as eqiad will be depooled |
|moss-be1002| | | |
|ms-backup1002| | | |
|ms-be[1042,1049-1050,1054,1062,1066]| | | |
|ms-fe1011| | | |
|pc1013| | | Nothing to be done as eqiad will be depooled|
|thanos-be1003| | | |
== ServiceOps ==
#serviceops
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|deploy1002| | | |
|kafka-main1003| | | |
|kubemaster1002| | | |
|kubernetes[1006,1011-1012,1020,1023]| | | |
|kubestagetcd1006| | | |
|kubetcd1004| | | |
|mc[1045-1050]| | | |
|mc-gp1002| | | |
|mw[1405-1413,1434-1436,1482-1486]| | | |
|mwdebug1001| | | |
|parse[1013-1016]| | | |
|poolcounter1005| | | |
|registry1004| | | |
== WMCS ==
#cloud-services-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|cloudcontrol1005| | | |
|clouddb[1017-1018]| | | |
|clouddumps1002| | | |
|cloudmetrics1004| | | |
|cloudrabbit1002| | | |
|dbproxy1018| | | |
|labstore[1004-1005]| | | |