= codfw row A switches upgrade =
For reasons detailed in {T327248} we're going to upgrade codfw row A switches.
This is scheduled for **Feb 7th - 14:00-16:00 UTC**. Please let us know if there is any issue with the scheduled time.
This means a !!30min hard downtime!! for the whole row if everything goes well. It is also a good opportunity to test the host depool mechanisms and the row redundancy of services.
The impacted servers and teams for this row are listed below.
The "action needed" fields are free form. For example:
* `NONE` if no action is needed
* the cookbook/command to run, if it can be done by a 3rd party
* who will be around to take care of the depool
* a link to the relevant doc
* etc.
The two main types of action needed are depooling and setting monitoring downtime.
NOTE: If the servers can handle a longer depool, it is preferable to depool them many hours or a day before the maintenance (and mark `NONE` in the table), so there are fewer moving parts close to the maintenance window.
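The "Servers" column in the tables below uses a compact bracket notation such as `wdqs[2003-2004,2009]`. For anyone who wants a flat list of hostnames, for example to check whether a given host is affected, the following is a minimal sketch of how that notation can be expanded. The function name `expand_hosts` is hypothetical and is not part of any existing tooling. It assumes a single bracket group of numeric ranges per entry, which is the only form used in this task.

```python
import re

# Hypothetical helper, not an existing tool: expands entries such as
# "wdqs[2003-2004,2009]" into individual hostnames.
# Assumption: at most one bracket group, containing numbers and
# inclusive "start-end" ranges separated by commas.
_RANGE_RE = re.compile(
    r"^(?P<prefix>[^\[\]]+)\[(?P<body>[^\[\]]+)\](?P<suffix>[^\[\]]*)$"
)


def expand_hosts(expr: str) -> list[str]:
    """Return the hostnames described by one "Servers" cell."""
    expr = expr.strip()
    match = _RANGE_RE.match(expr)
    if match is None:
        # No bracket group: the entry is already a single hostname.
        return [expr]

    prefix = match.group("prefix")
    suffix = match.group("suffix")
    hosts = []
    for part in match.group("body").split(","):
        start, _, end = part.strip().partition("-")
        end = end or start          # a bare number is a range of one
        width = len(start)          # keep any zero padding
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}{suffix}")
    return hosts


if __name__ == "__main__":
    for host in expand_hosts("wdqs[2003-2004,2009]"):
        print(host)
```

Run as a script, this prints `wdqs2003`, `wdqs2004` and `wdqs2009`, one per line. Entries without brackets, such as `grafana2001`, are returned unchanged.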
== Data Engineering ==
#data-engineering
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|aqs[2001-2004]| None | None | |
== Observability ==
#sre_observability
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|grafana2001| set downtime | none | |
|kafka-logging2001| set downtime, stop kafka service | start kafka service, confirm kafka logging dashboard returns green | |
|kafkamon2002| set downtime | none | |
|logstash[2001,2023,2026,2033]| | | |
|xhgui2001| | | |
== Observability and Data Persistence ==
#sre_observability #data-persistence
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|thanos-fe2001| | | |
== Search Platform ==
#discovery-search
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|elastic[2037-2040,2055-2056,2061-2062,2069,2073-2076]|None|None|Search team will depool & ban hosts from cluster one day prior to upgrade |
|wdqs[2003-2004,2009]|None|None|Search team will depool one day prior to upgrade |
== Core Platform ==
#core-platform-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|maps2005| | | |
|thumbor2005| | | |
== WMCS ==
#cloud-services-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|cloudbackup2001| | | |
== ServiceOps-Collab ==
#serviceops-collab
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|contint2001| | | |
|doc2001| NONE | NONE | |
|gitlab2002| | | |
|planet2002| NONE | NONE | |
== Data Persistence ==
#data-persistence
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|backup[2002,2004]| They are not a service, but storage. Jaime will make sure earlier in the week they are not active at the time of the maintenance. | Jaime will restart some delayed backups, if any. | |
|db[2094,2097,2103-2106,2121-2122,2132-2133,2136,2142,2145-2146,2153-2158,2175-2176,2183]|All MW need to be depooled, and some masters need to be switched over (misc masters do not need switchover/depooling) | | @Marostegui No longer masters: db2103, db2104, db2105, db2121, db2142 - the remaining masters are misc, so they can be ignored|
|dbprov2001| They are not a service, but storage. Jaime will make sure earlier in the week they are not active at the time of the maintenance. | None | |
|dbproxy2001|None |None | |
|es[2020,2024,2026-2028]|All need to be depooled | | |
|moss-be2001| | | |
|ms-be[2040,2044-2045,2051-2052,2060,2062,2066]| | | |
|ms-fe2009| | | |
|pc2011|To be depooled |To be repooled once it is all done |Already depooled by @Marostegui |
|thanos-be2001| | | |
== Infrastructure Foundations ==
#infrastructure-foundations
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|ganeti[2023-2024,2027-2030]| | | |
|ganeti-test[2001-2003]| | | |
|netbox2002| | | |
|netboxdb2002| | | |
|pki2001| | | |
|puppetdb2002| | | |
|puppetmaster[2001,2004]| | | |
|rpki2002| | | |
|test-reimage2001| | | |
|testvm[2001-2005]| | | |
|urldownloader2001| | | |
== Infrastructure Foundations and Observability ==
#infrastructure-foundations #sre_observability
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|netmon2002| | | |
== Machine Learning ==
#machine-learning-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|ml-cache2001| | | |
|ml-serve[2001,2005]| | | |
|ml-staging2001| | | |
|ml-staging-etcd2001| | | |
|ores[2001-2002]| | | |
|orespoolcounter2003| | | |
== Traffic ==
#traffic
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|acmechief2001|N/A | | |
|acmechief-test2001|N/A | | |
|authdns2001| [[ https://wikitech.wikimedia.org/wiki/Service_restarts#Authoritative_DNS | redirect to authdns1001 ]] | the opposite | |
|cp[2027-2030]|`sudo -i depool`|`sudo -i pool` | |
|doh2001| | | |
|lvs2007| disable puppet & stop pybal| the opposite | |
|ncredir2001|`sudo -i depool`|`sudo -i pool` | |
== ServiceOps ==
#serviceops
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|kafka-main2001| | | |
|kubemaster2001| | | |
|kubernetes[2005,2007-2008,2018-2019]| | | |
|kubestage2001| | | |
|kubetcd2004| | | |
|mc[2038-2041,2055]| | | |
|mc-gp2001| | | |
|mw[2291-2309,2377-2411]| | | |
|mwdebug2001| | | |
|parse[2001-2005]| | | |
|poolcounter2003| | | |
|rdb2007| | | |
|registry2003| | | |