= codfw row A switches upgrade =
For reasons detailed in {T327248} we're going to upgrade codfw row A switches.
This is scheduled for **Feb 7th - 14:00-16:00 UTC**. Please let us know if the scheduled time causes any issue.
It means a !!30min hard downtime!! for the whole row if everything goes well. It is also a good opportunity to test the host depool mechanisms and the row redundancy of services.
The impacted servers in this row, grouped by owning team, are listed below.
The "action needed" fields are fairly free form. For example:
* write `NONE` if no action is needed,
* the cookbook/command to run, if it can be done by a 3rd party,
* who will be around to take care of the depool,
* a link to the relevant doc,
* etc.
The two main types of actions needed are depool and monitoring downtime.
NOTE: If the servers can handle a longer depool, it's preferred to depool them several hours or a day before (and mark `None` in the table) so there are fewer moving parts closer to the maintenance window.
== Data Engineering ==
#data-engineering
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|aqs[2001-2004]| None | None | |
== Observability ==
#sre_observability
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|grafana2001| set downtime | none | depooled |
|kafka-logging2001| set downtime, stop kafka service | start kafka service, confirm kafka logging dashboard returns green | depooled |
|kafkamon2002| set downtime | none | depooled |
|logstash[2001,2023,2026,2033]| conftool 2023, drain shards 2001,2026,2033 | conftool 2023, allocate shards 2001,2026,2033 | downtime scheduled, 2023 depooled, 2001,2026,2033 draining |
|xhgui2001| none | none | n/a |
== Observability and Data Persistence ==
#sre_observability #data-persistence
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|thanos-fe2001| conftool depool, while making sure another thanos-fe host is pooled for service `thanos-web` | conftool pool, making sure only one thanos-fe host is pooled for service `thanos-web` | |
== Search Platform ==
#discovery-search
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|elastic[2037-2040,2055-2056,2061-2062,2069,2073-2076]|None|None|Search team will depool & ban hosts from cluster one day prior to upgrade |
|wdqs[2003-2004,2009]|None|None|Search team will depool one day prior to upgrade |
== Core Platform ==
#core-platform-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|maps2005| | | |
|thumbor2005| | | |
== WMCS ==
#cloud-services-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|cloudbackup2001| NONE | NONE | |
== ServiceOps-Collab ==
#serviceops-collab
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|contint2001| NONE | NONE | |
|doc2001| NONE | NONE | |
|gitlab2002| NONE | NONE | |
|planet2002| NONE | NONE | |
== Data Persistence ==
#data-persistence
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|backup[2002,2004]| They are not a service, but storage. Jaime will make sure earlier in the week they are not active at the time of the maintenance. | Jaime will restart some delayed backups, if any. | |
|db[2094,2097,2103-2106,2121-2122,2132-2133,2136,2142,2145-2146,2153-2158,2175-2176,2183]|All MW need to be depooled, and some masters need to be switched over (misc masters do not need switchover/depooling) |@Marostegui will repool everything | @Marostegui No longer masters: db2103, db2104, db2105, db2121, db2142 - the rest of the masters are misc so they can be ignored - what needs to be depooled is already depooled|
|dbprov2001| They are not a service, but storage. Jaime will make sure earlier in the week they are not active at the time of the maintenance. | None | |
|dbproxy2001|None |Reload haproxy | |
|es[2020,2024,2026-2028]|All need to be depooled (@Marostegui will do it) | @Marostegui will repool everything| Depooled |
|moss-be2001|N/A |N/A |Not currently in production service |
|ms-be[2040,2044-2045,2051-2052,2060,2062,2066]|None |None | |
|ms-fe2009|`sudo depool` |`sudo pool` | |
|pc2011|To be depooled |To be repooled once it is all done |Already depooled by @Marostegui |
|thanos-be2001|None |None | |
== Infrastructure Foundations ==
#infrastructure-foundations
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|ganeti[2023-2024,2027-2030]|None | | |
|ganeti-test[2001-2003]|None | | |
|netbox2002| None | None | |
|netboxdb2002| None | None | |
|pki2001|None |None |N/A |
|puppetdb2002|`sudo cumin 'A:codfw or A:esams or A:ulsfo' 'disable-puppet "Switch reboot: T327925"'` |`sudo cumin 'A:codfw or A:esams or A:ulsfo' 'enable-puppet "Switch reboot: T327925"'` | jbond will handle |
|puppetmaster[2001,2004]|`sudo cumin 'A:codfw or A:esams or A:ulsfo' 'disable-puppet "Switch reboot: T327925"'` |`sudo cumin 'A:codfw or A:esams or A:ulsfo' 'enable-puppet "Switch reboot: T327925"'` | jbond will handle |
|rpki2002| None | | |
|test-reimage2001| None | | |
|testvm[2001-2005]|None| | |
|urldownloader2001|None| | |
== Infrastructure Foundations and Observability ==
#infrastructure-foundations #sre_observability
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|netmon2002| None | None | |
== Machine Learning ==
#machine-learning-team
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|ml-cache2001|-|-| |
|ml-serve[2001,2005]|-|-| |
|ml-staging2001|-|-| |
|ml-staging-etcd2001|- |-| |
|ores[2001-2002]|`sudo -i depool` | `sudo -i pool` | |
|orespoolcounter2003|- |- |-|
== Traffic ==
#traffic
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|acmechief2001|N/A | | |
|acmechief-test2001|N/A | | |
|authdns2001| [[ https://wikitech.wikimedia.org/wiki/Service_restarts#Authoritative_DNS | redirect to authdns1001 ]] | the opposite | done|
|cp[2027-2030]|N/A|N/A| |
|doh2001|disable puppet & stop bird.service|the opposite|done|
|lvs2007|N/A|N/A| |
|ncredir2001|N/A|N/A| |
== ServiceOps ==
#serviceops
|Servers|Depool action needed|Repool action needed|Status|
|---|---|---|---|
|kafka-main2001| | | |
|kubemaster2001| | | |
|kubernetes[2005,2007-2008,2018-2019]| | | |
|kubestage2001| | | |
|kubetcd2004| | | |
|mc[2038-2041,2055]| | | |
|mc-gp2001| | | |
|mw[2291-2309,2377-2411]| | | |
|mwdebug2001| | | |
|parse[2001-2005]| | | |
|poolcounter2003| | | |
|rdb2007| | | |
|registry2003| | | |
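
The Servers columns above use a compact bracket notation such as `wdqs[2003-2004,2009]`. For anyone who wants to check whether a specific host is covered, here is a minimal Python sketch that expands one such expression into individual hostnames. The function name `expand_hosts` is hypothetical, and this is not how cumin or any other tooling parses host expressions; it only handles the simple `prefix[ranges]` form used in these tables.

```python
import re


def expand_hosts(expr: str) -> list[str]:
    """Expand 'name[2001-2003,2005]' into a list of individual hostnames."""
    match = re.fullmatch(r"([^\[\]]+)\[([^\[\]]+)\]", expr.strip())
    if match is None:
        # Plain hostname without a bracketed range, e.g. 'grafana2001'.
        return [expr.strip()]
    prefix, body = match.groups()
    hosts = []
    for part in body.split(","):
        # 'start-end' is an inclusive range; a bare number is a single host.
        start, _, end = part.strip().partition("-")
        end = end or start
        width = len(start)  # keep the original digit width
        for number in range(int(start), int(end) + 1):
            hosts.append(f"{prefix}{number:0{width}d}")
    return hosts


if __name__ == "__main__":
    for host in expand_hosts("wdqs[2003-2004,2009]"):
        print(host)
```

For example, `expand_hosts("mw[2291-2309,2377-2411]")` yields 54 hostnames, and `"mw2300" in expand_hosts("mw[2291-2309,2377-2411]")` is true.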