codfw: rack A5 maintenance
Open, Medium, Public

Description

See parent and grand-parent tasks T426197: codfw: pod AB switches upgrade (2026)

This task is to schedule the software upgrade of the rack A4 top-of-rack switch, planned for Tuesday 2026-06-16 at 12:00 UTC, with an expected network connectivity loss of ~20 min.

https://wikitech.wikimedia.org/wiki/Network_leaf_maintenance
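
For planning downtimes, the window stated above can be sketched as follows. This is a minimal illustration only: the start time comes from the task description, while treating "~20 min" as exactly 20 minutes is an assumption, and the variable names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Start time as stated in the task description.
start = datetime(2026, 6, 16, 12, 0, tzinfo=timezone.utc)

# "~20 min" is an estimate; using exactly 20 minutes is an assumption.
expected_loss = timedelta(minutes=20)
end = start + expected_loss

# Print the expected window in ISO 8601 form.
print(f"Expected connectivity loss: {start.isoformat()} to {end.isoformat()}")
```

Any downtime set in monitoring would probably want some margin on both sides of this window.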

No information available about depool
backup2013: Couldn't get or parse depool Hiera key @jcrespo
puppetserver2002: Couldn't get or parse depool Hiera key @MoritzMuehlenhoff
rdb2007: Couldn't get or parse depool Hiera key @jijiki (server is EOL and not in prod)
thanos-be2006: Couldn't get or parse depool Hiera key @tappof

Depool needed
db2153: depool using cookbook sre.mysql.depool -r "rack depool" {name}
db2154: depool using cookbook sre.mysql.depool -r "rack depool" {name}
db2157: depool using cookbook sre.mysql.depool -r "rack depool" {name}
db2175: depool using cookbook sre.mysql.depool -r "rack depool" {name}
db2176: depool using cookbook sre.mysql.depool -r "rack depool" {name}
es2050: depool using cookbook sre.mysql.depool -r "rack depool" {name}
ml-serve2001: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2012: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2013: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2014: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2017: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2018: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2041: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2044: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2051: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2074: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2075: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2076: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2077: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2078: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2091: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2092: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2242: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2243: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2254: depool using cookbook sre.k8s.pool-depool-node
wikikube-worker2255: depool using cookbook sre.k8s.pool-depool-node
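
The database entries above share one command template, so the per-host commands can be generated rather than typed by hand. The sketch below is a rough illustration under two assumptions: that `{name}` is replaced by the short hostname as listed, and that the command is invoked as `cookbook <cookbook name> ...`. The function name is hypothetical. The arguments of `sre.k8s.pool-depool-node` are not given in this task, so it is deliberately left out.

```python
# Template copied from the task description; "{name}" is the placeholder used there.
MYSQL_TEMPLATE = 'sre.mysql.depool -r "rack depool" {name}'

# Database hosts listed under "Depool needed".
MYSQL_HOSTS = ["db2153", "db2154", "db2157", "db2175", "db2176", "es2050"]


def mysql_depool_commands(hosts):
    """Return one command line per host (hypothetical helper).

    Assumes "{name}" takes the short hostname; the real cookbook may
    expect a different form, such as an FQDN.
    """
    return [f"cookbook {MYSQL_TEMPLATE.format(name=host)}" for host in hosts]


commands = mysql_depool_commands(MYSQL_HOSTS)
for command in commands:
    print(command)
```

The output is meant to be reviewed and run by the owning team, not executed blindly.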

Per team grouping
Data Persistence: backup2013, db2153, db2154, db2157, db2175, db2176, es2050, thanos-be2006 @FCeratto-WMF
Machine Learning: ml-serve2001 @klausman
Infrastructure Foundations: puppetserver2002 @MoritzMuehlenhoff
ServiceOps: rdb2007, wikikube-worker2012, wikikube-worker2013, wikikube-worker2014, wikikube-worker2017, wikikube-worker2018, wikikube-worker2041, wikikube-worker2044, wikikube-worker2051, wikikube-worker2074, wikikube-worker2075, wikikube-worker2076, wikikube-worker2077, wikikube-worker2078, wikikube-worker2091, wikikube-worker2092, wikikube-worker2242, wikikube-worker2243, wikikube-worker2254, wikikube-worker2255 ServiceOps new
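
Since the host lists and the per-team grouping are maintained separately, a quick consistency check can confirm that every affected host is assigned to exactly one team. The following is a sketch with the data transcribed from this task; the helper names (`workers`, `check_grouping`) are hypothetical.

```python
def workers(*numbers):
    """Expand worker numbers into wikikube-worker hostnames."""
    return [f"wikikube-worker{n}" for n in numbers]


WORKERS = workers(2012, 2013, 2014, 2017, 2018, 2041, 2044, 2051, 2074, 2075,
                  2076, 2077, 2078, 2091, 2092, 2242, 2243, 2254, 2255)

# Hosts from the "No information available about depool" list.
NO_DEPOOL_INFO = ["backup2013", "puppetserver2002", "rdb2007", "thanos-be2006"]

# Hosts from the "Depool needed" list.
DEPOOL_NEEDED = ["db2153", "db2154", "db2157", "db2175", "db2176", "es2050",
                 "ml-serve2001", *WORKERS]

# Hosts from the "Per team grouping" list.
TEAMS = {
    "Data Persistence": ["backup2013", "db2153", "db2154", "db2157", "db2175",
                         "db2176", "es2050", "thanos-be2006"],
    "Machine Learning": ["ml-serve2001"],
    "Infrastructure Foundations": ["puppetserver2002"],
    "ServiceOps": ["rdb2007", *WORKERS],
}


def check_grouping(hosts, teams):
    """Return hosts missing from, unknown to, or duplicated in the grouping."""
    grouped = [host for members in teams.values() for host in members]
    missing = sorted(set(hosts) - set(grouped))
    extra = sorted(set(grouped) - set(hosts))
    duplicated = sorted({host for host in grouped if grouped.count(host) > 1})
    return missing, extra, duplicated


all_hosts = NO_DEPOOL_INFO + DEPOOL_NEEDED
missing, extra, duplicated = check_grouping(all_hosts, TEAMS)
print(f"{len(all_hosts)} hosts; missing={missing} extra={extra} duplicated={duplicated}")
```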

Event Timeline

ayounsi triaged this task as Medium priority.

Re: backup2013, it needs no special treatment other than downtime. A temporary network maintenance is not a problem unless it gets extended for a few days, as backups usually run during UTC night.

Change #1300766 had a related patch set uploaded (by Muehlenhoff; author: Muehlenhoff):

[operations/dns@master] Depool puppetserver2002 for rack maintenance

https://gerrit.wikimedia.org/r/1300766

@ayounsi For puppetserver2002, the following change will need to be merged before the maintenance starts: https://gerrit.wikimedia.org/r/c/operations/dns/+/1300766. I'll take care of that.

We should also announce that people should avoid Puppet merges during the switch maintenance; otherwise puppet-merge will time out while puppetserver2002 is down.

> Re: backup2013, it needs no special treatment other than downtime. A temporary network maintenance is not a problem unless it gets extended for a few days, as backups usually run during UTC night.

Sounds like it won't clash time-wise, thanks for checking.

> @ayounsi For puppetserver2002, the following change will need to be merged before the maintenance starts: https://gerrit.wikimedia.org/r/c/operations/dns/+/1300766. I'll take care of that.
>
> We should also announce that people should avoid Puppet merges during the switch maintenance; otherwise puppet-merge will time out while puppetserver2002 is down.

Noted. I'll touch base on Tuesday, and we'll send mails and let people know on IRC.