
upgrade miscweb VMs to bullseye
Closed, Resolved, Public

Description

replace miscweb1002 and miscweb2002 (buster) with new VMs on bullseye and switch microsites to them

part of SRE sprint week - sub group bullseye upgrades

this should also include full decom of old VMs and removal from repo of "webserver-misc-static"


update 20230404: all services have been switched to new set of VMs on bullseye with a single exception, https://iegreview.wikimedia.org

one of the 2 remaining buster machines, the inactive one in codfw, has already been decom'ed fully.

this is only open because miscweb1002 is left in eqiad and hosts iegreview and that is waiting for T332918

  • new VMs on bullseye in service
  • (almost) all services moved to new VMs
  • old VM in codfw decom'ed
  • old VM in eqiad decom'ed (not yet, still hosting iegreview)



Event Timeline


Change 901296 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] add webserver-misc-sites and point it to miscweb1003/2003

https://gerrit.wikimedia.org/r/901296

Change 901296 merged by Dzahn:

[operations/dns@master] add webserver-misc-sites and point it to miscweb1003/2003

https://gerrit.wikimedia.org/r/901296

Change 901300 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch 15.wikipedia.org to miscweb2003

https://gerrit.wikimedia.org/r/901300

Change 901300 merged by Dzahn:

[operations/puppet@production] miscweb: switch 15.wikipedia.org to miscweb2003

https://gerrit.wikimedia.org/r/901300

Change 901318 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch annual and bienvenida microsites to miscweb2003

https://gerrit.wikimedia.org/r/901318

Change 901319 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch tendril and dbtree microsites to miscweb2003

https://gerrit.wikimedia.org/r/901319

Change 901320 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch security.wm.org microsite to miscweb2003

https://gerrit.wikimedia.org/r/901320

Change 901321 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch sitemaps, transparency and tr-archives to miscweb2003

https://gerrit.wikimedia.org/r/901321

Change 901318 merged by Dzahn:

[operations/puppet@production] miscweb: switch annual and bienvenida microsites to miscweb2003

https://gerrit.wikimedia.org/r/901318

Change 901319 merged by Dzahn:

[operations/puppet@production] miscweb: switch tendril and dbtree microsites to miscweb2003

https://gerrit.wikimedia.org/r/901319

Change 900465 merged by Dzahn:

[operations/puppet@production] miscweb: add miscweb1003/2003 to rsync_dst_hosts

https://gerrit.wikimedia.org/r/900465

Change 901677 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb/iegreview: set custom log, don't log into "other_vhosts" file

https://gerrit.wikimedia.org/r/901677

Change 901677 merged by Dzahn:

[operations/puppet@production] miscweb/iegreview: set custom log, don't log into "other_vhosts" file

https://gerrit.wikimedia.org/r/901677

Change 901678 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb/annualreport: use non-generic custom log file name

https://gerrit.wikimedia.org/r/901678

Change 901678 merged by Dzahn:

[operations/puppet@production] miscweb/annualreport: use non-generic custom log file name

https://gerrit.wikimedia.org/r/901678

Change 901320 merged by Dzahn:

[operations/puppet@production] miscweb: switch security.wm.org microsite to miscweb2003

https://gerrit.wikimedia.org/r/901320

Dzahn changed the task status from Open to In Progress. Mar 22 2023, 6:11 PM
Dzahn triaged this task as High priority.
Dzahn updated the task description.

Mentioned in SAL (#wikimedia-operations) [2023-03-22T18:12:38Z] <mutante> rsyncing /srv/org/wikimedia/sitemaps files for https://sitemaps.wikimedia.org from old to new machines. most other things are auto-deployed by puppet or puppet running initial scap or automatic rsync.. this is not. rsync -av /srv/org/wikimedia/sitemaps/ rsync://miscweb2003.codfw.wmnet/miscapps-srv/org/wikimedia/sitemaps/ T331896 - but also see T332101

Change 901321 merged by Dzahn:

[operations/puppet@production] miscweb: switch sitemaps, transparency and tr-archives to miscweb2003

https://gerrit.wikimedia.org/r/901321

Change 902167 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch research.wikimedia.org to bullseye backend

https://gerrit.wikimedia.org/r/902167

Change 902169 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch wikiworkshop.org to bullseye backend

https://gerrit.wikimedia.org/r/902169

Change 902170 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch design.wikimedia.org to bullseye backend

https://gerrit.wikimedia.org/r/902170

Change 902172 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch os-reports.wikimedia.org to bullseye backend

https://gerrit.wikimedia.org/r/902172

Change 902174 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: switch static-codereview to bullseye backend

https://gerrit.wikimedia.org/r/902174

Change 902140 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: move transparency httpd site templates out of role/apache

https://gerrit.wikimedia.org/r/902140

Change 902140 merged by Dzahn:

[operations/puppet@production] miscweb: move transparency httpd site templates out of role/apache

https://gerrit.wikimedia.org/r/902140

Change 902142 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: move os_reports httpd template to profile/microsites/

https://gerrit.wikimedia.org/r/902142

Change 902142 merged by Dzahn:

[operations/puppet@production] miscweb: move os_reports httpd template to profile/microsites/

https://gerrit.wikimedia.org/r/902142

Change 902144 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: add custom and error log for os-reports.wikimedia.org

https://gerrit.wikimedia.org/r/902144

Change 902144 merged by Dzahn:

[operations/puppet@production] miscweb: add custom and error log for os-reports.wikimedia.org

https://gerrit.wikimedia.org/r/902144

Change 902172 merged by Dzahn:

[operations/puppet@production] miscweb: switch os-reports.wikimedia.org to bullseye backend

https://gerrit.wikimedia.org/r/902172

Change 902166 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: add custom and error log for transparency and archives

https://gerrit.wikimedia.org/r/902166

Change 902166 merged by Dzahn:

[operations/puppet@production] miscweb: add custom and error log for transparency and archives

https://gerrit.wikimedia.org/r/902166

Change 902167 merged by Dzahn:

[operations/puppet@production] miscweb: switch research.wikimedia.org to bullseye backend

https://gerrit.wikimedia.org/r/902167

Change 902169 merged by Dzahn:

[operations/puppet@production] miscweb: switch wikiworkshop.org to bullseye backend

https://gerrit.wikimedia.org/r/902169

Change 902170 merged by Dzahn:

[operations/puppet@production] miscweb: switch design.wikimedia.org to bullseye backend

https://gerrit.wikimedia.org/r/902170

Mentioned in SAL (#wikimedia-operations) [2023-03-23T02:00:00Z] <mutante> rsyncing ~4GB files for static-codereview.wikimedia.org from old to newer VMs for T331896 - no automatic sync / deploy for these

Change 902228 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb/static-codereview/httpbb: also test if files were synced

https://gerrit.wikimedia.org/r/902228

Change 902228 merged by Dzahn:

[operations/puppet@production] miscweb/static-codereview/httpbb: also test if files were synced

https://gerrit.wikimedia.org/r/902228

Change 902174 merged by Dzahn:

[operations/puppet@production] miscweb: switch static-codereview to bullseye backend

https://gerrit.wikimedia.org/r/902174

Change 902229 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] decom miscweb2002

https://gerrit.wikimedia.org/r/902229

miscweb2003 bacula backups are failing. As I understand it, those hosts are still being set up, so this is expected (please warn me if not!). I will remove them from the check so they stop alerting. They should be added back before closing this task, to make sure newer backups are checked.

Change 903179 had a related patch set uploaded (by Jcrespo; author: Jcrespo):

[operations/puppet@production] bacula: Add miscweb2003 jobs to the list of monitoring-ignored jobs

https://gerrit.wikimedia.org/r/903179

Change 903179 merged by Jcrespo:

[operations/puppet@production] bacula: Add miscweb2003 jobs to the list of monitoring-ignored jobs

https://gerrit.wikimedia.org/r/903179

✔️ root@backup1001:~$ check_bacula.py miscweb2003.codfw.wmnet-Monthly-1st-Wed-productionEqiad-rt-static
id: 502485, ts: None, type: I, status: C, bytes: 0
id: 501517, ts: 2023-03-21 05:53:55, type: F, status: f, bytes: 0
id: 501660, ts: 2023-03-22 05:25:05, type: F, status: f, bytes: 0
id: 501796, ts: 2023-03-23 05:11:14, type: F, status: f, bytes: 0
id: 501932, ts: 2023-03-24 05:28:19, type: F, status: f, bytes: 0
id: 502069, ts: 2023-03-25 05:23:13, type: F, status: f, bytes: 0
id: 502207, ts: 2023-03-26 04:56:19, type: F, status: f, bytes: 0
id: 502346, ts: 2023-03-27 04:57:39, type: F, status: f, bytes: 0
✔️ root@backup1001:~$ check_bacula.py miscweb2003.codfw.wmnet-Monthly-1st-Wed-productionEqiad-static-codereview
id: 502486, ts: None, type: I, status: C, bytes: 0
id: 501518, ts: 2023-03-21 05:56:56, type: F, status: f, bytes: 0
id: 501661, ts: 2023-03-22 05:28:06, type: F, status: f, bytes: 0
id: 501797, ts: 2023-03-23 05:14:14, type: F, status: f, bytes: 0
id: 501933, ts: 2023-03-24 05:31:29, type: F, status: f, bytes: 0
id: 502070, ts: 2023-03-25 05:26:23, type: F, status: f, bytes: 0
id: 502208, ts: 2023-03-26 04:59:19, type: F, status: f, bytes: 0
id: 502347, ts: 2023-03-27 05:00:39, type: F, status: f, bytes: 0

The backup of this host is returning f ("fatal error") as its backup status. I had ignored this because I thought it was just missing paths or another issue common on a WIP host. I now assume this is not expected, so I am digging deeper.

Log says:

27-Mar 05:00 backup1001.eqiad.wmnet JobId 502347: Warning: bsockcore.c:201 Could not connect to Client: miscweb2003.codfw.wmnet-fd on miscweb2003.codfw.wmnet:9102. ERR=Connection refused
Retrying ...
27-Mar 05:03 backup1001.eqiad.wmnet JobId 502347: Fatal error: bsockcore.c:208 Unable to connect to Client: miscweb2003.codfw.wmnet-fd on miscweb2003.codfw.wmnet:9102. ERR=Connection refused
27-Mar 05:03 backup1001.eqiad.wmnet JobId 502347: Fatal error: No Job status returned from FD.

While what looks like a network error can have several causes, my guess is that we are missing the profile that sets up the client and opens the port on the new host, as it is only happening for this server.

@Dzahn I will be around when you start working, so ping me and we can have a look together before we revert the above patch.

@jcrespo The bacula-fd service was running and listening on port 9102 but still refusing connections. Restarting the service fixed it though and now a connection from backup1001 to port 9102 on miscweb2003 works.

before:

root@miscweb2003:/home/dzahn# netstat -tulpen | grep bacula
tcp        0      0 127.0.0.1:9102          0.0.0.0:*               LISTEN      0          2781812    559184/bacula-fd    

..
[backup1001:~] $ telnet miscweb2003.codfw.wmnet 9102
Trying 2620:0:860:104:10:192:48:65...
Trying 10.192.48.65...
telnet: Unable to connect to remote host: Connection refused

action taken:

systemctl restart bacula-fd

after:

root@miscweb2003:/home/dzahn# netstat -tulpen | grep bacula
tcp        0      0 0.0.0.0:9102            0.0.0.0:*               LISTEN      0          11164886   1857341/bacula-fd   

..

[backup1001:~] $ telnet miscweb2003.codfw.wmnet 9102
Trying 2620:0:860:104:10:192:48:65...
Trying 10.192.48.65...
Connected to miscweb2003.codfw.wmnet.
Escape character is '^]'.

So should be fine now.
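The telnet test above can also be scripted. This is a minimal sketch using only the Python standard library; the function name `port_open` and the timeout value are assumptions made for illustration and are not part of any existing tooling mentioned in this task.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    socket.create_connection tries every address the name resolves to
    (IPv6 and IPv4), which is roughly what the telnet test above does.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts and resolution failures.
        return False


if __name__ == "__main__":
    # Host and port are taken from the comment above.
    print(port_open("miscweb2003.codfw.wmnet", 9102))
```

A refused connection and a timeout both return False here, so this only answers "reachable or not" and does not replace looking at the service itself.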

> The bacula-fd service was running and listening on port 9102 but still refusing connections

Oh, weird.

I will try to schedule a rerun of the backups to confirm it is now fixed, and then we can deploy the revert. Sorry this happened.

@jcrespo Oh, I just noticed that before the restart it was listening only on 127.0.0.1, but afterwards it is listening on 0.0.0.0. Must be a race condition. Seems rare though. Thanks for confirming :)
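The difference between the two netstat outputs is easy to miss by eye. As a rough sketch, the local address column of a `netstat -tulpen` line can be checked for a loopback-only bind like this. The helper names `listen_address` and `is_loopback_only` are hypothetical, and the column position is assumed from the output pasted above.

```python
import ipaddress


def listen_address(netstat_line: str) -> tuple[str, int]:
    """Extract (ip, port) from the Local Address column of a netstat line.

    Assumes the layout shown above:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State ...
    """
    local = netstat_line.split()[3]
    # Split on the last colon so IPv6 forms such as ":::9102" also work.
    host, _, port = local.rpartition(":")
    return host, int(port)


def is_loopback_only(netstat_line: str) -> bool:
    """True if the socket is bound to a loopback address (e.g. 127.0.0.1)."""
    host, _ = listen_address(netstat_line)
    return ipaddress.ip_address(host).is_loopback


before = "tcp 0 0 127.0.0.1:9102 0.0.0.0:* LISTEN 0 2781812 559184/bacula-fd"
after = "tcp 0 0 0.0.0.0:9102 0.0.0.0:* LISTEN 0 11164886 1857341/bacula-fd"
print(is_loopback_only(before), is_loopback_only(after))
```

A service bound only to loopback refuses connections from other hosts even though it is running, which matches what backup1001 saw before the restart.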

Change 902175 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] delete webserver-misc-static.discovery.wmnet

https://gerrit.wikimedia.org/r/902175

Change 902175 merged by Dzahn:

[operations/dns@master] delete webserver-misc-static.discovery.wmnet

https://gerrit.wikimedia.org/r/902175

root@backup1001:~$ check_bacula.py miscweb2003.codfw.wmnet-Monthly-1st-Wed-productionEqiad-rt-static
id: 502633, ts: None, type: I, status: C, bytes: 0
id: 501517, ts: 2023-03-21 05:53:55, type: F, status: f, bytes: 0
id: 501660, ts: 2023-03-22 05:25:05, type: F, status: f, bytes: 0
id: 501796, ts: 2023-03-23 05:11:14, type: F, status: f, bytes: 0
id: 501932, ts: 2023-03-24 05:28:19, type: F, status: f, bytes: 0
id: 502069, ts: 2023-03-25 05:23:13, type: F, status: f, bytes: 0
id: 502207, ts: 2023-03-26 04:56:19, type: F, status: f, bytes: 0
id: 502346, ts: 2023-03-27 04:57:39, type: F, status: f, bytes: 0
id: 502485, ts: 2023-03-28 22:03:31, type: F, status: T, bytes: 2032
id: 502523, ts: 2023-03-28 22:39:38, type: F, status: T, bytes: 2032
✔️ root@backup1001:~$ check_bacula.py miscweb2003.codfw.wmnet-Monthly-1st-Wed-productionEqiad-static-codereview
id: 502634, ts: None, type: I, status: C, bytes: 0
id: 501518, ts: 2023-03-21 05:56:56, type: F, status: f, bytes: 0
id: 501661, ts: 2023-03-22 05:28:06, type: F, status: f, bytes: 0
id: 501797, ts: 2023-03-23 05:14:14, type: F, status: f, bytes: 0
id: 501933, ts: 2023-03-24 05:31:29, type: F, status: f, bytes: 0
id: 502070, ts: 2023-03-25 05:26:23, type: F, status: f, bytes: 0
id: 502208, ts: 2023-03-26 04:59:19, type: F, status: f, bytes: 0
id: 502347, ts: 2023-03-27 05:00:39, type: F, status: f, bytes: 0
id: 502486, ts: 2023-03-28 22:03:34, type: F, status: T, bytes: 4524720912
id: 502524, ts: 2023-03-28 22:39:41, type: F, status: T, bytes: 4524720912
✔️

Backups are now flowing ok, reverting ignore list.
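To make the before/after comparison less manual, the `check_bacula.py` output pasted above could be parsed along these lines. This is only a sketch: the line format is copied from the output in this task, the names `Job`, `parse_jobs` and `backup_ok` are hypothetical, and treating status `T` as success is my reading of Bacula's job status codes (only `f` as "fatal error" is stated in this thread).

```python
import re
from typing import NamedTuple, Optional


class Job(NamedTuple):
    job_id: int
    ts: Optional[str]  # None while the job has not run yet
    kind: str          # the "type" column, e.g. F or I
    status: str        # e.g. f (fatal error), T (assumed: success), C
    size: int          # bytes written


LINE = re.compile(
    r"id: (\d+), ts: (.+?), type: (\w), status: (\w), bytes: (\d+)"
)


def parse_jobs(output: str) -> list[Job]:
    """Parse every 'id: ..., ts: ...' line; other lines are ignored."""
    jobs = []
    for line in output.splitlines():
        m = LINE.search(line)
        if m:
            ts = None if m.group(2) == "None" else m.group(2)
            jobs.append(
                Job(int(m.group(1)), ts, m.group(3), m.group(4), int(m.group(5)))
            )
    return jobs


def backup_ok(output: str) -> bool:
    """True if the most recent job that actually ran ended with status T."""
    ran = [j for j in parse_jobs(output) if j.ts is not None]
    if not ran:
        return False
    # "YYYY-MM-DD HH:MM:SS" strings sort chronologically as plain text.
    latest = max(ran, key=lambda j: j.ts)
    return latest.status == "T" and latest.size > 0


sample = """\
id: 502634, ts: None, type: I, status: C, bytes: 0
id: 502347, ts: 2023-03-27 05:00:39, type: F, status: f, bytes: 0
id: 502524, ts: 2023-03-28 22:39:41, type: F, status: T, bytes: 4524720912
"""
print(backup_ok(sample))
```

Jobs without a timestamp are skipped on the assumption that they are queued rather than failed.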

> Backups are now flowing ok, reverting ignore list.

Great! Thanks for confirming that.

Change 904619 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] miscweb: remove miscweb2002 from rsync dest hosts

https://gerrit.wikimedia.org/r/904619

Change 904619 merged by Dzahn:

[operations/puppet@production] miscweb: remove miscweb2002 from rsync dest hosts

https://gerrit.wikimedia.org/r/904619

Change 905292 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] trafficserver/wdqs: switch query-preview.wikidata.org to new backend

https://gerrit.wikimedia.org/r/905292

Change 905317 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] wdqs/wcqs: switch query.wikidata.org and wcqs to bullseye backends

https://gerrit.wikimedia.org/r/905317

Change 905292 merged by Dzahn:

[operations/puppet@production] trafficserver/wdqs: switch query-preview.wikidata.org to new backend

https://gerrit.wikimedia.org/r/905292

Mentioned in SAL (#wikimedia-operations) [2023-04-03T21:52:59Z] <ryankemper> T331896 sudo -E cumin -b 4 'wdqs*' 'sudo run-puppet-agent'

Change 905317 merged by Dzahn:

[operations/puppet@production] wdqs/wcqs: switch query.wikidata.org and wcqs to bullseye backends

https://gerrit.wikimedia.org/r/905317

Mentioned in SAL (#wikimedia-operations) [2023-04-04T19:55:15Z] <mutante> https://query.wikidata.org and WCQS GUIs are switching to new backend VMs on bullseye in codfw T330090 T331896

Mentioned in SAL (#wikimedia-operations) [2023-04-04T20:00:15Z] <ryankemper> T331896 Running puppet on wdqs fleet to pickup new miscweb gui_url: ryankemper@cumin1001:~$ sudo -E cumin -b 6 'wdqs*' 'run-puppet-agent'

Mentioned in SAL (#wikimedia-operations) [2023-04-04T20:06:49Z] <ryankemper> T331896 Running puppet on wcqs fleet to pickup new miscweb gui_url: ryankemper@cumin1001:~$ sudo -E cumin -b 2 'wcqs*' 'run-puppet-agent'

@LSobanski Everything is migrated off the buster/eqiad miscweb hosts except iegreview. So now we can shut down miscweb2002, the inactive buster host (T334024), but to finish this and shut down miscweb1002, the active buster host with ONLY iegreview, we are still blocked on T332918.

Change 902229 merged by Dzahn:

[operations/puppet@production] miscweb/site: remove miscweb2002 from site

https://gerrit.wikimedia.org/r/902229

Dzahn changed the task status from In Progress to Stalled. Apr 4 2023, 10:09 PM

now stalled on T332918 (this ticket is High prio but the one it's stalled on is Low prio)

Change 907546 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/dns@master] remove miscweb1002->webserver-misc-apps

https://gerrit.wikimedia.org/r/907546

Change 907547 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/miscweb: remove miscweb1002, switch rsync source to miscweb1003

https://gerrit.wikimedia.org/r/907547

Change 907546 merged by Dzahn:

[operations/dns@master] remove miscweb1002->webserver-misc-apps

https://gerrit.wikimedia.org/r/907546

Dzahn changed the task status from Stalled to In Progress. Apr 10 2023, 11:07 PM
Dzahn closed this task as Resolved.
Dzahn updated the task description.

done! remaining miscweb1002 on buster has also been fully decom'ed. This closes the ticket.

Change 907547 merged by Dzahn:

[operations/puppet@production] site/miscweb: remove miscweb1002, switch rsync source to miscweb1003

https://gerrit.wikimedia.org/r/907547