
Bring 43 new MediaWiki appservers in eqiad into production
Open, Needs Triage | Public

Description

45 (43) new MediaWiki appservers were procured in T271155.

43 are being racked in T273915.

Just like T278396 for codfw, this is the task to:

  • plan how many hosts become appservers, API servers, jobrunners/videoscalers, canaries, or ... dedicated jobrunners (T279100)
  • add appropriate regexes to site.pp, matching the puppet roles above
  • add hosts to the right sections of conftool-data
  • do the initial puppet run, reboot, check monitoring, set weight, pool
  • set up canaries and recreate hieradata (removed in 702659)
  • decom old servers in parallel (-> T280203)
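
A host-range regex is easy to get off by one (see the later patches that re-add single hosts to a regex). The following is a minimal Python sketch of a pre-merge check that a pattern matches exactly the intended hosts. The pattern, the helper names, and the candidate range are hypothetical and written for illustration; they are not copied from site.pp, which uses Puppet (Ruby-style) regexes.

```python
import re


def hosts_in_range(first, last, domain="eqiad.wmnet"):
    """Build FQDNs such as mw1414.eqiad.wmnet for an inclusive number range."""
    return [f"mw{n}.{domain}" for n in range(first, last + 1)]


def check_pattern(pattern, wanted, candidates):
    """Return (missing, unexpected) hosts for a compiled regex."""
    matched = {h for h in candidates if pattern.match(h)}
    wanted = set(wanted)
    return sorted(wanted - matched), sorted(matched - wanted)


# Hypothetical pattern for the rack A3 appservers mw1414 through mw1420.
good = re.compile(r"^mw14(1[4-9]|20)\.eqiad\.wmnet$")
# Hypothetical too-narrow pattern that forgets mw1420.
narrow = re.compile(r"^mw141[4-9]\.eqiad\.wmnet$")

wanted = hosts_in_range(1414, 1420)
candidates = hosts_in_range(1400, 1456)

missing, unexpected = check_pattern(good, wanted, candidates)
narrow_missing, narrow_unexpected = check_pattern(narrow, wanted, candidates)
print(missing, unexpected)
print(narrow_missing, narrow_unexpected)
```

With the too-narrow pattern, mw1420.eqiad.wmnet shows up as missing, which is the kind of gap a follow-up regex patch has to close.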

Rack A3

  • mw1414.eqiad.wmnet - appserver
  • mw1415.eqiad.wmnet - appserver
  • mw1416.eqiad.wmnet - appserver
  • mw1417.eqiad.wmnet - appserver
  • mw1418.eqiad.wmnet - appserver
  • mw1419.eqiad.wmnet - appserver
  • mw1420.eqiad.wmnet - appserver
  • mw1421.eqiad.wmnet - API server
  • mw1422.eqiad.wmnet - API server

Rack B3

  • mw1423.eqiad.wmnet - API server
  • mw1424.eqiad.wmnet - API server
  • mw1425.eqiad.wmnet - API server
  • mw1426.eqiad.wmnet - API server
  • mw1427.eqiad.wmnet - API server
  • mw1428.eqiad.wmnet - API server
  • mw1429.eqiad.wmnet - appserver
  • mw1430.eqiad.wmnet - appserver
  • mw1431.eqiad.wmnet - appserver
  • mw1432.eqiad.wmnet - appserver
  • mw1433.eqiad.wmnet - appserver

Rack C3

  • mw1434.eqiad.wmnet - API server
  • mw1435.eqiad.wmnet - API server
  • mw1436.eqiad.wmnet - API server

Rack D8

  • mw1437.eqiad.wmnet - jobrunner canary
  • mw1438.eqiad.wmnet - jobrunner canary
  • mw1439.eqiad.wmnet - appserver
  • mw1440.eqiad.wmnet - appserver
  • mw1441.eqiad.wmnet - appserver
  • mw1442.eqiad.wmnet - appserver
  • mw1443.eqiad.wmnet - API server
  • mw1444.eqiad.wmnet - API server (!) - NOT REACHABLE via SSH
  • mw1445.eqiad.wmnet - API server
  • mw1446.eqiad.wmnet - API server
  • mw1447.eqiad.wmnet
  • mw1448.eqiad.wmnet - not in DNS yet
  • mw1449.eqiad.wmnet - not in DNS yet
  • mw1450.eqiad.wmnet - not in DNS yet

Planned

  • mw1451.eqiad.wmnet
  • mw1452.eqiad.wmnet
  • mw1453.eqiad.wmnet
  • mw1454.eqiad.wmnet
  • mw1455.eqiad.wmnet
  • mw1456.eqiad.wmnet
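
To support the planning step, the per-rack lists above can be tallied by role. This is a minimal Python sketch that assumes the "host - role" bullet format used in this description; the sample is only an excerpt of the lists, and the "unassigned" label for hosts without a role is an invention for illustration.

```python
from collections import Counter

# Excerpt of the rack lists in this task description (not the full list).
SAMPLE = """\
Rack C3
  • mw1434.eqiad.wmnet - API server
  • mw1435.eqiad.wmnet - API server
  • mw1436.eqiad.wmnet - API server
Rack D8
  • mw1437.eqiad.wmnet - jobrunner canary
  • mw1438.eqiad.wmnet - jobrunner canary
  • mw1439.eqiad.wmnet - appserver
  • mw1447.eqiad.wmnet
"""


def tally_roles(text):
    """Count hosts per role; hosts without ' - role' count as 'unassigned'."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("•"):
            continue  # skip rack headings and blank lines
        _host, sep, role = line.lstrip("• ").partition(" - ")
        counts[role.strip() if sep else "unassigned"] += 1
    return counts


roles = tally_roles(SAMPLE)
print(dict(roles))
```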

Details

Project            Branch      Lines +/-
operations/puppet  production  +10 -1
operations/puppet  production  +7 -0
labs/private       master      +0 -0
operations/puppet  production  +5 -1
operations/puppet  production  +9 -0
operations/puppet  production  +8 -3
operations/puppet  production  +7 -0
operations/puppet  production  +7 -0
operations/puppet  production  +14 -7
operations/puppet  production  +1 -1
operations/puppet  production  +9 -0
operations/puppet  production  +4 -1
operations/puppet  production  +1 -1
operations/puppet  production  +9 -1
operations/puppet  production  +3 -2
labs/private       master      +0 -0
labs/private       master      +0 -0
operations/puppet  production  +14 -1

Event Timeline


Change 702880 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] site: add eight appsevers in eqiad row A3

https://gerrit.wikimedia.org/r/702880

Mentioned in SAL (#wikimedia-operations) [2021-07-02T08:24:00Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T08:24:08Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T08:24:40Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T08:24:47Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309
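
The `mw[1420-1421].eqiad.wmnet` notation in the SAL entries above is a compact host range. The following Python sketch expands such an expression into individual hostnames. It is a simplified illustration that handles a single bracket group with ranges and comma lists; it is not the parser the cookbooks use, and the function name is hypothetical.

```python
import re

# One bracket group containing digits, commas and dashes, e.g. [1420-1421].
_GROUP = re.compile(r"\[([0-9,\-]+)\]")


def expand_hosts(expr):
    """Expand 'mw[1420-1421].eqiad.wmnet' into a list of hostnames."""
    m = _GROUP.search(expr)
    if m is None:
        return [expr]  # plain hostname, nothing to expand
    prefix, suffix = expr[:m.start()], expr[m.end():]
    hosts = []
    for part in m.group(1).split(","):
        lo, _, hi = part.partition("-")
        hi = hi or lo  # a single number is a range of one
        width = len(lo)  # keep any zero padding
        for n in range(int(lo), int(hi) + 1):
            hosts.append(f"{prefix}{n:0{width}d}{suffix}")
    return hosts


pair = expand_hosts("mw[1420-1421].eqiad.wmnet")
single = expand_hosts("mw1439.eqiad.wmnet")
print(pair)
print(single)
```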

Change 702880 merged by Jelto:

[operations/puppet@production] site: add eight appsevers in eqiad row A3

https://gerrit.wikimedia.org/r/702880

Change 702888 had a related patch set uploaded (by Jelto; author: Jelto):

[labs/private@master] add mcrouter certs for mw1414.eqiad.wmnet

https://gerrit.wikimedia.org/r/702888

Change 702888 merged by Jelto:

[labs/private@master] add mcrouter certs for mw1414.eqiad.wmnet

https://gerrit.wikimedia.org/r/702888

Mentioned in SAL (#wikimedia-operations) [2021-07-02T11:14:57Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T11:15:06Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T11:15:13Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T11:15:20Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Change 702923 had a related patch set uploaded (by Jelto; author: Jelto):

[labs/private@master] add mcrouter certs for mw1415.eqiad.wmnet to mw1421.eqiad.wmnet

https://gerrit.wikimedia.org/r/702923

Change 702923 merged by Jelto:

[labs/private@master] add mcrouter certs for mw1415.eqiad.wmnet to mw1421.eqiad.wmnet

https://gerrit.wikimedia.org/r/702923

Mentioned in SAL (#wikimedia-operations) [2021-07-02T13:22:04Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T13:22:11Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on 6 hosts with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T13:22:18Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-02T13:22:24Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on mw[1420-1421].eqiad.wmnet with reason: setup new appservers in eqiad A3 https://phabricator.wikimedia.org/T279309

Change 704103 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] role::common::mediawiki::canary_appserver add new canary app server in eqiad

https://gerrit.wikimedia.org/r/704103

Dzahn changed the task status from Stalled to Open. (Tue, Jul 13, 11:52 AM)
Dzahn updated the task description.

Change 704319 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: turn mw1422 into an mw appserver

https://gerrit.wikimedia.org/r/704319

Change 704319 merged by Dzahn:

[operations/puppet@production] site/conftool: turn mw1422 into an mw appserver

https://gerrit.wikimedia.org/r/704319

Change 704556 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: turn mw1423,mw1424,mw1425 into API appservers

https://gerrit.wikimedia.org/r/704556

Mentioned in SAL (#wikimedia-operations) [2021-07-15T08:11:12Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-15T08:11:19Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on mw[1414-1418].eqiad.wmnet with reason: change new eqiad appservers to canary https://phabricator.wikimedia.org/T279309

Change 704103 merged by Jelto:

[operations/puppet@production] role::common::mediawiki::canary_appserver add new canary app server in eqiad

https://gerrit.wikimedia.org/r/704103

Change 704556 merged by Dzahn:

[operations/puppet@production] site/conftool: turn mw1423,mw1424,mw1425 into API appservers

https://gerrit.wikimedia.org/r/704556

Change 704778 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: re-add mw1422 to regex for appservers

https://gerrit.wikimedia.org/r/704778

Change 704778 merged by Dzahn:

[operations/puppet@production] site: re-add mw1422 to regex for appservers

https://gerrit.wikimedia.org/r/704778

Dzahn updated the task description.

Change 704945 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: add mw1426, mw1427, mw1428 as API appservers

https://gerrit.wikimedia.org/r/704945

Change 704945 merged by Dzahn:

[operations/puppet@production] site/conftool: add mw1426, mw1427, mw1428 as API appservers

https://gerrit.wikimedia.org/r/704945

Change 704950 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: add mw1429 through mw1433 as appservers, rack B3

https://gerrit.wikimedia.org/r/704950

Change 704950 merged by Dzahn:

[operations/puppet@production] site/conftool: add mw1429 through mw1433 as appservers, rack B3

https://gerrit.wikimedia.org/r/704950

Change 704954 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site: add mw1430 to regex for new appserver range

https://gerrit.wikimedia.org/r/704954

Change 704954 merged by Dzahn:

[operations/puppet@production] site: add mw1430 to regex for new appserver range

https://gerrit.wikimedia.org/r/704954

Mentioned in SAL (#wikimedia-operations) [2021-07-16T12:39:42Z] <mutante> mw1412 through mw1428 - set to active in netbox (T279309)

Mentioned in SAL (#wikimedia-operations) [2021-07-16T12:49:35Z] <mutante> mw1429 through mw1433 - initial puppet run, reboot, moving into production as appservers (T279309)

Change 705385 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: add mw1434,mw1435,mw1436 as API appservers

https://gerrit.wikimedia.org/r/705385

Change 705385 merged by Dzahn:

[operations/puppet@production] site/conftool: add mw1434,mw1435,mw1436 as API appservers

https://gerrit.wikimedia.org/r/705385

Change 705721 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: add mw1437,mw1438 as canary jobrunners

https://gerrit.wikimedia.org/r/705721

Jelto updated the task description.

Change 705721 merged by Dzahn:

[operations/puppet@production] site/conftool: add mw1437,mw1438 as canary jobrunners

https://gerrit.wikimedia.org/r/705721

Dzahn updated the task description.

Change 705927 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: add mw1439, mw1440 as jobrunners

https://gerrit.wikimedia.org/r/705927

FYI, if that helps, this is the current row distribution of the API appservers in eqiad:

{'B': 19, 'D': 18, 'C': 17, 'A': 9}

Full details at P16841
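
A distribution like the one above can be computed from a host-to-rack mapping. The following Python sketch covers only the new API servers listed in this task's description, not the full fleet data in P16841, so the counts differ from the figures quoted above. It assumes the row is the first letter of the rack name (A3 is in row A); the variable and function names are hypothetical.

```python
from collections import Counter

# New API servers and their racks, taken from this task's description only.
API_SERVER_RACKS = {
    "mw1421": "A3", "mw1422": "A3",
    "mw1423": "B3", "mw1424": "B3", "mw1425": "B3",
    "mw1426": "B3", "mw1427": "B3", "mw1428": "B3",
    "mw1434": "C3", "mw1435": "C3", "mw1436": "C3",
    "mw1443": "D8", "mw1444": "D8", "mw1445": "D8", "mw1446": "D8",
}


def row_distribution(host_racks):
    """Count hosts per row, assuming the row is the rack's first letter."""
    counts = Counter(rack[0] for rack in host_racks.values())
    return dict(counts.most_common())  # largest row first


distribution = row_distribution(API_SERVER_RACKS)
print(distribution)  # {'B': 6, 'D': 4, 'C': 3, 'A': 2}
```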

Change 705943 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] conftool: convert mw1421, mw1422 from app to API servers for balance

https://gerrit.wikimedia.org/r/705943

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

mw1421.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202107211607_dzahn_31447_mw1421_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['mw1421.eqiad.wmnet']

Of which those FAILED:

['mw1421.eqiad.wmnet']

FYI I've updated the pastes for eqiad and codfw with some more detailed data, all yours now :)

Change 705943 merged by Dzahn:

[operations/puppet@production] conftool: convert mw1421, mw1422 from app to API servers for balance

https://gerrit.wikimedia.org/r/705943

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

mw1421.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202107221018_dzahn_8762_mw1421_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by dzahn on cumin1001.eqiad.wmnet for hosts:

mw1422.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202107221024_dzahn_13185_mw1422_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['mw1421.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw1422.eqiad.wmnet']

and were ALL successful.

Change 706485 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] site/conftool: add mw1439,mw1440,mw1441,mw1442 as canary API appservers

https://gerrit.wikimedia.org/r/706485

Change 707252 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: add mw1437 through mw1440 as appservers, rack D8

https://gerrit.wikimedia.org/r/707252

Change 707252 merged by Dzahn:

[operations/puppet@production] site/conftool: add mw1439 through mw1442 as appservers, rack D8

https://gerrit.wikimedia.org/r/707252

Change 707298 had a related patch set uploaded (by Dzahn; author: Dzahn):

[operations/puppet@production] site/conftool: add mw1443 through mw1446 as API appservers

https://gerrit.wikimedia.org/r/707298

Dzahn updated the task description.

Change 707300 had a related patch set uploaded (by Jelto; author: Jelto):

[labs/private@master] add mcrouter certs for mw1422.eqiad.wmnet to mw1442.eqiad.wmnet

https://gerrit.wikimedia.org/r/707300

Change 707298 merged by Dzahn:

[operations/puppet@production] site/conftool: add mw1443 through mw1446 as API appservers

https://gerrit.wikimedia.org/r/707298

Change 707300 merged by Dzahn:

[labs/private@master] add mcrouter certs for mw1422.eqiad.wmnet to mw1446.eqiad.wmnet

https://gerrit.wikimedia.org/r/707300

Change 705927 abandoned by Dzahn:

[operations/puppet@production] site/conftool: add mw1439, mw1440 as jobrunners

Reason:

already used as appservers

https://gerrit.wikimedia.org/r/705927

Mentioned in SAL (#wikimedia-operations) [2021-07-23T12:15:39Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-23T12:15:47Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw1439.eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-23T12:16:12Z] <jelto@cumin1001> START - Cookbook sre.hosts.downtime for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309

Mentioned in SAL (#wikimedia-operations) [2021-07-23T12:16:19Z] <jelto@cumin1001> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 3:00:00 on mw[1440-1442].eqiad.wmnet with reason: setup new canary mw api servers in eqiad D8 https://phabricator.wikimedia.org/T279309