
(Need by: 2020-02-28) rack/setup/install mw[1385-1413].eqiad.wmnet
Closed, Resolved · Public

Description

Please note these systems were ordered on T233639, and DC-Ops needs feedback from serviceops on the split between mw and Kubernetes systems ordered on that task.

serviceops: Please update this task to denote how many of the 37 total systems are to be used for eqiad mw, and how many for eqiad Kubernetes systems. Please also fill out the racking proposal, detailing how you want these new systems split across the datacenter. (Are these replacing existing hosts or adding to the cluster?)

Racking Proposal:

Rack         A4  A8  B5  B8  C3  C8
mw servers    4   4   6   6   5   4
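The racking proposal can be cross-checked with a short sketch that expands the per-rack counts (4, 4, 6, 6, 5, 4 servers in racks A4, A8, B5, B8, C3, C8) into hostnames. It assumes, without confirmation from this task, that hostnames are handed out sequentially, rack by rack, starting at mw1385; the helper name `assign_hostnames` is hypothetical.

```python
# Hypothetical helper: expand the racking proposal into per-rack hostnames.
# ASSUMPTION: hostnames are assigned sequentially, rack by rack, in the
# order the racks are listed in the proposal.
from typing import Dict, List, Tuple

RACK_PLAN: List[Tuple[str, int]] = [
    ("A4", 4), ("A8", 4), ("B5", 6), ("B8", 6), ("C3", 5), ("C8", 4),
]


def assign_hostnames(plan: List[Tuple[str, int]], first: int = 1385,
                     prefix: str = "mw",
                     domain: str = "eqiad.wmnet") -> Dict[str, List[str]]:
    """Return a dict mapping rack -> list of FQDNs placed in that rack."""
    assignments: Dict[str, List[str]] = {}
    n = first
    for rack, count in plan:
        assignments[rack] = [f"{prefix}{i}.{domain}" for i in range(n, n + count)]
        n += count
    return assignments


if __name__ == "__main__":
    for rack, hosts in assign_hostnames(RACK_PLAN).items():
        print(rack, hosts[0], "..", hosts[-1])
```

With these counts the ranges end at mw1413 (29 hosts), which matches the hostname range of this task; the exact per-rack boundaries are only a guess.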

Hostnames: mw[1385-1413].eqiad.wmnet

This checklist should be duplicated for EVERY SINGLE HOST:

  • - receive in system on procurement task T233639
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged
  • - handoff for service implementation
  • - service implementer changes from 'staged' status to 'active' status in netbox
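The Netbox status changes in this checklist form a small lifecycle: "planned" when the host is racked, "staged" after the initial Puppet run, and "active" once the service implementer takes over. A hypothetical sketch of it follows; the three status names come from the checklist, while the transition table is an assumption. It tolerates planned -> active because mw1403 took that path later in this task.

```python
# Hypothetical sketch of the Netbox status lifecycle from the checklist.
# Status names come from the checklist; the allowed transitions are assumed.
from enum import Enum


class Status(Enum):
    PLANNED = "planned"  # racked, Netbox entry created
    STAGED = "staged"    # OS installed, initial Puppet run done
    ACTIVE = "active"    # handed off and in service


# Forward-only transitions. PLANNED -> ACTIVE is tolerated because one host
# on this task (mw1403) went straight from planned to active.
ALLOWED = {
    Status.PLANNED: {Status.STAGED, Status.ACTIVE},
    Status.STAGED: {Status.ACTIVE},
    Status.ACTIVE: set(),
}


def transition(current: Status, new: Status) -> Status:
    """Return the new status, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {new.value}")
    return new
```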

Event Timeline

There are a very large number of changes, so older changes are hidden.
Dzahn added a subscriber: Dzahn. Feb 28 2020, 12:10 AM

Change 575382 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site: add second batch of new eqiad appservers as spares, by rack

https://gerrit.wikimedia.org/r/575382

Change 575382 merged by Dzahn:
[operations/puppet@production] site: add second batch of new eqiad appservers as spares, by rack

https://gerrit.wikimedia.org/r/575382

Dzahn raised the priority of this task from Medium to High. Feb 28 2020, 9:58 PM

Raising priority because the Needed-by date has arrived. Could we have a status update, @Cmjohnson? Is there a lot left to do before OS installs can start?

Change 574785 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns for mw185-1413

https://gerrit.wikimedia.org/r/574785

Change 576073 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding production dns for new mw1385-1413

https://gerrit.wikimedia.org/r/576073

Change 576073 merged by Cmjohnson:
[operations/dns@master] Adding production dns for new mw1385-1413

https://gerrit.wikimedia.org/r/576073

Change 576127 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/puppet@production] Adding mac addresses for mw1385-1413

https://gerrit.wikimedia.org/r/576127

Change 576127 merged by Cmjohnson:
[operations/puppet@production] Adding mac addresses for mw1385-1413

https://gerrit.wikimedia.org/r/576127

Cmjohnson updated the task description. Mar 3 2020, 12:13 PM
Cmjohnson updated the task description. Mar 3 2020, 1:27 PM

Everything but the initial Puppet run has been completed. Did the Puppet certificate signing process change? This now fails:

cmjohnson@puppetmaster1001:~$ sudo /usr/local/sbin/install-console mw1385.eqiad.wmnet
sudo: /usr/local/sbin/install-console: command not found

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1385.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031644_cmjohnson_92643_mw1385_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1385.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031647_cmjohnson_93114_mw1385_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1386.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031649_cmjohnson_94522_mw1386_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1387.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031649_cmjohnson_94600_mw1387_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1388.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031659_cmjohnson_96242_mw1388_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1389.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031701_cmjohnson_96538_mw1389_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1390.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031703_cmjohnson_96872_mw1390_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1391.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031704_cmjohnson_96943_mw1391_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1392.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031704_cmjohnson_98442_mw1392_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1393.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031714_cmjohnson_99732_mw1393_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1394.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031715_cmjohnson_99887_mw1394_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1395.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031716_cmjohnson_99994_mw1395_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1396.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031719_cmjohnson_100511_mw1396_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1397.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031723_cmjohnson_103258_mw1397_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1399.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031727_cmjohnson_103845_mw1399_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1400.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031731_cmjohnson_106485_mw1400_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1401.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031732_cmjohnson_106583_mw1401_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1402.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031732_cmjohnson_106673_mw1402_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1403.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031735_cmjohnson_108311_mw1403_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1398.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031738_cmjohnson_108555_mw1398_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1404.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031742_cmjohnson_109084_mw1404_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1405.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031748_cmjohnson_110857_mw1405_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1406.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031749_cmjohnson_110920_mw1406_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1407.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031749_cmjohnson_110992_mw1407_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1403.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031753_cmjohnson_111647_mw1403_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1408.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031753_cmjohnson_111819_mw1408_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1409.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031757_cmjohnson_112381_mw1409_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1410.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031803_cmjohnson_113460_mw1410_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1411.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031804_cmjohnson_113548_mw1411_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1412.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031804_cmjohnson_113597_mw1412_eqiad_wmnet.log.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1413.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003031804_cmjohnson_113663_mw1413_eqiad_wmnet.log.
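The wmf-auto-reimage log paths above share a visible naming pattern, which makes it easy to spot hosts that were reimaged more than once (mw1385 and mw1403 in this batch). The sketch below is hypothetical: the field layout is inferred only from these file names, the third field is treated as an opaque numeric ID, and the function names are not part of any existing tool.

```python
# Hypothetical parser for wmf-auto-reimage log paths as quoted in this task.
# ASSUMED layout (inferred from the file names only):
#   <YYYYMMDDHHMM>_<user>_<numeric id>_<fqdn with dots as underscores>.log
# and neither the user name nor the hostname contains underscores.
import re
from collections import Counter
from datetime import datetime
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

LOG_RE = re.compile(r"^(\d{12})_([^_]+)_(\d+)_(.+)\.log$")


def parse_reimage_log(path: str) -> Dict[str, object]:
    name = PurePosixPath(path).name
    m = LOG_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised log name: {name}")
    stamp, user, num, host = m.groups()
    return {
        "started": datetime.strptime(stamp, "%Y%m%d%H%M"),
        "user": user,
        "id": int(num),
        "host": host.replace("_", "."),  # mw1385_eqiad_wmnet -> mw1385.eqiad.wmnet
    }


def retried_hosts(paths: Iterable[str]) -> List[str]:
    """Hosts that appear in more than one reimage log."""
    counts = Counter(parse_reimage_log(p)["host"] for p in paths)
    return sorted(h for h, c in counts.items() if c > 1)
```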

Completed auto-reimage of hosts:

['mw1408.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw1409.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw1410.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw1411.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw1412.eqiad.wmnet']

and were ALL successful.

Completed auto-reimage of hosts:

['mw1413.eqiad.wmnet']

and were ALL successful.

Cmjohnson updated the task description. Mar 4 2020, 4:22 PM

@Dzahn or whoever needs these: all of them, with the exception of mw1403, are ready for service implementation. mw1403 is not installing and I am not sure why yet.

Script wmf-auto-reimage was launched by cmjohnson on cumin1001.eqiad.wmnet for hosts:

mw1403.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202003041632_cmjohnson_102694_mw1403_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['mw1403.eqiad.wmnet']

and were ALL successful.

Cmjohnson reassigned this task from Cmjohnson to jijiki. Mar 4 2020, 4:59 PM
Cmjohnson removed a project: ops-eqiad.

@jijiki all servers are now ready for implementation. I am removing the ops-eqiad tag and assigning the task to you.

Dzahn removed jijiki as the assignee of this task. Mar 4 2020, 5:02 PM

Thanks @Cmjohnson! I'll take this, as jijiki is currently away.

Dzahn claimed this task. Mar 4 2020, 5:02 PM

Change 576966 had a related patch set uploaded (by RLazarus; owner: RLazarus):
[operations/puppet@production] site: Assign mw14{05,07,09,11,13} as appservers.

https://gerrit.wikimedia.org/r/576966

Change 576973 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] site/conftool: add new appservers in eqiad row B

https://gerrit.wikimedia.org/r/576973

Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 7 host(s) and their services with reason: new_install

mw[1393-1399].eqiad.wmnet

Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 5 host(s) and their services with reason: new_install

mw[1400-1404].eqiad.wmnet
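Host sets in this task are written in compact notation, such as `mw[1393-1399].eqiad.wmnet` in the Icinga downtime messages and `mw14{05,07,09,11,13}` in the title of change 576966. A simplified, hypothetical expander is sketched below to make the host counts easy to verify. It is not the ClusterShell/cumin parser; it handles only a single bracket or brace group of numbers and numeric ranges.

```python
# Simplified, hypothetical host-pattern expander (NOT ClusterShell/cumin).
# Supports one [..] or {..} group of comma-separated numbers and ranges.
import re
from typing import List

GROUP_RE = re.compile(r"^(.*?)(?:\[([0-9,-]+)\]|\{([0-9,-]+)\})(.*)$")


def expand_hosts(pattern: str) -> List[str]:
    """Expand a single-group host pattern into a list of hostnames."""
    m = GROUP_RE.match(pattern)
    if m is None:
        return [pattern]  # no group: already a plain hostname
    prefix, bracket_spec, brace_spec, suffix = m.groups()
    spec = bracket_spec if bracket_spec is not None else brace_spec
    hosts: List[str] = []
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            width = len(lo)  # preserve zero padding such as "05"
            hosts.extend(f"{prefix}{n:0{width}d}{suffix}"
                         for n in range(int(lo), int(hi) + 1))
        else:
            hosts.append(f"{prefix}{part}{suffix}")
    return hosts
```

For example, `expand_hosts("mw[1393-1399].eqiad.wmnet")` yields seven hosts, matching the "7 host(s)" reported in the first downtime message.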

Change 576973 merged by Dzahn:
[operations/puppet@production] site/conftool: add new appservers in eqiad row B

https://gerrit.wikimedia.org/r/576973

Change 576975 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] add fake certs for mw1393 through mw1404

https://gerrit.wikimedia.org/r/576975

Change 576975 merged by Dzahn:
[labs/private@master] add fake certs for mw1393 through mw1404

https://gerrit.wikimedia.org/r/576975

Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 7 host(s) and their services with reason: new_install

mw[1393-1399].eqiad.wmnet

Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 5 host(s) and their services with reason: new_install

mw[1400-1404].eqiad.wmnet

Change 577304 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/puppet@production] add mw1385 through mw1392 as api and appservers

https://gerrit.wikimedia.org/r/577304

Change 577308 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[labs/private@master] fake certificates for all remaining new eqiad and codfw appservers

https://gerrit.wikimedia.org/r/577308

Change 577308 merged by Dzahn:
[labs/private@master] fake certificates for all remaining new eqiad and codfw appservers

https://gerrit.wikimedia.org/r/577308

Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 8 host(s) and their services with reason: new_install

mw[1385-1392].eqiad.wmnet

Change 577304 merged by Dzahn:
[operations/puppet@production] add mw1385 through mw1392 as api and appservers

https://gerrit.wikimedia.org/r/577304

Icinga downtime for 3:00:00 set by dzahn@cumin1001 on 8 host(s) and their services with reason: new_install

mw[1385-1392].eqiad.wmnet

Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 8 host(s) and their services with reason: new_install

mw[1385-1392].eqiad.wmnet

Icinga downtime for 1:00:00 set by rzl@cumin1001 on 9 host(s) and their services with reason: new install

mw[1405-1413].eqiad.wmnet

Change 576966 merged by RLazarus:
[operations/puppet@production] site: Assign appservers and API servers in eqiad row C.

https://gerrit.wikimedia.org/r/576966

Icinga downtime for 1:00:00 set by rzl@cumin1001 on 9 host(s) and their services with reason: new install

mw[1405-1413].eqiad.wmnet

Icinga downtime for 1:00:00 set by dzahn@cumin1001 on 8 host(s) and their services with reason: new_install

mw[1385-1392].eqiad.wmnet
Dzahn added a comment. Mar 5 2020, 10:55 PM
{"mw1385.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1385.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1386.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1386.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1387.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1387.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1388.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1388.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1389.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1389.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1390.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1390.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1391.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1391.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1392.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1392.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1393.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1393.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1394.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1394.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1395.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1395.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1396.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1396.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1397.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1397.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1398.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1398.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1399.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1399.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1400.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1400.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1401.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1401.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1402.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1402.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1403.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1403.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1404.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1404.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1405.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1405.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1406.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1406.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1407.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1407.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1408.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1408.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1409.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1409.eqiad.wmnet": {"pooled": "yes", "weight": 30}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1410.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1410.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1411.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1411.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
{"mw1412.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=apache2"}
{"mw1412.eqiad.wmnet": {"weight": 30, "pooled": "yes"}, "tags": "dc=eqiad,cluster=api_appserver,service=nginx"}
{"mw1413.eqiad.wmnet": {"pooled": "inactive", "weight": 0}, "tags": "dc=eqiad,cluster=appserver,service=apache2"}
{"mw1413.eqiad.wmnet": {"pooled": "inactive", "weight": 0}, "tags": "dc=eqiad,cluster=appserver,service=nginx"}
Dzahn updated the task description. Mar 5 2020, 10:58 PM
Dzahn closed this task as Resolved. Mar 5 2020, 11:00 PM

The one exception (mw1413, at the bottom) is currently being used for a test. Everything else is pooled with weight 30, alternating between the appserver and API appserver clusters.
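The alternation in the conftool output (odd-numbered hosts in `appserver`, even-numbered ones in `api_appserver`) and the pooled state can be checked mechanically. A minimal sketch, assuming the output is saved one JSON object per line exactly as pasted in this comment; the function names are hypothetical and this is not a conftool feature.

```python
# Hypothetical checker for conftool output pasted one JSON object per line.
# Each line holds one hostname key plus a "tags" string; a host appears
# once per service (nginx, apache2).
import json
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str, int]  # (cluster, pooled, weight)


def summarize(lines: Iterable[str]) -> Dict[str, Entry]:
    """Collapse the per-service lines into one entry per host."""
    hosts: Dict[str, Entry] = {}
    for line in lines:
        obj = json.loads(line)
        tags = dict(kv.split("=", 1) for kv in obj.pop("tags").split(","))
        ((host, state),) = obj.items()  # exactly one hostname key remains
        entry = (tags["cluster"], state["pooled"], state["weight"])
        # The nginx and apache2 lines for the same host should agree.
        if hosts.setdefault(host, entry) != entry:
            raise ValueError(f"{host}: services disagree: {hosts[host]} vs {entry}")
    return hosts


def alternation_mismatches(hosts: Dict[str, Entry]) -> List[str]:
    """Hosts breaking the odd=appserver / even=api_appserver pattern."""
    bad = []
    for host, (cluster, _pooled, _weight) in hosts.items():
        number = int(host.split(".")[0][len("mw"):])  # "mw1386..." -> 1386
        expected = "appserver" if number % 2 else "api_appserver"
        if cluster != expected:
            bad.append(host)
    return sorted(bad)


def not_pooled(hosts: Dict[str, Entry]) -> List[str]:
    return sorted(h for h, (_c, pooled, _w) in hosts.items() if pooled != "yes")
```

Run against the full output above, the expected result is no alternation mismatches and only mw1413 reported as not pooled.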

The docs have been updated with steps to avoid Icinga alert spam when adding new servers:

https://wikitech.wikimedia.org/wiki/Application_servers#Adding_a_new_server_into_production

We also spread the servers in this new batch out across racks. Docs added at https://wikitech.wikimedia.org/wiki/Application_servers#Spreading_application_servers_out_across_rows_and_racks

Timestamps of when pooling happened, for checking performance improvements:

22:40 UTC (March 5) - mw1405-mw1412
22:47 UTC (March 5) - mw1385-mw1389
22:50 UTC (March 5) - mw1390-mw1392

Krinkle added a subscriber: Krinkle.

I've added a Grafana annotation for this event, and it looks like we've got some nice perf wins here across the board.

This also shows a ~10 ms overall reduction for MediaWiki page views, and for GET/200 requests more generally.

This one shows 5-10% more requests responding within the 50 ms latency bucket for MediaWiki overall.

This one shows about 5% quicker responses at the 50th and 75th percentiles for load.php: from a steady 44 ms and 64 ms down to 42 ms and 61 ms, respectively.
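As a quick arithmetic check of the "about 5%" figure, the relative reduction for each load.php before/after pair quoted here (64 ms -> 61 ms and 44 ms -> 42 ms) is (before - after) / before. The helper name is illustrative only.

```python
# Relative latency reduction for the before/after pairs quoted for load.php.
def relative_reduction(before_ms: float, after_ms: float) -> float:
    """Fractional reduction, e.g. 0.05 for 5%."""
    return (before_ms - after_ms) / before_ms


if __name__ == "__main__":
    for before, after in [(64, 61), (44, 42)]:
        print(f"{before} ms -> {after} ms: {relative_reduction(before, after):.1%} faster")
```

This gives 4.7% and 4.5% respectively, consistent with "about 5%".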

Restricted Application added a project: Performance-Team. Mar 5 2020, 11:56 PM
Dzahn added a comment. Mar 6 2020, 2:37 AM

Set mw1385 through mw1413 to status Active in Netbox. mw1413 has also been pooled in the meantime. mw1403 went from planned -> active; all others went from staged -> active.

Joe added a comment. Mar 6 2020, 5:35 AM

Let's see how these numbers hold up when we decommission the oldest servers, but this seems very encouraging indeed.