rack/setup/install restbase10[19-27].eqiad.wmnet
Closed, Resolved, Public

Description

This task will track the racking and setup of 9 new restbase systems for eqiad, restbase1019-restbase1027.

Racking Proposal: These are 1G network servers. Rack them in 1G racks in rows A, B, and D, three per row, with each server in its own rack (no rack shared with another new server from this batch).
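The racking constraints (three servers in each of rows A, B, and D, no two in the same rack) can be checked mechanically. A minimal Python sketch, assuming a plan is a hypothetical mapping of hostname to (row, rack) and that rack names are unique across the site; whether a rack is 1G is not modeled:

```python
from __future__ import annotations

from collections import Counter

# Rows named in the racking proposal; 1G capability of each rack is not modeled here.
PROPOSED_ROWS = ("A", "B", "D")


def check_racking(plan: dict[str, tuple[str, str]],
                  rows: tuple[str, ...] = PROPOSED_ROWS,
                  per_row: int = 3) -> list[str]:
    """Return violations of the racking proposal; an empty list means the plan fits.

    `plan` maps hostname -> (row, rack). Rack names are assumed to be unique
    across the datacenter (e.g. "A2", "B4").
    """
    problems: list[str] = []
    row_counts = Counter(row for row, _ in plan.values())
    for row in rows:
        if row_counts[row] != per_row:
            problems.append(f"row {row} has {row_counts[row]} hosts, expected {per_row}")
    for row in sorted(set(row_counts) - set(rows)):
        problems.append(f"row {row} is not part of the proposal")
    rack_counts = Counter(rack for _, rack in plan.values())
    for rack, count in sorted(rack_counts.items()):
        if count > 1:
            problems.append(f"rack {rack} holds {count} servers from this batch")
    return problems


# Hypothetical rack assignments, for illustration only.
hosts = [f"restbase{n}" for n in range(1019, 1028)]
racks = ["A2", "A4", "A7", "B2", "B4", "B7", "D2", "D4", "D7"]
plan = {host: (rack[0], rack) for host, rack in zip(hosts, racks)}
print(check_racking(plan))  # []
```

Moving restbase1020 into rack A2 alongside restbase1019 would keep row A at three hosts but report the shared rack.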

restbase1019:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.

restbase1020:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.

restbase1021:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.

restbase1022:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.

restbase1023:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.

restbase1024:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.

restbase1025:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.

restbase1026:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.

restbase1027:

  • - receive in system on procurement task T213988
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible)
  • - OS installation
  • - puppet accept/initial run
  • - handoff for service implementation
  • - service implementer changes netbox status from staged to active.
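The nine checklists above are identical, so the workflow can be kept as data. A minimal Python sketch, assuming progress is tracked as a hypothetical count of steps completed in order per host (step texts are abbreviated from the checklist):

```python
from __future__ import annotations

# Per-host checklist from the task description, split at "end on-site specific steps".
ONSITE_STEPS = [
    "receive in system on procurement task T213988",
    "rack system with proposed racking plan & update netbox",
    "bios/drac/serial setup/testing",
    "mgmt dns entries added for both asset tag and hostname",
    "network port setup (description, enable, vlan)",
]
REMOTE_STEPS = [
    "production dns entries added",
    "operations/puppet update (install_server at minimum)",
    "OS installation",
    "puppet accept/initial run",
    "handoff for service implementation",
    "service implementer changes netbox status from staged to active",
]
STEPS = ONSITE_STEPS + REMOTE_STEPS


def next_step(completed: int) -> str | None:
    """Return the next step for a host that has finished `completed` steps in order."""
    return STEPS[completed] if completed < len(STEPS) else None


def onsite_done(completed: int) -> bool:
    """True once every on-site step is finished and remote work can start."""
    return completed >= len(ONSITE_STEPS)


# Hypothetical progress snapshot: number of steps completed per host.
progress = {f"restbase{n}": 0 for n in range(1019, 1028)}
progress["restbase1019"] = len(ONSITE_STEPS)

for host, done in sorted(progress.items()):
    state = "on-site done" if onsite_done(done) else "on-site pending"
    print(host, state, "->", next_step(done))
```

Keeping the on-site and remote steps as separate lists mirrors the hand-off in the timeline, where on-site work finishes before the installs are reassigned.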

Event Timeline

IRC Update:

08:36 < mobrovac> : robh: heh, distributed over the 4 racks, but urandom can be more precise in this regard ^

For now, let's wait for @Eevans to comment with the plan for which new servers are replacing which old servers.

We should rack them three per row, in rows A, B, and D.

RobH updated the task description.
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.

These are also replacing some existing systems once the new systems are fully online:

row A: restbase1007, restbase1010, & restbase1011
row B: restbase1008, restbase1012, & restbase1013
row D: restbase1009, restbase1014, & restbase1015

@Cmjohnson any movement on this? Do you have an ETA on when the machines will be installed?

@mobrovac I haven't had a chance to get to them until this week. I should be able to get them racked and the on-site work done this week.

Great, thank you @Cmjohnson ! Please ping us once it's done.

Change 508650 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns for restbase1019-27

https://gerrit.wikimedia.org/r/508650

Change 508650 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns for restbase1019-27

https://gerrit.wikimedia.org/r/508650

Cmjohnson updated the task description.
Cmjohnson added a subscriber: Cmjohnson.

@mobrovac All of the on-site work has been completed. I am assigning this to @RobH to finish the installs and hand it over to you.

Change 508735 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] restbase10[19-27].eqiad.wmnet prod dns entries

https://gerrit.wikimedia.org/r/508735

Change 508735 merged by RobH:
[operations/dns@master] restbase10[19-27].eqiad.wmnet prod dns entries

https://gerrit.wikimedia.org/r/508735
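The patch titles use bracket ranges such as restbase10[19-27].eqiad.wmnet, similar to the ClusterShell node-set syntax that cumin accepts for host selection. A minimal Python sketch that expands a single numeric range; `expand` is a hypothetical helper, and ClusterShell's own NodeSet handles far more (multiple ranges, comma lists, set operations):

```python
from __future__ import annotations

import re

# prefix, exactly one numeric "[lo-hi]" range, then the suffix
_RANGE = re.compile(r"^(?P<prefix>[^\[]*)\[(?P<lo>\d+)-(?P<hi>\d+)\](?P<suffix>.*)$")


def expand(pattern: str) -> list[str]:
    """Expand one numeric bracket range; patterns without a range are returned as-is.

    Zero padding follows the width of the lower bound, so "[01-03]" gives 01, 02, 03.
    """
    m = _RANGE.match(pattern)
    if m is None:
        return [pattern]
    width = len(m["lo"])
    return [
        f"{m['prefix']}{n:0{width}d}{m['suffix']}"
        for n in range(int(m["lo"]), int(m["hi"]) + 1)
    ]


print(expand("restbase10[19-27].eqiad.wmnet"))
```

This yields nine names, restbase1019.eqiad.wmnet through restbase1027.eqiad.wmnet, one per server in this task.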

Change 508739 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] adding restbase10[19-27] mac addresses

https://gerrit.wikimedia.org/r/508739

Change 508739 merged by RobH:
[operations/puppet@production] adding restbase10[19-27] install params

https://gerrit.wikimedia.org/r/508739

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['restbase1019.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905072339_robh_144413.log.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['restbase1020.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/201905072356_robh_148203.log.

Started installing; restbase1019 is ready for service handoff.

The others started the installer but then stopped at the manual partitioning menu, so something isn't right. I will investigate and continue the installation.

Change 508936 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] fixing restbase102[7-9] prod dns

https://gerrit.wikimedia.org/r/508936

Change 508936 merged by RobH:
[operations/dns@master] fixing restbase102[7-9] prod dns

https://gerrit.wikimedia.org/r/508936

Ok, all of these systems are now installed and calling into puppet with role::spare so they are in monitoring.

These will need someone in SRE or a member of Platform Engineering (Services) to take them over, push them into service, and apply their proper roles. I've pinged both @Eevans and @mobrovac via IRC asking about this.

Can someone advise who would be best to take this over, and then re-assign this from me to that person? (I don't want to leave it with no owner and have it get forgotten.)

RobH added a subscriber: fgiunchedi.

After IRC discussion about previous setups, it turns out @fgiunchedi pushed those into service last time. I'm reassigning this to him, but if it should go elsewhere, please let me know!

Thank you @RobH ! Yes I'll take it from here and hand over as needed when done

Change 509422 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] cassandra: add restbase10[19-27]

https://gerrit.wikimedia.org/r/509422

Change 509423 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] conftool-data: add restbase10[19-27]

https://gerrit.wikimedia.org/r/509423

Change 509422 merged by Filippo Giunchedi:
[operations/puppet@production] cassandra: add restbase10[19-27]

https://gerrit.wikimedia.org/r/509422

Change 510467 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] site: add restbase1019 to production

https://gerrit.wikimedia.org/r/510467

Change 510467 merged by Filippo Giunchedi:
[operations/puppet@production] site: add restbase1019 to production

https://gerrit.wikimedia.org/r/510467

Change 510537 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] site: move restbase102[0-7] to production

https://gerrit.wikimedia.org/r/510537

Mentioned in SAL (#wikimedia-operations) [2019-05-15T16:40:25Z] <urandom> bootstrap restbase1019-c - T219404

Change 510537 merged by Filippo Giunchedi:
[operations/puppet@production] site: move restbase102[0-7] to production

https://gerrit.wikimedia.org/r/510537

Change 510695 had a related patch set uploaded (by Filippo Giunchedi; owner: Filippo Giunchedi):
[operations/puppet@production] cassandra: add init.d 'stop' action

https://gerrit.wikimedia.org/r/510695

Puppet has run on all hosts; the next step is bootstrapping all Cassandra instances, one at a time.
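Bootstrapping one instance at a time matters because, by default, Cassandra will not bootstrap a node while other nodes are bootstrapping, leaving, or moving. A minimal Python sketch of the ordering, assuming each host runs instances with suffixes a, b, and c (as in the SAL entries, e.g. restbase1019-c) and that `start` and `wait_until_joined` are hypothetical callables supplied by the operator; it does not reproduce the exact order in the log:

```python
from __future__ import annotations

from typing import Callable, Iterable

# Instance suffixes per host, as in SAL entries such as "restbase1019-c".
INSTANCES = ("a", "b", "c")


def bootstrap_order(hosts: Iterable[str], instances: tuple[str, ...] = INSTANCES) -> list[str]:
    """List instances host by host: restbase1019-a, restbase1019-b, restbase1019-c, then the next host."""
    return [f"{host}-{suffix}" for host in hosts for suffix in instances]


def bootstrap_serially(order: list[str],
                       start: Callable[[str], None],
                       wait_until_joined: Callable[[str], bool]) -> list[str]:
    """Start each instance only after the previous one has finished joining.

    Stops at the first instance that fails to join, so no two bootstraps overlap.
    """
    joined: list[str] = []
    for instance in order:
        start(instance)
        if not wait_until_joined(instance):
            raise RuntimeError(f"{instance} did not finish bootstrapping; stopping")
        joined.append(instance)
    return joined


# Usage with stand-in callables (real ones would drive the service and poll the ring).
order = bootstrap_order(["restbase1019", "restbase1020"])
bootstrap_serially(order, start=lambda i: print("bootstrap", i), wait_until_joined=lambda i: True)
```

Failing fast on the first instance that does not join keeps a problem from being compounded by starting the next bootstrap on top of it.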

Change 510717 had a related patch set uploaded (by Mobrovac; owner: Mobrovac):
[mediawiki/services/restbase/deploy@master] Targets: Add restbase10[19-27].eqiad.wmnet

https://gerrit.wikimedia.org/r/510717

Mentioned in SAL (#wikimedia-operations) [2019-05-16T12:59:44Z] <mobrovac> bootstrap restbase1020-c - T219404

Change 510695 merged by Filippo Giunchedi:
[operations/puppet@production] cassandra: add init.d 'stop' action

https://gerrit.wikimedia.org/r/510695

The hosts are undergoing Cassandra bootstraps by @mobrovac; unassigning.

Mentioned in SAL (#wikimedia-operations) [2019-05-17T10:59:07Z] <mobrovac> bootstrap restbase1021-b - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-17T12:58:10Z] <mobrovac> bootstrap restbase1021-c - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-17T14:51:51Z] <mobrovac> bootstrap restbase1022-a - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-17T23:55:22Z] <urandom> bootstrapping restbase1022-b - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-18T02:37:34Z] <urandom> bootstrapping restbase1022-c - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-18T13:51:55Z] <urandom> bootstrapping restbase1023-b - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-18T16:53:39Z] <mobrovac> bootstrap restbase1023-c - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-18T19:16:02Z] <mobrovac> bootstrap restbase1024-a - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-19T06:56:28Z] <mobrovac> bootstrap restbase1024-b - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-19T10:12:51Z] <mobrovac> bootstrap restbase1024-c - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-19T17:51:33Z] <mobrovac> bootstrap restbase1025-a - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-20T06:11:08Z] <mobrovac> bootstrap restbase1025-b - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-20T08:43:34Z] <mobrovac> bootstrap restbase1025-c - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-20T11:17:29Z] <mobrovac> bootstrap restbase1026-a - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-20T13:58:50Z] <mobrovac> bootstrap restbase1026-b - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-20T16:38:02Z] <mobrovac> bootstrap restbase1026-c - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-20T19:19:59Z] <mobrovac> bootstrap restbase1027-a - T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-21T00:47:36Z] <urandom> bootstrapping restbase1027-b -- T219404

Mentioned in SAL (#wikimedia-operations) [2019-05-21T03:36:07Z] <urandom> bootstrapping restbase1027-c -- T219404
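The SAL entries above share a regular format, so the bootstrap timeline can be pulled out of them. A minimal Python sketch, assuming only bootstrap entries in the exact form shown here; the gap between consecutive entries is an upper bound on the earlier instance's bootstrap time only if the bootstraps ran strictly one after another:

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Matches bootstrap entries such as
# "[2019-05-20T06:11:08Z] <mobrovac> bootstrap restbase1025-b - T219404"
SAL_BOOTSTRAP = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\] "
    r"<(?P<user>[^>]+)> bootstrap(?:ping)? (?P<instance>restbase\d+-[a-z])\b"
)


def parse_entry(line: str) -> tuple[datetime, str, str] | None:
    """Return (naive UTC timestamp, user, instance) for a bootstrap SAL line, else None."""
    m = SAL_BOOTSTRAP.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S")
    return ts, m["user"], m["instance"]


def gaps(lines: list[str]) -> list[tuple[str, timedelta]]:
    """Time from each bootstrap entry to the next, keyed by the earlier instance."""
    entries = [e for e in map(parse_entry, lines) if e is not None]
    return [(prev[2], nxt[0] - prev[0]) for prev, nxt in zip(entries, entries[1:])]
```

For the restbase1025-b and restbase1025-c entries, the gap is 2 hours, 32 minutes, and 26 seconds.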

Change 509423 merged by Filippo Giunchedi:
[operations/puppet@production] conftool-data: add restbase10[19-27]

https://gerrit.wikimedia.org/r/509423

Change 510717 merged by Mobrovac:
[mediawiki/services/restbase/deploy@master] Targets: Add restbase10[19-27].eqiad.wmnet

https://gerrit.wikimedia.org/r/510717