rack/setup/install logstash101[012].eqiad.wmnet
Closed, Resolved, Public

Description

This task will track the racking, setup, and installation of 3 new logstash hosts for eqiad.

Logstash hosts already exist in eqiad, so DC-Ops will need to know whether these 3 new hosts are replacing any existing systems or adding to them. logstash100[456] are physical servers, and logstash100[789] are ganeti VMs in eqiad.

Racking proposal: @RobH and @herron synced via IRC. These 3 new hosts are replacing logstash100[456], so none of logstash101[012] should share a rack/row with one another, but they can share with the old hosts if needed. Otherwise, any 1G internal vlan racks/rows.
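
The bracket shorthand used throughout this task (logstash101[012], logstash100[456], and later logstash101[0-2]) can be expanded mechanically. The helper below is only an illustrative sketch: `expand_hosts` is a hypothetical name, not a tool referenced in this task, and it assumes a single bracket group of single characters or simple ranges.

```python
import re


def expand_hosts(pattern):
    """Expand one bracket group, e.g. 'logstash101[012]' or 'logstash101[0-2]'.

    Hypothetical helper for illustration only; handles a single [...] group
    containing single characters and simple 'a-b' ranges.
    """
    match = re.fullmatch(r"(.*)\[([^\]]+)\](.*)", pattern)
    if match is None:
        # No bracket group: the pattern is already a single hostname.
        return [pattern]
    prefix, body, suffix = match.groups()
    chars = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == "-":
            # Range such as 0-2: include both endpoints.
            chars.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            chars.append(body[i])
            i += 1
    return [f"{prefix}{c}{suffix}" for c in chars]


for host in expand_hosts("logstash101[012].eqiad.wmnet"):
    print(host)
```

Both spellings that appear in this task, `[012]` and `[0-2]`, expand to the same three hosts under this sketch.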

logstash1010:

  • - receive in system on procurement task T210498
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/490094
  • - operations/puppet update (install_server at minimum, other files if possible) - https://gerrit.wikimedia.org/r/490101
  • - OS installation - stretch
  • - puppet accept/initial run
  • - handoff for service implementation - @herron

logstash1011:

  • - receive in system on procurement task T210498
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/490094
  • - operations/puppet update (install_server at minimum, other files if possible) - https://gerrit.wikimedia.org/r/490101
  • - OS installation - stretch
  • - puppet accept/initial run
  • - handoff for service implementation - @herron

logstash1012:

  • - receive in system on procurement task T210498
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added - https://gerrit.wikimedia.org/r/490094
  • - operations/puppet update (install_server at minimum, other files if possible) - https://gerrit.wikimedia.org/r/490101
  • - OS installation - stretch
  • - puppet accept/initial run
  • - handoff for service implementation - @herron

Event Timeline

RobH triaged this task as Medium priority. Jan 24 2019, 5:47 PM
RobH created this task.
RobH updated the task description. Jan 24 2019, 5:51 PM
RobH updated the task description.
RobH updated the task description.
RobH assigned this task to Cmjohnson. Jan 24 2019, 5:54 PM
RobH edited projects, added ops-eqiad; removed procurement.
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
herron added a comment. Feb 6 2019, 9:00 PM

Hey @Cmjohnson, sending a friendly ping to see how these builds are going. If there's anything I can do to assist remotely just let me know.

hi @herron they are not going just yet. I will get to them next week.

Thanks @Cmjohnson ! Please treat this as priority this week since we're running short on disk space on existing logstash eqiad hosts.

Change 489742 had a related patch set uploaded (by Cmjohnson; owner: Cmjohnson):
[operations/dns@master] Adding mgmt dns logstash101[0-2]

https://gerrit.wikimedia.org/r/489742

Change 489742 merged by Cmjohnson:
[operations/dns@master] Adding mgmt dns logstash101[0-2]

https://gerrit.wikimedia.org/r/489742

i updated the bios versions on all 3 hosts

Cmjohnson reassigned this task from Cmjohnson to RobH. Feb 11 2019, 6:21 PM
Cmjohnson updated the task description.

assigning to @RobH to do the installations

Change 490094 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting logstash100[012] production dns entries

https://gerrit.wikimedia.org/r/490094

Change 490094 merged by RobH:
[operations/dns@master] setting logstash100[012] production dns entries

https://gerrit.wikimedia.org/r/490094

RobH updated the task description.
RobH removed a subscriber: Pswaby.

Change 490101 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] logstash101[012] puppet repo updates

https://gerrit.wikimedia.org/r/490101

Change 490101 merged by RobH:
[operations/puppet@production] logstash101[012] puppet repo updates

https://gerrit.wikimedia.org/r/490101

RobH updated the task description.
RobH added a comment. Feb 12 2019, 5:25 PM

Firmware is being updated on the bios and idrac before OS installation on all three hosts:

installed bios: 1.7.0
installed ilom: 3.21.21.21

newest bios: 1.7.0 (no change, no need to update)
newest ilom: 3.21.26.22 (updating across all three new hosts)
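
The update decision above comes down to comparing dotted version strings numerically. A minimal sketch of that comparison follows, assuming every component is a plain integer; `parse_version` and `needs_update` are illustrative names, not part of any vendor tooling.

```python
def parse_version(text):
    """Turn a dotted version such as '3.21.26.22' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_update(installed, newest):
    """True when the installed version is strictly older than the newest."""
    return parse_version(installed) < parse_version(newest)


# Versions as reported in the comment above: (installed, newest).
firmware = {
    "bios": ("1.7.0", "1.7.0"),
    "ilom": ("3.21.21.21", "3.21.26.22"),
}

for component, (installed, newest) in firmware.items():
    action = "update" if needs_update(installed, newest) else "no change"
    print(f"{component}: {installed} -> {newest}: {action}")
```

Comparing integer tuples avoids the string-ordering trap where "3.21.9" would sort after "3.21.26".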

Change 490107 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] fixing netboot entry for new logstash systems

https://gerrit.wikimedia.org/r/490107

Change 490107 merged by RobH:
[operations/puppet@production] fixing netboot entry for new logstash systems

https://gerrit.wikimedia.org/r/490107

RobH updated the task description.
RobH updated the task description. Feb 12 2019, 6:22 PM
RobH reassigned this task from RobH to herron. Feb 12 2019, 6:28 PM
RobH removed a project: ops-eqiad.

@herron,

Ok, these are calling into puppet with role spare. You can apply new roles and push into service.

Feel free to resolve this task once you are aware of it!

Change 490686 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: disable notifications on logstash101[0-2] during setup

https://gerrit.wikimedia.org/r/490686

Change 490686 merged by Herron:
[operations/puppet@production] logstash: disable notifications on logstash101[0-2] during setup

https://gerrit.wikimedia.org/r/490686

Change 490695 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: apply role::logstash to new logstash101[0-2] hardare hosts

https://gerrit.wikimedia.org/r/490695

Change 490695 merged by Herron:
[operations/puppet@production] logstash: apply role::logstash to new logstash101[0-2] hardware hosts

https://gerrit.wikimedia.org/r/490695

herron added a comment (edited). Feb 19 2019, 9:19 PM

logstash101[0-2] have been added to the logging eqiad elasticsearch cluster, and data is now being relocated from the old logstash100[4-6] hosts onto logstash101[0-2]. This will take some time to complete as there are several TB worth of shards to relocate.

https://grafana.wikimedia.org/d/000000561/logstash?orgId=1&var-datasource=eqiad%20prometheus%2Fops&var-input=gelf%2F12201&panelId=7&fullscreen gives a general idea of progress (although percentages will differ due to the larger storage arrays on the new hosts). logstash1005 is the first host having its data relocated.

RobH removed a subscriber: RobH. Feb 19 2019, 11:19 PM

Mentioned in SAL (#wikimedia-operations) [2019-02-20T16:49:24Z] <herron> migrating es shards away from logstash100[56] with "cluster.routing.allocation.exclude._name" : "logstash1005-production-logstash-eqiad,logstash1006-production-logstash-eqiad" T214608

Mentioned in SAL (#wikimedia-operations) [2019-02-21T15:07:46Z] <herron> migrating ES shards away from logstash100[456] with "cluster.routing.allocation.exclude._name" : "logstash1004-production-logstash-eqiad,logstash1005-production-logstash-eqiad,logstash1006-production-logstash-eqiad" T214608
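
For reference, the setting quoted in the SAL entries is normally applied through the Elasticsearch cluster settings API (`PUT /_cluster/settings`). The sketch below only builds the request body and does not contact a cluster. Whether the setting was applied as `transient` or `persistent` is not recorded in this task, so `transient` here is an assumption, and `exclusion_settings` is an illustrative name.

```python
import json


def exclusion_settings(node_names, persistent=False):
    """Build a PUT /_cluster/settings body that excludes nodes by name.

    Shards are moved off nodes whose names match the comma-separated list.
    The transient default is an assumption; the task does not say which
    scope was actually used.
    """
    scope = "persistent" if persistent else "transient"
    return {
        scope: {
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }


# Node names as they appear in the 2019-02-21 SAL entry.
old_nodes = [f"logstash100{n}-production-logstash-eqiad" for n in (4, 5, 6)]
body = exclusion_settings(old_nodes)
print(json.dumps(body, indent=2))
```

Setting the same key to `null` in a later request would clear the exclusion once the old hosts are decommissioned.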

Change 492695 had a related patch set uploaded (by Herron; owner: Herron):
[operations/puppet@production] logstash: remove elasticsearch role from logstash100[456]

https://gerrit.wikimedia.org/r/492695

Setup of the new hosts is complete. Tracking follow-up steps in T213898.

herron closed this task as Resolved. Feb 25 2019, 3:45 PM