(Need By: TBD) rack/setup/install logstash103[345]
Closed, Resolved (Public)

Description

This task will track the racking, setup, and OS installation of logstash103[345].

Please note these were racked with hostnames logstash-be103[345], but as the matching codfw hosts didn't get the -be hostname update, these need to be reverted to logstash103[345].

Hostname / Racking / Installation Details

Hostnames: logstash103[345]
Racking Proposal: Please distribute evenly across rows
Networking/Subnet/VLAN/IP: 10G, Private
Partitioning/Raid: Software RAID-0, with existing recipe.
OS Distro: Buster
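
The bracket shorthand used throughout this task (logstash103[345]) stands for three hosts. As a minimal sketch, assuming a single bracketed set of one-character choices and the eqiad.wmnet domain seen in the reimage comments, it can be expanded as follows. The function name expand_hosts is hypothetical and is not the host-selection syntax of any particular tool.

```python
import re


def expand_hosts(pattern: str, domain: str = "eqiad.wmnet") -> list[str]:
    """Expand one bracketed set of single characters, e.g. 'logstash103[345]'.

    Hypothetical helper for illustration only; it handles exactly one
    bracket group and treats each character inside it as one choice.
    """
    match = re.fullmatch(r"([^\[\]]*)\[([^\[\]]+)\]([^\[\]]*)", pattern)
    if match is None:
        # No bracket group: treat the pattern as a single literal hostname.
        return [f"{pattern}.{domain}"]
    prefix, choices, suffix = match.groups()
    return [f"{prefix}{choice}{suffix}.{domain}" for choice in choices]


hosts = expand_hosts("logstash103[345]")
print(hosts)
```

The resulting list has the same shape as the host lists printed in the wmf-auto-reimage comments further down.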

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

logstash1033:

  • - receive in system on procurement task T264641 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/661786
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

logstash1034:

  • - receive in system on procurement task T264641 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/661786
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

logstash1035:

  • - receive in system on procurement task T264641 & in coupa
  • - rack system with proposed racking plan (see above) & update netbox (include all system info plus location, state of planned)
  • - bios/drac/serial setup/testing
  • - mgmt dns entries added for both asset tag and hostname
  • - network port setup (description, enable, vlan)
    • end on-site specific steps
  • - production dns entries added
  • - operations/puppet update (install_server at minimum, other files if possible) https://gerrit.wikimedia.org/r/661786
  • - OS installation
  • - puppet accept/initial run (with role:spare)
  • - host state in netbox set to staged

Once the system(s) above have had all checkbox steps completed, this task can be resolved.

Event Timeline

RobH added a parent task: Unknown Object (Task). Nov 10 2020, 5:14 PM
RobH moved this task from Backlog to Racking Tasks on the ops-eqiad board.
RobH mentioned this in Unknown Object (Task).
RobH unsubscribed.

@Cmjohnson all hosts racked and cabled, netbox updated
host port
logstash-be1033 39
logstash-be1034 21
logstash-be1035 7

Cmjohnson subscribed.

Rob, these are ready for you with the temp password.

RobH added subscribers: herron, RobH.

@herron: before I image these, they have an odd hostname of logstash-be103[345], where the codfw logstash ordered in Q2 just have normal logstash2* hostnames.

Can you confirm what the hostnames should be:

  1. logstash-be103[345]
  2. logstash103[345]

please comment and assign back to me for followup.

RobH renamed this task from (Need By: TBD) rack/setup/install logstash-be103[345] to (Need By: TBD) rack/setup/install logstash103[345]. Feb 4 2021, 6:02 PM
RobH claimed this task.
RobH updated the task description.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['logstash1033.eqiad.wmnet', 'logstash1034.eqiad.wmnet', 'logstash1035.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202102041925_robh_28818.log.

Killed this since I ran it before I merged my puppet changes.

Change 661786 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] logstash103[345] puppet updates

https://gerrit.wikimedia.org/r/661786

Change 661786 merged by RobH:
[operations/puppet@production] logstash103[345] puppet updates

https://gerrit.wikimedia.org/r/661786

RobH updated the task description.
RobH removed subscribers: Cmjohnson, Jclark-ctr.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['logstash1033.eqiad.wmnet', 'logstash1034.eqiad.wmnet', 'logstash1035.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202102041945_robh_14987.log.

Change 661800 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] correcting logstash103[345] macs

https://gerrit.wikimedia.org/r/661800

Change 661800 merged by RobH:
[operations/puppet@production] correcting logstash103[345] macs

https://gerrit.wikimedia.org/r/661800

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

['logstash1033.eqiad.wmnet', 'logstash1034.eqiad.wmnet', 'logstash1035.eqiad.wmnet']

The log can be found in /var/log/wmf-auto-reimage/202102042032_robh_31273.log.

So these aren't getting DHCP leases or moving past PXE boot; I need to investigate why in further detail. The puppet repo has been updated with the 10G interface MAC addresses, and these should image fine once they PXE boot successfully.

next steps:

  • turn on serial redirection after post on logstash1033 and manually pxe boot to see what happens

Ok, in checking these hosts, all of them appear to have their network set up properly in netbox and on the switch, but they fail the media check. Since netbox even has the DAC cable labels, I suspect that all three cables just aren't properly seated on one or both ends of each patch.

@Jclark-ctr: When you are next onsite, can you check/reseat the DAC cables for these? If it is during our overlap hours, feel free to ping me to test things on my end! Specifically, ensure each DAC cable is plugged into the correct ports, correctly identified, and seated properly on both ends for all three of these hosts:
https://netbox.wikimedia.org/dcim/devices/3023/
https://netbox.wikimedia.org/dcim/devices/3024/
https://netbox.wikimedia.org/dcim/devices/3025/

Once we have a working network link, I'll be able to image these and hand them off to their service owners.

@RobH Checked all three hosts: no issue with the DAC cables; possibly the port is turned off for two of the hosts.
https://netbox.wikimedia.org/dcim/devices/3023/ moved DAC cable from port 39 to 41; link lit up.
https://netbox.wikimedia.org/dcim/devices/3024/ moved from port 21 to 23; has a connection now.
https://netbox.wikimedia.org/dcim/devices/3025/ no issues that I can see with the DAC connection.
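
For reference, the port changes reported above can be reconciled against the ports listed in the original racking comment. This is only a sketch: it assumes netbox devices 3023, 3024, and 3025 correspond to logstash1033, logstash1034, and logstash1035 respectively (inferred from the matching port numbers 39 and 21, and by elimination for the third), and the helper name apply_moves is hypothetical.

```python
# Switch ports from the racking comment (hosts were then named logstash-be103[345]).
initial_ports = {"logstash1033": 39, "logstash1034": 21, "logstash1035": 7}

# Moves reported after the on-site check, as (old_port, new_port).
# Assumption: device 3023 is logstash1033 and device 3024 is logstash1034.
moves = {"logstash1033": (39, 41), "logstash1034": (21, 23)}


def apply_moves(ports: dict, moves: dict) -> dict:
    """Return a new host -> port mapping with the reported moves applied."""
    updated = dict(ports)
    for host, (old_port, new_port) in moves.items():
        if updated.get(host) != old_port:
            # Guard against applying a move to a host recorded on a different port.
            raise ValueError(f"{host} is not recorded on port {old_port}")
        updated[host] = new_port
    return updated


current_ports = apply_moves(initial_ports, moves)
for host, port in sorted(current_ports.items()):
    print(host, port)
```

The check on the old port is there so that a stale record is flagged rather than silently overwritten.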

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

logstash1035.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102171620_robh_27979_logstash1035_eqiad_wmnet.log.

Change 664855 had a related patch set uploaded (by RobH; owner: RobH):
[operations/puppet@production] fixing logstash103[345] partman

https://gerrit.wikimedia.org/r/664855

Change 664855 merged by RobH:
[operations/puppet@production] fixing logstash103[345] partman

https://gerrit.wikimedia.org/r/664855

Completed auto-reimage of hosts:

['logstash1035.eqiad.wmnet']

Of which those FAILED:

['logstash1035.eqiad.wmnet']

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

logstash1035.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102171711_robh_12426_logstash1035_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['logstash1035.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

logstash1033.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102171917_robh_25549_logstash1033_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['logstash1033.eqiad.wmnet']

and were ALL successful.

Script wmf-auto-reimage was launched by robh on cumin1001.eqiad.wmnet for hosts:

logstash1034.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/202102172014_robh_13882_logstash1034_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['logstash1034.eqiad.wmnet']

and were ALL successful.

Ok, these are all set up, imaged, staged, and ready for subteam takeover.