
rack/setup/install labtestpuppetmaster2001
Closed, Resolved, Public

Description

This task will track the racking and setup/installation of the new labtestpuppetmaster2001 system in codfw.

hostname label: labtestpuppetmaster2001 likely won't fit on the name label. Try labtestpm2001 and have the visible label in racktables note the difference.

Racking Plan: This host will be placed in the public vlan, so it can be racked in any 1GbE rack in any row in codfw.

  • receive in system on procurement task T164517
  • rack system with proposed racking plan (see above) & update racktables (include all system info plus location)
  • bios/drac/serial setup/testing
  • mgmt dns entries added for both asset tag and hostname
  • production dns entries added https://gerrit.wikimedia.org/r/#/c/358485/
  • network port setup (description, enable, vlan) (create sub task if not done immediately)
  • operations/puppet update (install_server at minimum, other files if possible) - raid1, lvm, srv, ext4.
  • OS installation
  • puppet/salt accept/initial run
  • handoff for service implementation

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 357841 had a related patch set uploaded (by Papaul; owner: Papaul):
[operations/dns@master] DNS: Add mgmt and production DNS for labtestpuppetmaster2001

https://gerrit.wikimedia.org/r/357841

Change 357841 merged by RobH:
[operations/dns@master] DNS: Add mgmt and production DNS for labtestpuppetmaster2001

https://gerrit.wikimedia.org/r/357841

Change 358485 had a related patch set uploaded (by RobH; owner: RobH):
[operations/dns@master] setting labtestpuppetmaster2001 production dns

https://gerrit.wikimedia.org/r/358485

Change 358485 merged by RobH:
[operations/dns@master] setting labtestpuppetmaster2001 production dns

https://gerrit.wikimedia.org/r/358485

I troubleshot this with Daniel by replacing the eth0 MAC address with the eth1 MAC address in the DHCP file, but the problem was the same: the server cannot boot from eth1 either. I will run a hardware diagnostic on the system.
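Swapping NICs in the DHCP file amounts to changing the MAC address in the host's entry. As a rough sketch, assuming the install server uses ISC dhcpd host declarations of the usual `hardware ethernet` / `fixed-address` form (the task does not show the actual file, and the helper names below are hypothetical), the entry could be generated like this. Normalizing the MAC helps because the PXE ROM prints it as `30 E1 71 63 5E 5C` while dhcpd logs it as `30:e1:71:63:5e:5c`. Note that when `fixed-address` is a DNS name, dhcpd has to resolve that name to an address, so the entry depends on DNS being correct.

```python
import re

# Hypothetical helpers: they only illustrate the general shape of an
# ISC dhcpd host declaration; the real WMF file layout is not shown here.
_MAC_RE = re.compile(r"[0-9a-f]{2}(?:[ :-]?[0-9a-f]{2}){5}", re.IGNORECASE)


def normalize_mac(raw: str) -> str:
    """Turn '30 E1 71 63 5E 5C' or '30-e1-...' into '30:e1:71:63:5e:5c'."""
    text = raw.strip()
    if not _MAC_RE.fullmatch(text):
        raise ValueError(f"not a 48-bit MAC address: {raw!r}")
    digits = re.sub(r"[ :-]", "", text).lower()
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def host_stanza(hostname: str, fqdn: str, mac: str) -> str:
    """Render a dhcpd host block pinning one NIC's MAC to a DNS name."""
    return (
        f"host {hostname} {{\n"
        f"    hardware ethernet {normalize_mac(mac)};\n"
        f"    fixed-address {fqdn};\n"  # dhcpd resolves this name via DNS
        f"}}\n"
    )


# MAC as printed by the Broadcom PXE ROM on NIC 1
print(host_stanza("labtestpuppetmaster2001",
                  "labtestpuppetmaster2001.wikimedia.org",
                  "30 E1 71 63 5E 5C"))
```

Trying eth1 instead would mean passing that NIC's MAC to the same helper; the rest of the entry stays unchanged.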

While troubleshooting the PXE boot issue on this system, I set up a test DHCP/DNS/TFTP server on my laptop and booted the server against it. It worked (the server got an address from it), so NIC 1 is fine. See below for the screen output. I then reconnected the server to the WMF network and tried to PXE boot again, and I am getting this in the DHCP log:

DHCPDISCOVER from 30:e1:71:63:5e:5c via 208.80.153.98: network 208.80.153.96/27: no free leases
Jun 20 17:21:43 install2002 dhcpd[11106]: DHCPDISCOVER from 30:e1:71:63:5e:5c via 208.80.153.99: network 208.80.153.96/27: no free leases
This tells me that something is wrong with DNS.
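With ISC dhcpd, "no free leases" typically means no host declaration matched the client and the subnet has no dynamic pool to fall back on. A small parser makes the log lines easier to reason about: the request does reach install2002 through the relay, and 208.80.153.108 sits inside 208.80.153.96/27, so the subnet itself looks right and the problem is more likely in how the host entry is matched. This is a sketch; the regex covers only the DHCPDISCOVER format quoted above.

```python
import ipaddress
import re

# Matches the DHCPDISCOVER lines quoted above, with or without the syslog prefix.
DISCOVER_RE = re.compile(
    r"DHCPDISCOVER from (?P<mac>[0-9a-f:]{17}) via (?P<relay>\S+): "
    r"network (?P<network>\S+): (?P<reason>.+)$"
)


def parse_discover(line):
    """Return the MAC, relay, network and reason from one log line, or None."""
    m = DISCOVER_RE.search(line)
    if m is None:
        return None
    return {
        "mac": m["mac"],
        "relay": ipaddress.ip_address(m["relay"]),
        "network": ipaddress.ip_network(m["network"]),
        "reason": m["reason"].strip(),
    }


line = ("Jun 20 17:21:43 install2002 dhcpd[11106]: DHCPDISCOVER from "
        "30:e1:71:63:5e:5c via 208.80.153.99: network 208.80.153.96/27: "
        "no free leases")
info = parse_discover(line)
host = ipaddress.ip_address("208.80.153.108")  # the host's production address
print(info["reason"], host in info["network"], info["relay"] in info["network"])
# -> no free leases True True
```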

Screen output:
Broadcom UNDI PXE-2.1 v17.2.1
Copyright (C) 2000-2015 Broadcom Corporation
Copyright (C) 1997-2000 Intel Corporation
All rights reserved.

CLIENT MAC ADDR: 30 E1 71 63 5E 5C GUID: 32353537-3835-584D-5137-323230333232

CLIENT IP: 10.0.0.3  MASK: 255.255.255.0  DHCP IP: 10.0.0.1

GATEWAY IP: 10.0.0.1

TFTP.

PXE-T01: File not found

PXE-E3B: TFTP Error - File Not found

PXE-M0F: Exiting Broadcom PXE ROM.

Change 360388 had a related patch set uploaded (by Dzahn; owner: Dzahn):
[operations/dns@master] fix DNS for labtestpuppetmaster, 1002 != 2001

https://gerrit.wikimedia.org/r/360388

Change 360388 merged by Dzahn:
[operations/dns@master] fix DNS for labtestpuppetmaster, 1002 != 2001

https://gerrit.wikimedia.org/r/360388

Jun 20 17:21:43 install2002 dhcpd[11106]: DHCPDISCOVER from 30:e1:71:63:5e:5c via 208.80.153.99: network 208.80.153.96/27: no free leases
This tells me that something is wrong with DNS

Yes, there was something wrong with DNS; see the fix above. It should work now (it may take up to an hour because of the TTL).

Daniel found that for 208.80.153.108:
reverse lookup = 2001
forward lookup = 1002

He fixed it and will try the install again.
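A mismatch like the one Daniel found can be caught by checking that the forward and reverse records agree. The sketch below uses the standard `socket` resolver calls; the function name is hypothetical, and the resolvers are injectable so the check can be exercised without network access.

```python
import socket


def check_fcrdns(hostname, ip,
                 forward=socket.gethostbyname_ex,
                 reverse=socket.gethostbyaddr):
    """Return a list of problems; an empty list means forward and reverse agree."""
    problems = []
    wanted = hostname.rstrip(".").lower()

    # Forward: the name should resolve and include the expected address.
    try:
        _, _, addrs = forward(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        addrs = []
        problems.append(f"forward lookup of {hostname} failed: {exc}")
    if addrs and ip not in addrs:
        problems.append(f"{hostname} resolves to {addrs}, not {ip}")

    # Reverse: the PTR for the address should point back at the same name.
    try:
        ptr_name, aliases, _ = reverse(ip)
    except OSError as exc:  # socket.herror is a subclass of OSError
        problems.append(f"reverse lookup of {ip} failed: {exc}")
    else:
        names = {n.rstrip(".").lower() for n in [ptr_name, *aliases]}
        if wanted not in names:
            problems.append(f"{ip} points back to {ptr_name}, not {hostname}")
    return problems


# Live usage (needs DNS access):
# print(check_fcrdns("labtestpuppetmaster2001.wikimedia.org", "208.80.153.108"))
```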

Mentioned in SAL (#wikimedia-operations) [2017-06-20T21:13:21Z] <mutante> labtestpuppetmaster2001 - install-console, activate puppet, sign cert, initial puppet run, add salt key (T167157)

@Andrew this is complete; you can take over from here.

Thanks.

Change 364798 had a related patch set uploaded (by Dzahn; owner: Andrew Bogott):
[operations/dns@master] Add AAAA for labtestpuppetmaster2001

https://gerrit.wikimedia.org/r/364798

Change 364798 merged by Dzahn:
[operations/dns@master] Add AAAA for labtestpuppetmaster2001

https://gerrit.wikimedia.org/r/364798

Mentioned in SAL (#wikimedia-operations) [2017-07-13T02:26:23Z] <mutante> labtestpuppetmaster2001 - flapping icinga alerts about salt-minion starting and stopping constantly - there is an accepted salt-key but it was rejected by the master, server was reinstalled but still old key - deleted old key, accepted new key (T167157)

root@labtestpuppetmaster2001:~# ip a s | grep inet6
inet6 2620:0:860:4:208:80:153:108/64 scope global mngtmpaddr dynamic


host labtestpuppetmaster2001.wikimedia.org
labtestpuppetmaster2001.wikimedia.org has address 208.80.153.108
labtestpuppetmaster2001.wikimedia.org has IPv6 address 2620:0:860:4:208:80:153:108

host 2620:0:860:4:208:80:153:108
8.0.1.0.3.5.1.0.0.8.0.0.8.0.2.0.4.0.0.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa domain name pointer labtestpuppetmaster2001.wikimedia.org.
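The IPv6 address above appears to reuse the IPv4 octets as the interface identifier: 208.80.153.108 becomes `...:208:80:153:108` inside the 2620:0:860:4::/64 prefix. Assuming that convention (the function name is hypothetical), the sketch below derives the address and uses Python's `ipaddress.reverse_pointer` to produce the same nibble-format PTR name that `host` returned, plus the IPv4 in-addr.arpa name.

```python
import ipaddress


def embed_ipv4(prefix, ipv4):
    """Place the decimal IPv4 octets, read as hex groups, in the low 64 bits."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("expected a /64 prefix")
    iid = 0
    for octet in str(ipaddress.IPv4Address(ipv4)).split("."):
        # The decimal text is reused verbatim as a hex group: "208" -> 0x208
        iid = (iid << 16) | int(octet, 16)
    return net.network_address + iid


addr = embed_ipv4("2620:0:860:4::/64", "208.80.153.108")
print(addr)                  # -> 2620:0:860:4:208:80:153:108
print(addr.reverse_pointer)  # -> 8.0.1.0.3.5.1.0.0.8.0.0.8.0.2.0.4.0.0.0.0.6.8.0.0.0.0.0.0.2.6.2.ip6.arpa
print(ipaddress.IPv4Address("208.80.153.108").reverse_pointer)  # -> 108.153.80.208.in-addr.arpa
```

Generating both records from a single source like this is one way to keep the A/AAAA and PTR entries from drifting apart.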

This is up and working.