Disks for lithium have arrived in {T139612}, and we'll need to get the current disks replaced once the syslog server in codfw is fully set up in T138073: setup syslog server in codfw.
Status | Assigned | Task
---|---|---
Resolved | Cmjohnson | T143307 Add new disks to syslog server in eqiad (lithium)
Event Timeline
Stalled until T138073: setup syslog server in codfw is resolved and we have redundancy. cc @Cmjohnson
@Cmjohnson we can go ahead with swapping the disks and reimaging now. wezen.codfw.wmnet has a month's worth of logs for redundancy.
Change 314280 had a related patch set uploaded (by Filippo Giunchedi):
install_server: reinstall lithium with jessie and gpt
Change 314280 merged by Filippo Giunchedi:
install_server: reinstall lithium with jessie and gpt
Mentioned in SAL (#wikimedia-operations) [2016-10-05T15:35:16Z] <godog> reimage lithium with bigger disks T143307
I see lithium is still stuck at
Scanning for devices. Please wait, this may take several minutes...
so it likely needs a reseat or something like that, @Cmjohnson?
@fgiunchedi The disks are fine: the BIOS sees them correctly. During this morning's attempt to install Jessie, I could see the DHCP offer/request exchange and an image was served, but the transfer eventually timed out. On subsequent attempts, lithium asks carbon for a DHCP offer and gets one, but nothing happens after that.
Log from when I did get an image:
Oct 6 10:32:06 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 10:32:06 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 10:32:10 carbon dhcpd: DHCPREQUEST for 10.64.32.154 (208.80.154.10) from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 10:32:10 carbon dhcpd: DHCPACK on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 10:32:10 carbon dhcpd: DHCPREQUEST for 10.64.32.154 (208.80.154.10) from c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 6 10:32:10 carbon dhcpd: DHCPACK on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/debian-installer/amd64/pxelinux.0 to 10.64.32.154:2070
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/debian-installer/amd64/pxelinux.0 to 10.64.32.154:2071
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/ldlinux.c32 to 10.64.32.154:49152
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/pxelinux.cfg/ttyS1-115200 to 10.64.32.154:49153
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/pxelinux.cfg/boot.txt to 10.64.32.154:49154
Oct 6 10:32:21 carbon atftpd[7951]: Serving jessie-installer/debian-installer/amd64/linux to 10.64.32.154:49155
Oct 6 10:32:26 carbon atftpd[7951]: timeout: retrying...
Oct 6 10:33:48 atftpd[7951]: last message repeated 4 times
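To make the log above easier to read, here is a minimal sketch (not tooling used on this task) that finds the last file atftpd started serving and counts the retry timeouts logged after it, expanding syslog's "last message repeated N times" line. It assumes the atftpd line format exactly as it appears in this excerpt. It also assumes the retries belong to the most recent transfer; atftpd's log does not say so explicitly, so treating the kernel (`debian-installer/amd64/linux`) as the stalled file is an inference.

```python
import re
from typing import Optional, Tuple

# atftpd lines copied verbatim from the "did get an image" excerpt above.
TFTP_LOG = """\
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/debian-installer/amd64/pxelinux.0 to 10.64.32.154:2070
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/debian-installer/amd64/pxelinux.0 to 10.64.32.154:2071
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/ldlinux.c32 to 10.64.32.154:49152
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/pxelinux.cfg/ttyS1-115200 to 10.64.32.154:49153
Oct 6 10:32:11 carbon atftpd[7951]: Serving jessie-installer/pxelinux.cfg/boot.txt to 10.64.32.154:49154
Oct 6 10:32:21 carbon atftpd[7951]: Serving jessie-installer/debian-installer/amd64/linux to 10.64.32.154:49155
Oct 6 10:32:26 carbon atftpd[7951]: timeout: retrying...
Oct 6 10:33:48 atftpd[7951]: last message repeated 4 times
"""

SERVING = re.compile(r"Serving (\S+) to (\S+)")
REPEATED = re.compile(r"last message repeated (\d+) times")


def stalled_transfer(log: str) -> Tuple[Optional[str], int]:
    """Return the last file atftpd started serving and the number of retry
    timeouts logged after it (assumption: the retries belong to that file)."""
    last_file: Optional[str] = None
    timeouts = 0
    prev_was_timeout = False
    for line in log.splitlines():
        served = SERVING.search(line)
        if served:
            # A new transfer started: reset the timeout count.
            last_file, timeouts, prev_was_timeout = served.group(1), 0, False
        elif "timeout: retrying" in line:
            timeouts += 1
            prev_was_timeout = True
        else:
            repeated = REPEATED.search(line)
            # syslog collapses identical consecutive messages into one
            # "last message repeated N times" line; expand it if it follows a timeout.
            if repeated and prev_was_timeout:
                timeouts += int(repeated.group(1))
            else:
                prev_was_timeout = False
    return last_file, timeouts


print(stalled_transfer(TFTP_LOG))
# ('jessie-installer/debian-installer/amd64/linux', 5)
```

On this excerpt, the sketch reports five retries after the kernel download began, which matches the report that an image was served but eventually timed out.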
Log from when I didn't get one:
Oct 6 10:39:54 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 10:39:54 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
Indeed that's odd, @Cmjohnson. I can see the DHCP discover/offer exchange relayed via _both_ cr1 and cr2 in eqiad at roughly the same time:
Oct 6 11:33:10 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 6 11:33:10 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 6 11:33:10 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 11:33:10 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
I tried putting in the 500GB disks, but I'm still running into issues with the installer. I checked the VLAN, the switch port, and the DHCP file.
Oct 12 16:37:52 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 12 16:37:52 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 12 16:37:52 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 12 16:37:52 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 12 16:43:10 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 12 16:43:10 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 12 16:43:10 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 12 16:43:10 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
I had a look at this, and it is not network-related: carbon answers as it should, and the routers relay the DHCP packets as they should. AFAICT it's the motherboard that fails to acknowledge receipt of the DHCP packets, or of some of the TFTP packets.
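The diagnosis above rests on the DHCP exchange (DISCOVER, OFFER, REQUEST, ACK) stopping after the OFFER. Here is a minimal sketch, not tooling used on this task, that groups carbon's dhcpd lines by client MAC and message type, records which relay each arrived via, and lists the steps that were never logged. It assumes the dhcpd line format exactly as quoted in this task. A missing REQUEST only shows that carbon never *saw* the client answer the offer; it cannot tell whether the client ignored the offer or its reply was lost.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# dhcpd excerpts copied from this task: an attempt that got an image (Oct 6)
# and one that did not (Oct 12).
GOOD = """\
Oct 6 10:32:06 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 10:32:06 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 10:32:10 carbon dhcpd: DHCPREQUEST for 10.64.32.154 (208.80.154.10) from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 6 10:32:10 carbon dhcpd: DHCPACK on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
"""
BAD = """\
Oct 12 16:37:52 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 12 16:37:52 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.2
Oct 12 16:37:52 carbon dhcpd: DHCPDISCOVER from c8:1f:66:bf:7f:ea via 10.64.32.3
Oct 12 16:37:52 carbon dhcpd: DHCPOFFER on 10.64.32.154 to c8:1f:66:bf:7f:ea via 10.64.32.3
"""

DORA = ("DISCOVER", "OFFER", "REQUEST", "ACK")
# The client MAC follows "from" (DISCOVER/REQUEST) or "to" (OFFER/ACK).
DHCP_LINE = re.compile(
    r"dhcpd: DHCP(DISCOVER|OFFER|REQUEST|ACK)\b.*? (?:from|to) ([0-9a-f:]{17}) via (\S+)"
)


def relays_by_step(log: str) -> Dict[str, Dict[str, Set[str]]]:
    """Map client MAC -> DHCP message type -> set of relay addresses."""
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for line in log.splitlines():
        m = DHCP_LINE.search(line)
        if m:
            step, mac, relay = m.groups()
            seen[mac][step].add(relay)
    return seen


def missing_steps(steps: Dict[str, Set[str]]) -> List[str]:
    """Exchange steps that carbon never logged for this client, in order."""
    return [s for s in DORA if s not in steps]


mac = "c8:1f:66:bf:7f:ea"
print(missing_steps(relays_by_step(GOOD)[mac]))  # []
print(missing_steps(relays_by_step(BAD)[mac]))  # ['REQUEST', 'ACK']
print(sorted(relays_by_step(BAD)[mac]["DISCOVER"]))  # ['10.64.32.2', '10.64.32.3']
```

Keying on the relay address makes the duplicate relaying via both routers visible, while the missing REQUEST shows that the exchange stalls on the client side.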
Change 317550 had a related patch set uploaded (by Filippo Giunchedi):
base: send syslog only to codfw to reimage lithium
Change 317550 merged by Filippo Giunchedi:
base: send syslog only to codfw to reimage lithium
Change 317566 had a related patch set uploaded (by Filippo Giunchedi):
Revert "base: send syslog only to codfw to reimage lithium"
Change 317566 merged by Filippo Giunchedi:
Revert "base: send syslog only to codfw to reimage lithium"
As pointed out by @faidon, PXE was failing because lithium was being hammered with UDP syslog packets from the fleet. After removing lithium as a syslog target, the install went fine.
Lithium is back in service with 4TB disks, resolving.