wikikube-ctrl2002: Switch network cable from port 2 to port 1 on the 10G NIC
Closed, Resolved · Public


In T379629 (wikikube-ctrl1001.eqiad.wmnet fails PXE boot), we figured out that PXE boot does not work reliably when we use port 2 on the BCM57412.

We should switch from port 2 to port 1 because of this and to adhere to our general default.

After switching the port, a cold reboot might be required so that the card detects link. The BIOS settings also need changing:

  • Ensure PXE is disabled on all embedded NIC ports
  • Ensure PXE is enabled on all 10G NIC ports
  • Ensure Port 1 is first in the boot order list (after Hard drive C:)
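
The three checks above could be expressed as a small validation routine. This is only a sketch: the `NicPort` structure, the port labels, and the boot order entry names (`hard-drive-c`, `10g-port1`) are hypothetical and do not correspond to any real BIOS or iDRAC export format.

```python
from dataclasses import dataclass


@dataclass
class NicPort:
    name: str          # hypothetical label, e.g. "embedded-1" or "10g-port1"
    is_10g: bool       # True for ports on the 10G NIC, False for embedded ports
    pxe_enabled: bool  # current PXE setting for this port


def check_bios_settings(ports, boot_order):
    """Return a list of problems; an empty list means all three checks pass."""
    problems = []
    for port in ports:
        if port.is_10g and not port.pxe_enabled:
            problems.append(f"PXE should be enabled on 10G port {port.name}")
        if not port.is_10g and port.pxe_enabled:
            problems.append(f"PXE should be disabled on embedded port {port.name}")
    # Port 1 of the 10G NIC should come directly after the hard drive entry.
    expected = ["hard-drive-c", "10g-port1"]  # assumed entry names
    if boot_order[:2] != expected:
        problems.append(
            f"boot order should start with {expected}, got {boot_order[:2]}"
        )
    return problems


ports = [
    NicPort("embedded-1", is_10g=False, pxe_enabled=False),
    NicPort("embedded-2", is_10g=False, pxe_enabled=False),
    NicPort("10g-port1", is_10g=True, pxe_enabled=True),
    NicPort("10g-port2", is_10g=True, pxe_enabled=True),
]
print(check_bios_settings(ports, ["hard-drive-c", "10g-port1", "10g-port2"]))  # prints []
```

A non-empty result lists which of the settings still need changing before the host is pooled again.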

As the server is in service, we need to coordinate a short maintenance window.

Event Timeline

Restricted Application added a subscriber: Aklapper.

I could do this today, or we can wait until next week, assuming no one wants to do maintenance on a Friday.

depool host wikikube-ctrl2002.codfw.wmnet by jayme@cumin2002 with reason: None

Cookbook cookbooks.sre.k8s.pool-depool-node started by jayme@cumin2002 depool for host wikikube-ctrl2002.codfw.wmnet completed:

  • wikikube-ctrl2002.codfw.wmnet (PASS)
    • Host wikikube-ctrl2002.codfw.wmnet depooled from wikikube-codfw

Mentioned in SAL (#wikimedia-operations) [2024-11-14T15:29:13Z] <jayme@cumin2002> START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719

Mentioned in SAL (#wikimedia-operations) [2024-11-14T15:29:29Z] <jayme@cumin2002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-ctrl2002.codfw.wmnet with reason: T379719

pool host wikikube-ctrl2002.codfw.wmnet by jayme@cumin2002 with reason: None

Cookbook cookbooks.sre.k8s.pool-depool-node started by jayme@cumin2002 pool for host wikikube-ctrl2002.codfw.wmnet completed:

  • wikikube-ctrl2002.codfw.wmnet (PASS)
    • Host wikikube-ctrl2002.codfw.wmnet pooled in wikikube-codfw
JMeybohm closed this task as Resolved. (Edited Nov 14 2024, 3:47 PM)
JMeybohm claimed this task.

@Jhancock.wm swapped the cable into port 1, I've changed the BIOS settings, and the server is back up. Thanks!