
relocate/reimage cloudvirt1023 with 10G interfaces
Closed, Resolved (Public)

Description

This needs to stay in the same row, but needs moving to a rack with a 10g switch.

cloudvirt1023 migration to 10G:

  • put system offline in all checks for the maintenance window
  • relocate to 10G rack and update netbox
  • enable PXE for 10G interfaces
  • update switch configuration for new primary 10G and secondary 10G ports (remove old switch port information)
  • update RAID config: two spare drives, RAID 10 for the rest
  • PXE boot and reimage system
  • reintroduce system into service cluster
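
As a rough illustration of the RAID step in the checklist (two spare drives, RAID 10 for the rest), the sketch below works out how many drives end up in the array and the resulting usable capacity. The drive count and drive size in the usage line are hypothetical; the task does not state them. The function name `raid10_layout` is also just illustrative, and the sketch assumes the conventional RAID 10 layout of an even number of drives, at least four.

```python
def raid10_layout(total_drives: int, drive_size_tb: float, spares: int = 2) -> dict:
    """Split drives into hot spares and a RAID 10 array and report usable capacity."""
    array_drives = total_drives - spares
    # Conventional RAID 10 stripes across mirrored pairs, so it needs an
    # even number of drives and at least two pairs.
    if array_drives < 4 or array_drives % 2 != 0:
        raise ValueError("RAID 10 needs an even number of drives, at least 4")
    mirror_pairs = array_drives // 2
    return {
        "spares": spares,
        "array_drives": array_drives,
        "mirror_pairs": mirror_pairs,
        # Each mirrored pair contributes the capacity of a single drive.
        "usable_tb": mirror_pairs * drive_size_tb,
    }


# Hypothetical chassis: 10 drives of 1.8 TB each (not taken from the task).
layout = raid10_layout(total_drives=10, drive_size_tb=1.8)
print(layout)
```

With those assumed numbers, eight drives go into the array as four mirrored pairs, and two stay as spares.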

Event Timeline

Cmjohnson moved this task from Backlog to Cloud Tasks on the ops-eqiad board. Aug 8 2019, 3:16 PM

Change 530094 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Move cloudvirt1021, 1022 and 1023 to Stretch

https://gerrit.wikimedia.org/r/530094

Change 530094 merged by Phamhi:
[operations/puppet@production] Move cloudvirt1021, 1022 and 1023 to Stretch

https://gerrit.wikimedia.org/r/530094

Script wmf-auto-reimage was launched by phamhi on cumin1001.eqiad.wmnet for hosts:

cloudvirt1023.mgmt.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201908141630_phamhi_10976_cloudvirt1023_mgmt_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['cloudvirt1023.mgmt.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1023.mgmt.eqiad.wmnet']
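
For anyone scanning many of these bot reports, a minimal sketch for pulling the failed hosts out of one is shown here. It assumes the report text looks exactly like the message above, with the host lists written as Python list literals; `failed_hosts` is a hypothetical helper, not part of wmf-auto-reimage.

```python
import ast
import re


def failed_hosts(report: str) -> list[str]:
    """Return the hosts listed after 'Of which those FAILED:' in a report."""
    match = re.search(r"Of which those FAILED:\s*(\[.*?\])", report, re.DOTALL)
    if match is None:
        # No FAILED section found: treat as no failures reported.
        return []
    # The list is printed as a Python literal, so parse it safely.
    return ast.literal_eval(match.group(1))


report = """Completed auto-reimage of hosts:

['cloudvirt1023.mgmt.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1023.mgmt.eqiad.wmnet']
"""
print(failed_hosts(report))
```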

Script wmf-auto-reimage was launched by phamhi on cumin1001.eqiad.wmnet for hosts:

cloudvirt1023.mgmt.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201908141707_phamhi_95392_cloudvirt1023_mgmt_eqiad_wmnet.log.
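
The log paths in these messages appear to follow a fixed pattern: a 12-digit timestamp, the user, a numeric ID, and the host name with dots replaced by underscores. That pattern is inferred from the paths in this task, not from any documentation, and the meaning of the numeric field is an assumption. Under those assumptions, a small sketch for splitting a path into its parts (`parse_log_path` is a hypothetical helper):

```python
import re
from datetime import datetime
from pathlib import PurePosixPath

# Pattern inferred from the log paths quoted in this task (an assumption).
LOG_NAME = re.compile(
    r"^(?P<stamp>\d{12})_(?P<user>[^_]+)_(?P<run_id>\d+)_(?P<host>.+)\.log$"
)


def parse_log_path(path: str) -> dict:
    """Split a wmf-auto-reimage log path into timestamp, user, ID and host."""
    name = PurePosixPath(path).name
    match = LOG_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognised log name: {name}")
    return {
        "started": datetime.strptime(match["stamp"], "%Y%m%d%H%M"),
        "user": match["user"],
        "run_id": int(match["run_id"]),
        # Assumes underscores in the host part stand for dots.
        "host": match["host"].replace("_", "."),
    }


info = parse_log_path(
    "/var/log/wmf-auto-reimage/"
    "201908141707_phamhi_95392_cloudvirt1023_mgmt_eqiad_wmnet.log"
)
print(info)
```

A user name containing an underscore would not match this pattern, so treat the sketch as a starting point only.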

Phamhi added a subscriber: Phamhi. Aug 14 2019, 5:16 PM

I managed to bypass that issue by running

sudo wmf-auto-reimage-host --no-verify -p T229871 cloudvirt1023.mgmt.eqiad.wmnet

but it looks like manual intervention is required at the step below

Completed auto-reimage of hosts:

['cloudvirt1023.mgmt.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1023.mgmt.eqiad.wmnet']

I managed to bypass that issue by running

sudo wmf-auto-reimage-host --no-verify -p T229871 cloudvirt1023.mgmt.eqiad.wmnet

but it looks like manual intervention is required at the step below

The partman recipe we are using is misconfigured in a way that requires us to press that key. Sadly, this happens with some recipes. It's a bug that should be fixed, but debugging and tweaking partman recipes is difficult enough that nobody has managed to address it properly.

Script wmf-auto-reimage was launched by phamhi on cumin1001.eqiad.wmnet for hosts:

cloudvirt1023.mgmt.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201908151508_phamhi_184898_cloudvirt1023_mgmt_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['cloudvirt1023.mgmt.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1023.mgmt.eqiad.wmnet']

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

cloudvirt1023.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201908160928_aborrero_141854_cloudvirt1023_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['cloudvirt1023.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1023.eqiad.wmnet']

Script wmf-auto-reimage was launched by aborrero on cumin1001.eqiad.wmnet for hosts:

cloudvirt1023.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201908160945_aborrero_145339_cloudvirt1023_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['cloudvirt1023.eqiad.wmnet']

Of which those FAILED:

['cloudvirt1023.eqiad.wmnet']

I did a manual install-console on this host and it's doing its initial puppet run now.

@Cmjohnson and @wiki_willy, I just want to clarify what's happening on this ticket.

The primary task (re-racking and moving to 10G networking) still needs doing. We're having a bit of a capacity crisis, so we've reimaged these servers in the meantime (just to have them available for an emergency), but until further notice they are still ready for you to move as soon as you're able.

I'll be back to work on 8/29 and would love to have these on 10G sometime close to then so that I can get them back into our main pool.

@Andrew This server will require a physical move to B2, B4 or B7. I will do this one last; I'm working on cabling 1021/1022 and updating the RAID config so you can re-image.

B0:26:28:29:6A:E0

Cmjohnson reassigned this task from Cmjohnson to Andrew. Thu, Sep 5, 7:15 PM
Cmjohnson updated the task description.

@Andrew the new MAC is in an earlier update. The server has been moved and connected to the new port, and the RAID config is complete. It needs the DHCP file updated, and then it is ready for you to re-image.
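
To make the DHCP step concrete, here is a minimal sketch that renders a host entry for the new MAC. It assumes the DHCP file uses ISC dhcpd `host` declarations; the task does not show the file, so the exact layout is an assumption, and `dhcp_host_entry` is a hypothetical helper. The MAC and host name are the ones that appear in this task.

```python
import re

# Six colon-separated hex octets, e.g. B0:26:28:29:6A:E0.
MAC_RE = re.compile(r"^(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$")


def dhcp_host_entry(short_name: str, mac: str, fqdn: str) -> str:
    """Render an ISC dhcpd host declaration (assumed file format)."""
    if not MAC_RE.match(mac):
        raise ValueError(f"not a valid MAC address: {mac!r}")
    return (
        f"host {short_name} {{\n"
        f"    hardware ethernet {mac.upper()};\n"
        f"    fixed-address {fqdn};\n"
        f"}}\n"
    )


entry = dhcp_host_entry(
    "cloudvirt1023", "B0:26:28:29:6A:E0", "cloudvirt1023.eqiad.wmnet"
)
print(entry)
```

Validating the MAC before rendering catches copy-and-paste slips before they reach the installer's DHCP configuration.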

Change 534663 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] Cloudvirt1023: move to 10G nic

https://gerrit.wikimedia.org/r/534663

Change 534663 merged by Andrew Bogott:
[operations/puppet@production] Cloudvirt1023: move to 10G nic

https://gerrit.wikimedia.org/r/534663

Change 534681 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] openstack scheduler: update comments for cloudvirts

https://gerrit.wikimedia.org/r/534681

Change 534682 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1023: rename network interfaces

https://gerrit.wikimedia.org/r/534682

Change 534682 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1023: rename network interfaces

https://gerrit.wikimedia.org/r/534682

Change 534684 had a related patch set uploaded (by Andrew Bogott; owner: Andrew Bogott):
[operations/puppet@production] cloudvirt1023: rename interfaces, again

https://gerrit.wikimedia.org/r/534684

Change 534684 merged by Andrew Bogott:
[operations/puppet@production] cloudvirt1023: rename interfaces, again

https://gerrit.wikimedia.org/r/534684

Andrew closed this task as Resolved. Fri, Sep 6, 3:33 AM
Andrew updated the task description.

Change 534681 merged by Andrew Bogott:
[operations/puppet@production] openstack scheduler: update comments for cloudvirts

https://gerrit.wikimedia.org/r/534681