
labnet1003: reimage+rename to cloudnet1003
Closed, Resolved, Public

Description

Reimage this server and rename it according to the new naming scheme.

The planned timeline is:

  • disable puppet in labnet1003
  • merge the puppet patch to rename the host and get the new Debian installer working
  • merge the DNS patch to add the new FQDNs (partial; the old mgmt names still remain)
  • run the wmf-auto-reimage-host script
  • merge DNS cleanup patch
  • racktables update
  • get the physical relabeling done (T199524)
  • done

The same was done for labvirt1021/cloudvirt1021 (see T199107 and T199132).

Event Timeline

aborrero triaged this task as Normal priority. Jul 13 2018, 10:12 AM
aborrero created this task.
Restricted Application removed a project: Patch-For-Review. Jul 13 2018, 10:12 AM

Change 445587 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: reimage+rename labnet1003 as cloudnet1003

https://gerrit.wikimedia.org/r/445587

Change 445589 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] cloudvps: reimage+rename labnet1003 as cloudnet1003

https://gerrit.wikimedia.org/r/445589

Change 445589 merged by Arturo Borrero Gonzalez:
[operations/dns@master] cloudvps: reimage+rename labnet1003 as cloudnet1003

https://gerrit.wikimedia.org/r/445589

Change 445587 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: reimage+rename labnet1003 as cloudnet1003

https://gerrit.wikimedia.org/r/445587

aborrero updated the task description. Jul 13 2018, 10:27 AM

Script wmf-auto-reimage was launched by aborrero on neodymium.eqiad.wmnet for hosts:

labnet1003.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201807131038_aborrero_4592_labnet1003_eqiad_wmnet.log.

Using:

aborrero@neodymium:~$ sudo wmf-auto-reimage-host -p T199521 --rename cloudnet1003.eqiad.wmnet --rename-mgmt cloudnet1003.mgmt.eqiad.wmnet labnet1003.eqiad.wmnet labnet1003.mgmt.eqiad.wmnet

Change 445591 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] install_server: add autoinstall recipe for cloudnet1003

https://gerrit.wikimedia.org/r/445591

Change 445591 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] install_server: add autoinstall recipe for cloudnet1003

https://gerrit.wikimedia.org/r/445591

Completed auto-reimage of hosts:

['cloudnet1003.eqiad.wmnet']

Of which those FAILED:

['cloudnet1003.eqiad.wmnet']

Script wmf-auto-reimage was launched by aborrero on neodymium.eqiad.wmnet for hosts:

labnet1003.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201807131046_aborrero_5590_labnet1003_eqiad_wmnet.log.

Completed auto-reimage of hosts:

['labnet1003.eqiad.wmnet']

Of which those FAILED:

['labnet1003.eqiad.wmnet']

Script wmf-auto-reimage was launched by aborrero on neodymium.eqiad.wmnet for hosts:

cloudnet1003.eqiad.wmnet

The log can be found in /var/log/wmf-auto-reimage/201807131047_aborrero_5721_cloudnet1003_eqiad_wmnet.log.

aborrero added a comment (edited). Jul 13 2018, 11:25 AM

The initial boot with jessie is affected by T149845 (thanks @MoritzMuehlenhoff for the pointer).

The boot looks like:

Loading Linux 4.9.0-0.bpo.7-amd64 ...
Loading initial ramdisk ...
[    0.127231] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
Loading, please wait...
mdadm: No devices listed in conf file were found.
Gave up waiting for root device.  Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT!  /dev/disk/by-uuid/c3b6474b-8ab4-44ed-b886-083b59abfce7 does not exist.  Dropping to a shell!
modprobe: module ehci-orion not found in modules.dep


BusyBox v1.22.1 (Debian 1:1.22.0-9+deb8u1) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty; job control turned off
(initramfs)

You can get the system to boot by assembling the root array by hand from the initramfs shell and then exiting:

(initramfs) cat /etc/mdadm/mdadm.conf 
HOMEHOST <system>
ARRAY /dev/md/0  metadata=1.2 UUID=85917ef1:b235d5e2:60345a9b:e17bbd2d name=cloudnet1003:0
ARRAY /dev/md/1  metadata=1.2 UUID=4c67bbe8:95e9be3e:ead58189:b9642669 name=cloudnet1003:1
(initramfs) mdadm --assemble /dev/md/0
mdadm: /dev/md/0 has been started with 2 drives.
(initramfs) exit
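When more than one array is missing at the initramfs prompt, typing each mdadm --assemble by hand gets tedious. Below is a minimal POSIX sh sketch (not part of the original workaround) that assembles every ARRAY listed in mdadm.conf. The function name assemble_md_arrays and the DRY_RUN switch are hypothetical, and it assumes the config sits at /etc/mdadm/mdadm.conf as in the output above. It only relies on read and [, which BusyBox ash should provide. The stock mdadm --assemble --scan does much the same thing in a single command.

```shell
# Hypothetical helper: assemble every md array declared in an mdadm.conf.
# Assumption: the config path defaults to /etc/mdadm/mdadm.conf.
# Set DRY_RUN=1 to print the commands instead of running them.
assemble_md_arrays() {
    conf="${1:-/etc/mdadm/mdadm.conf}"
    while read -r keyword device _rest; do
        # Only ARRAY lines name an md device; skip HOMEHOST, comments, etc.
        [ "$keyword" = "ARRAY" ] || continue
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "mdadm --assemble $device"
        else
            mdadm --assemble "$device"
        fi
    done < "$conf"
}
```

Example use from the (initramfs) prompt: paste the function, run DRY_RUN=1 assemble_md_arrays to preview the commands, then run assemble_md_arrays and exit to continue booting.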

Completed auto-reimage of hosts:

['cloudnet1003.eqiad.wmnet']

and were ALL successful.

aborrero updated the task description. Jul 13 2018, 12:20 PM
aborrero updated the task description.

Change 445606 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/dns@master] cloudvps: cleanup labnet1003 entries

https://gerrit.wikimedia.org/r/445606

Change 445606 merged by Arturo Borrero Gonzalez:
[operations/dns@master] cloudvps: cleanup labnet1003 entries

https://gerrit.wikimedia.org/r/445606

aborrero updated the task description. Jul 13 2018, 1:04 PM
ayounsi removed a subscriber: ayounsi. Jul 13 2018, 1:25 PM
aborrero closed this task as Resolved. Aug 6 2018, 12:38 PM
aborrero updated the task description.

Done.

Change 472222 had a related patch set uploaded (by GTirloni; owner: GTirloni):
[operations/dns@master] cloudvps: cleanup labvirt1017 entries

https://gerrit.wikimedia.org/r/472222

Change 472222 merged by GTirloni:
[operations/dns@master] cloudvps: cleanup labvirt1017 entries

https://gerrit.wikimedia.org/r/472222