
Q4:rack/setup/install sretest2009
Closed, Resolved · Public

Description

This task will track the racking, setup, and OS installation of sretest2009.codfw.wmnet

Hostname / Racking / Installation Details

Hostnames: What are the hostnames, and have you updated https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions ?
Racking Proposal: anywhere, test server
Networking Setup: # of Connections:1/2 - Speed:10G. - VLAN:Private/Public/Other(Specify) :
OS Distro: Bookworm (default unless otherwise specified)
Boot Method: Legacy BIOS or UEFI. Please note UEFI must have partman updates applied in advance of setup and is currently in pilot program: https://wikitech.wikimedia.org/wiki/UEFI_Boot
Sub-team Technical Contact: @RobH

Per host setup checklist

Each host should have its own setup checklist copied and pasted into the list below.

sretest2009.codfw.wmnet:
  • Receive in system on procurement task T393032 & in Coupa
  • Rack system with proposed racking plan (see above) & update Netbox (include all system info plus location, state of planned)
  • Run the Provision a server's network attributes Netbox script - Note that you must run the DNS and Provision cookbook after completing this step
  • Immediately run the sre.dns.netbox cookbook
  • Immediately run the sre.hosts.provision cookbook
  • Update the operations/puppet repo - this should include updates to preseed.yaml, and site.pp with roles defined by service group: https://wikitech.wikimedia.org/wiki/SRE/Dc-operations
  • Run the sre.hosts.reimage cookbook

Event Timeline

Restricted Application added a subscriber: Aklapper.
Jhancock.wm mentioned this in Unknown Object (Task). Jun 9 2025, 3:00 PM

@RobH could you help me with updating the site.pp for this server?

Change #1161949 had a related patch set uploaded (by Jhancock.wm; author: Jhancock.wm):

[operations/puppet@production] Adding sretest2009 to site.pp

https://gerrit.wikimedia.org/r/1161949

Change #1161949 merged by Jhancock.wm:

[operations/puppet@production] Adding sretest2009 to site.pp

https://gerrit.wikimedia.org/r/1161949

Change #1163392 had a related patch set uploaded (by Jhancock.wm; author: Jhancock.wm):

[operations/puppet@production] Adding and updating sretest200X servers

https://gerrit.wikimedia.org/r/1163392

Change #1163392 abandoned by Jhancock.wm:

[operations/puppet@production] Adding and updating sretest200X servers

Reason:

seeking expert advice

https://gerrit.wikimedia.org/r/1163392

I need to change the preseed.yaml file so that sretest2005, sretest2006, sretest2009, and sretest2010 (just to cover some other servers in one go) have the same partman as sretest2004.

Change #1164459 had a related patch set uploaded (by Jhancock.wm; author: Jhancock.wm):

[operations/puppet@production] Adding and Updating sretest hosts in codfw

https://gerrit.wikimedia.org/r/1164459

Change #1164459 abandoned by Jhancock.wm:

[operations/puppet@production] Adding and Updating sretest hosts in codfw

Reason:

rebase not rebasing

https://gerrit.wikimedia.org/r/1164459

Change #1164460 had a related patch set uploaded (by Jhancock.wm; author: Jhancock.wm):

[operations/puppet@production] Updating and Adding sretest hosts to preseed.yaml

https://gerrit.wikimedia.org/r/1164460

Change #1164464 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] sretest updates

https://gerrit.wikimedia.org/r/1164464

Change #1164460 merged by RobH:

[operations/puppet@production] Updating and Adding sretest hosts to preseed.yaml

https://gerrit.wikimedia.org/r/1164460

Change #1164464 abandoned by RobH:

[operations/puppet@production] sretest updates

https://gerrit.wikimedia.org/r/1164464

Change #1164467 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] fixing sretest

https://gerrit.wikimedia.org/r/1164467

Change #1164467 merged by RobH:

[operations/puppet@production] fixing sretest

https://gerrit.wikimedia.org/r/1164467

Change #1164471 had a related patch set uploaded (by RobH; author: RobH):

[operations/puppet@production] sretest preseed update

https://gerrit.wikimedia.org/r/1164471

Change #1164471 merged by RobH:

[operations/puppet@production] sretest preseed update

https://gerrit.wikimedia.org/r/1164471

Traceback (most recent call last):
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/provision.py", line 497, in _found_diffs_bios_attributes
    if not bios_attributes[key] == value:
           ~~~~~~~~~~~~~~~^^^^^
KeyError: 'ConsoleRedirection'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/spicerack/_menu.py", line 265, in _run
    raw_ret = runner.run()
              ^^^^^^^^^^^^
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/provision.py", line 294, in run
    self._config_host()
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/provision.py", line 435, in _config_host
    should_patch = self._found_diffs_bios_attributes(bios_attributes)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/deployment/spicerack/cookbooks/sre/hosts/provision.py", line 506, in _found_diffs_bios_attributes
    raise RuntimeError(
RuntimeError: Error while checking BIOS attribute ConsoleRedirection
Rolling back DHCP setup
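
For anyone reading the traceback: the KeyError suggests the host did not report a ConsoleRedirection attribute at all, rather than reporting a different value. Below is a minimal sketch (not the cookbook's actual code) of a comparison that keeps "missing" separate from "differs". The function name diff_bios_attributes and the sample values are hypothetical.

```python
def diff_bios_attributes(expected, actual):
    """Split expected BIOS settings into missing keys and differing values."""
    # Keys we want to set that the host does not expose at all.
    missing = sorted(key for key in expected if key not in actual)
    # Keys present on the host whose value differs: key -> (current, wanted).
    differing = {
        key: (actual[key], value)
        for key, value in expected.items()
        if key in actual and actual[key] != value
    }
    return missing, differing


# Hypothetical values, for illustration only.
expected = {"ConsoleRedirection": "Enabled", "BootMode": "UEFI"}
actual = {"BootMode": "Legacy"}

missing, differing = diff_bios_attributes(expected, actual)
print(missing)    # ['ConsoleRedirection']
print(differing)  # {'BootMode': ('Legacy', 'UEFI')}
```

Reporting the missing keys separately would make it clearer whether a vendor renamed an attribute or the value simply needs patching.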

New errors trying to provision this server. Got DHCP and the user setup to work, but the cookbook is not finishing cleanly.

@Jhancock.wm per our conversations on IRC yesterday, I believe this should be set up with the following partman recipe:

  • partman/custom/boss_leavelvm.cfg

@Clement_Goubert hey this is a test server that could be a 1 CPU alternative for your wikikube-worker servers. Could you set the partman for this server how you would prefer it if you were going to get this one? (probably gonna hand it to you after initial testing is complete to do your own specific testing.) Thanks for your help!

@Jhancock.wm Hi! I tried to run a customized version of the provision script (the same that worked for sretest2010) but for some reason the host doesn't seem to be network reachable. Is there anything ongoing onsite?

Checked the physical cables and everything lines up right. Couldn't get into the BMC. Re-ran the regular provisioning script and can access the BMC now, but it won't let me set the root password in the script. I can log in to the BMC with the password printed on the luggage tag. I'll DM it to you. I don't wanna add the root user if you still need to test on that.

@Jhancock.wm I managed to get provision working. The new settings are not yet merged, so if you have other similar hosts, ping me first :)

The issue with the passwords/accounts is a weird one: it seems that a single wrong password is enough to lock a user for 30s. This is not what the BMC's settings state, since the WebUI clearly shows 3 attempts. Anyway, our logic tries ADMIN with the root password first, then calvin if that fails, and finally the custom password on the label. The main problem in this case is that ADMIN didn't have the root password, so the account locked immediately and the other two attempts failed :( No idea why it is doing this, very weird.

I changed the ADMIN password to the root one, and the cookbook completed fine.
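
To illustrate the lockout interaction described above, here is a toy model. It is a sketch under assumptions, not the Spicerack logic: try_login, FakeBmc, and the placeholder passwords are hypothetical, and the 30s lockout after a single failure is taken from the observation in this comment.

```python
import time


def first_working_password(try_login, candidates, lockout_seconds=30.0, sleep=time.sleep):
    """Return the first accepted password, pausing after each failed attempt.

    try_login is a hypothetical callable that returns True when the BMC
    accepts the password. The pause assumes one failure locks the account.
    """
    for index, password in enumerate(candidates):
        if try_login(password):
            return password
        if index < len(candidates) - 1:
            sleep(lockout_seconds)  # wait out the lockout before the next candidate
    return None


class FakeBmc:
    """Toy BMC with a simulated clock: one wrong password locks the account."""

    def __init__(self, password, lockout_seconds=30.0):
        self.password = password
        self.lockout_seconds = lockout_seconds
        self.now = 0.0
        self.locked_until = 0.0

    def sleep(self, seconds):
        self.now += seconds  # advance simulated time instead of really sleeping

    def try_login(self, password):
        if self.now < self.locked_until:
            return False  # locked: even the correct password is rejected
        if password == self.password:
            return True
        self.locked_until = self.now + self.lockout_seconds
        return False


candidates = ["root-password", "calvin", "label-password"]  # placeholder values

# No pause between attempts: the first failure locks out the other two.
bmc = FakeBmc("label-password")
print(first_working_password(bmc.try_login, candidates, lockout_seconds=0.0, sleep=bmc.sleep))

# Pausing for the lockout window between attempts lets the last candidate through.
bmc = FakeBmc("label-password")
print(first_working_password(bmc.try_login, candidates, sleep=bmc.sleep))
```

In this model the first call prints None and the second prints label-password, which matches the behaviour seen on the host: back-to-back attempts all fail once the first one is wrong.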

I re-ran the cookbook and I found this:

Response payload: {'error': {'code': 'Base.1.10.3.GeneralError', 'message': 'A general error has occurred. See ExtendedInfo for more information.', '@Message.ExtendedInfo': [{'MessageId': 'Base.1.10.PropertyNotWritable', 'Severity': 'Warning', 'Resolution': 'Remove the property from the request body and resubmit the request if the operation failed.', 'Message': 'The property UserName is a read only property and cannot be assigned a value.', 'MessageArgs': ['UserName'], 'RelatedProperties': ['UserName']}]}}

So another change to the API, sigh... You are unblocked, I'll try to patch it later on with some new logic.
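
As a rough idea of what the new logic could look like (a sketch only, not the patch that was later merged): read the property names flagged as PropertyNotWritable in the error response and drop them from the request body before retrying. The helper names and the example request body are hypothetical; the error payload is trimmed from the response above.

```python
def not_writable_properties(error_payload):
    """Return property names flagged PropertyNotWritable in a Redfish error response."""
    extended = error_payload.get("error", {}).get("@Message.ExtendedInfo", [])
    names = []
    for entry in extended:
        if entry.get("MessageId", "").endswith("PropertyNotWritable"):
            names.extend(entry.get("MessageArgs", []))
    return names


def strip_properties(body, names):
    """Return a copy of the request body without the given properties."""
    return {key: value for key, value in body.items() if key not in names}


# Trimmed copy of the response payload shown above.
response = {
    "error": {
        "code": "Base.1.10.3.GeneralError",
        "@Message.ExtendedInfo": [
            {
                "MessageId": "Base.1.10.PropertyNotWritable",
                "MessageArgs": ["UserName"],
                "RelatedProperties": ["UserName"],
            }
        ],
    }
}

# Hypothetical request body for a password change.
body = {"UserName": "ADMIN", "Password": "example-password"}

rejected = not_writable_properties(response)
print(rejected)                          # ['UserName']
print(strip_properties(body, rejected))  # {'Password': 'example-password'}
```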

Change #1172265 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/software/spicerack@master] redfish: simplify change_user_password for Supermicro

https://gerrit.wikimedia.org/r/1172265

Change #1172265 merged by jenkins-bot:

[operations/software/spicerack@master] redfish: simplify change_user_password for Supermicro

https://gerrit.wikimedia.org/r/1172265

This task needs a new Spicerack release. I hope to do one in the next couple of days!

wiki_willy mentioned this in Unknown Object (Task). Jul 28 2025, 11:38 PM

Change #1173883 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] installserver: add preseed config for sretest2009

https://gerrit.wikimedia.org/r/1173883

Change #1173883 merged by Elukey:

[operations/puppet@production] installserver: add preseed config for sretest2009

https://gerrit.wikimedia.org/r/1173883

I was able to reimage the host:

elukey@sretest2009:~$ df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                   63G     0   63G   0% /dev
tmpfs                  13G  2.0M   13G   1% /run
/dev/mapper/vg0-root   73G  2.8G   67G   5% /
tmpfs                  63G     0   63G   0% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
/dev/mapper/vg0-srv   629G   28K  597G   1% /srv
/dev/sdb2             241M  166K  241M   1% /boot/efi
tmpfs                  13G     0   13G   0% /run/user/13926

elukey@sretest2009:~$ sudo fdisk -l 
Disk /dev/sda: 894.25 GiB, 960197124096 bytes, 1875385008 sectors
Disk model: INTEL SSDSC2KB96
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: AF51B8C0-CADD-4C85-AC98-88333DD2931D

Device      Start        End    Sectors  Size Type
/dev/sda1    2048       4095       2048    1M BIOS boot
/dev/sda2    4096     503807     499712  244M EFI System
/dev/sda3  503808 1875384319 1874880512  894G Linux RAID


Disk /dev/sdb: 894.25 GiB, 960197124096 bytes, 1875385008 sectors
Disk model: INTEL SSDSC2KB96
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: B05822DC-0E08-4C78-9467-1E31D878313B

Device      Start        End    Sectors  Size Type
/dev/sdb1    2048       4095       2048    1M BIOS boot
/dev/sdb2    4096     503807     499712  244M EFI System
/dev/sdb3  503808 1875384319 1874880512  894G Linux RAID


Disk /dev/md0: 893.89 GiB, 959803555840 bytes, 1874616320 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/mapper/vg0-swap: 976 MiB, 1023410176 bytes, 1998848 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/mapper/vg0-root: 74.5 GiB, 79997960192 bytes, 156246016 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes


Disk /dev/mapper/vg0-srv: 639.64 GiB, 686813085696 bytes, 1341431808 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

@Jhancock.wm the host is ready for a check, but please note that provision and reimage are still not working. I am using custom versions that will hopefully be merged this week!

Thank you so much for your help on these! Everything looks great. This one is a 1 CPU config F. It'll be an alternative to the CP servers. So once everything is finalized I can send it to that team for further testing.

@Jhancock.wm yep this should be ready for a review, already reimaged!