
root user not present on newest batches of Supermicro servers.
Closed, Resolved (Public)

Description

When running the provisioning script for the servers in these tasks (with one exception), the script hits a roadblock. At the point where it tries to change the password for the root user, that user is not present on the server. We then have to log in to each server individually with the ADMIN login, whose password is usually the default we requested (except in the aforementioned exception), and create the root user in the BMC web interface.

(dcops, please add any additional servers you find with this issue to this list until it is resolved. I expect the rest of the SM servers we ordered this quarter might have the same issue.)

T405964
T406795
T405406
T406796
T405966
T407032 < password not calvin, but the one on the luggage tag.

Event Timeline

@Jhancock.wm and I were trying to figure this out over IRC. The provision cookbook says it’s trying ADMIN / calvin, and that combo is definitely correct: we can log into the BMC web GUI with ADMIN / calvin just fine.

Looking at the code in _try_bmc_password, the first if branch changes bmc_username from ADMIN to root:

if label == "wmf_root_mgmt":
    bmc_username = "root"
    bmc_password = password

That value never gets reset, so by the time we hit the "calvin" case and fall into the else branch, it looks like it’s actually trying root / calvin instead of ADMIN / calvin.

It ends up trying root / <BMC label> when it should be using ADMIN / <BMC label>.
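A minimal sketch of how this kind of leak could happen, assuming the function loops over (label, password) candidates and sets the username once before the loop. The function name, the constant and the candidate list are made up for illustration; this is not the actual cookbook code.

```python
# Hypothetical reduction of the retry loop, NOT the real provision.py code.
DEFAULT_USER = "ADMIN"


def attempts_leaky(candidates):
    """Return the (username, label) pairs that would be tried.

    The username is initialised once, outside the loop, so once the
    "wmf_root_mgmt" branch switches it to "root" it stays "root".
    """
    bmc_username = DEFAULT_USER
    tried = []
    for label, _password in candidates:
        if label == "wmf_root_mgmt":
            bmc_username = "root"
        # No reset in the other case: the previous value is reused.
        tried.append((bmc_username, label))
    return tried


# Placeholder passwords; only the labels matter for this demonstration.
candidates = [("wmf_root_mgmt", "placeholder"), ("calvin", "calvin")]
print(attempts_leaky(candidates))
# [('root', 'wmf_root_mgmt'), ('root', 'calvin')]
```

Under these assumptions the second attempt goes out as root / calvin, which matches the behaviour described above.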

I could be wrong, so could someone else take a look as well? @elukey @Papaul
https://gerrit.wikimedia.org/r/c/operations/cookbooks/+/1170085/21/cookbooks/sre/hosts/provision.py#660
This is the change where it was added.

This comment was removed by Jclark-ctr.

I think this would be the fix: it resets the username to ADMIN on each attempt, except where the if statement explicitly switches it to "root".

try:
    username = bmc_username

    logger.info(
        "Connecting to the BMC as user %s, with password %s",
        username, label
    )
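
For illustration only, here is a self-contained sketch of the per-attempt reset idea. It assumes a loop over (label, password) candidates and a hypothetical connect(username, password) callable that raises PermissionError on bad credentials; find_working_login and fake_bmc are invented names, and none of this is taken from the real cookbook.

```python
import logging

logger = logging.getLogger("provision-sketch")


def find_working_login(candidates, connect):
    """Try each (label, password) pair; return the (username, label) that works.

    `connect` is a hypothetical callable that raises PermissionError
    when the BMC rejects the credentials.
    """
    for label, password in candidates:
        # Recompute the username on every attempt so "root" cannot leak
        # from the "wmf_root_mgmt" case into later tries.
        username = "root" if label == "wmf_root_mgmt" else "ADMIN"
        try:
            logger.info(
                "Connecting to the BMC as user %s, with password %s",
                username, label
            )
            connect(username, password)
            return username, label
        except PermissionError:
            logger.info("Login as %s with password %s failed", username, label)
    return None


def fake_bmc(username, password):
    # Stand-in for a BMC that only knows ADMIN / calvin.
    if (username, password) != ("ADMIN", "calvin"):
        raise PermissionError("invalid credentials")


candidates = [("wmf_root_mgmt", "placeholder"), ("calvin", "calvin")]
print(find_working_login(candidates, fake_bmc))  # ('ADMIN', 'calvin')
```

As in the snippet above, only the label is logged, never the password itself.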
elukey triaged this task as Medium priority. Dec 15 2025, 3:19 PM

@Jclark-ctr thanks a lot! I am going to review it tomorrow, let me know if there is a server that I can test it with (but I guess you already manually fixed the rest, sorry). Sorry for the late reply, but last week was heavy on-call :(

@elukey no problem. I didn’t make any edits to the provisioning script; it was just my observation. After reviewing the script, I believe that’s what was causing the error.
I know @VRiley-WMF has a few Supermicro servers that came in last week and still need to be racked.

T408760 is the ticket that is pending racking at eqiad.

Change #1218315 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/cookbooks@master] sre.hosts.provision: fix retry logic for the Supermicro BMC password

https://gerrit.wikimedia.org/r/1218315

@Jclark-ctr @VRiley-WMF I filed a patch to test, please let me know when you have a new wikikube worker configured in netbox and ready to be provisioned!

Change #1218315 merged by Elukey:

[operations/cookbooks@master] sre.hosts.provision: fix retry logic for the Supermicro BMC password

https://gerrit.wikimedia.org/r/1218315

@VRiley-WMF @Jclark-ctr the new code is merged, so you can test it once you have servers ready (I don't want to rush you). Please report if it works or not :)

Jclark-ctr claimed this task.

The servers VRiley had were not at a point where they could be tested. T408749 arrived today, and I was able to rack and cable it and work with Luca to verify the patch. Luca's fix for the error I noticed resolved the issue with usernames.