User Details
- User Since
- Apr 1 2015, 4:33 PM (557 w, 1 d)
- Availability
- Available
- LDAP User
- Moritz Mühlenhoff
- MediaWiki User
- MMuhlenhoff (WMF) [ Global Accounts ]
Today
It's depooled and monitoring is disabled; you can replace it any time.
ops-limited is very broad access: it grants access to any of our 2400 servers, including some very sensitive ones. But if this access is an ongoing need, we can surely create a new group kafka-jumbo-access, which grants you shell access to the Kafka Jumbo nodes.
The initial imposm catchup sync after the PBF import has just completed.
Yesterday
Thanks, I'll rebuild the software RAID tomorrow
The server has a defective memory stick, adding DC ops to get it replaced:
Tue, Dec 2
RAID is rebuilt, resolving.
Yes, please. I'll take care of the software RAID rebuild.
This has been implemented. When installing a new server, partman/custom/db-efi.cfg needs to be selected.
Mon, Dec 1
@Jhancock.wm The broken disk is /dev/sda which per lshw has the serial 22353BB15C0C, does that help?
Updates have been rolled out and diffs are being sent again.
Also reported to Debian as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1121730
Rancid is a bit of a maze of scripts calling each other, but I was eventually able to track it down to /usr/bin/control_rancid. In our case, the key printed by "system tls server-profile self-signed key" is what eventually made the message too long for Exim to accept.
Fri, Nov 28
Thu, Nov 27
Access has been granted via Wikimedia IDM on Nov. 26, 2025, 4:38 p.m. Marking this task as resolved.
I think all the relevant cookbooks have this enabled now. I'm resolving the task.
All done
These are all done, the remaining Ganeti nodes w/o AAAA records were decommissioned as part of the last hardware refreshes in eqiad and codfw.
This got fixed as part of the migration to Bookworm.
@Jclark-ctr : You should now be able to run smartctl, let me know if you run into any issues
Wed, Nov 26
@RKemper There's still a missed host: cirrussearch2084 is marked as fixed, but is on an old kernel and has an uptime of 215 days.
This is complete. Luca and I made a total of 122 commits to puppet.git (plus surely a few where we forgot to tag the task) for:
Copied the output of dmesg to this paste in case it's needed for the warranty case: https://phabricator.wikimedia.org/P85732
dmesg is full of I/O errors for /dev/sdb; we should definitely get that drive replaced.
FWIW, the plan for eqiad sounds good to me
Tue, Nov 25
I'll have a look later today.
Mon, Nov 24
The SuperMicro hosts are somewhat special; for the Dells, the following cookbook should handle the reprovisioning to UEFI mode:
This broke Puppet runs on the puppetservers:
You can simply confirm and continue, Puppet 7 is already enabled for wdqs1031 via the insetup::data_platform_ferm role in site.pp
There are a few more trixie cloud nodes; I took the liberty of expanding the task accordingly.
Fri, Nov 21
All done
@Volans made a copy of the old Spicerack/Cumin logs; they are available in /var/log/cumin100[12] on cumin1003 in case anyone needs them.
Wed, Nov 19
@ssingh The hcaptcha-proxy VMs in magru are up and running
Tue, Nov 18
Specs look good
