TZ: UTC +1/+2
User Details
- User Since
- Sep 1 2016, 6:48 AM (502 w, 2 d)
- Availability
- Available
- IRC Nick
- marostegui
- LDAP User
- Marostegui
- MediaWiki User
- MArostegui (WMF)
Yesterday
Thanks for the heads-up!
Fri, Apr 17
@Raine can this deletion happen anytime, or do you still have to do things on your end?
Right, given that it is everywhere, I'll simply use sX.dblist and issue a DROP TABLE IF EXISTS categorylinks_icu72
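A minimal sketch of how the statements could be generated from a dblist, assuming the file holds one database name per line with optional `#` comments. The helper name `drop_statements` is hypothetical and not part of any existing tooling; it only builds the SQL text and does not connect to any host.

```python
def drop_statements(dblist_text, table="categorylinks_icu72"):
    """Build one DROP TABLE IF EXISTS statement per database in a dblist.

    Assumption: one database name per line, '#' starts a comment.
    """
    statements = []
    for line in dblist_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        name = line.split("#", 1)[0].strip()
        if not name:
            continue
        statements.append(f"DROP TABLE IF EXISTS `{name}`.`{table}`;")
    return statements


if __name__ == "__main__":
    sample = "# example section\nenwiki\n\ndewiki # trailing comment\n"
    for statement in drop_statements(sample):
        print(statement)
```

Because the statement uses IF EXISTS, running it against a database where the table is already gone is harmless.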
Rebuilding:
Raw Size: 1.746 TB [0xdf8fe2b0 Sectors] Firmware state: =====> Rebuild <=====
Thank you!
I think it may take around 5-6 hours. But to be on the safe side in case of issues, I'd suggest we announce a possible 24h of degradation, stating that it is likely to take far less.
I'd also be doing one at a time, so we reduce the impact to one section at a time.
Let me know what you think
Any used disk that we can use for this host?
Thanks!
Thu, Apr 16
I can do that but I'll have to stop that host to clone the new one, would that be okay for the service?
This is ready for DC-Ops
This has been fixed upstream and will be released in 10.11.17 https://jira.mariadb.org/browse/MDEV-38072
https://github.com/MariaDB/server/pull/4928
Tue, Apr 14
Mon, Apr 13
Thanks! If you are happy with it, we can close this task.
sounds good from my side yes!
Thanks John for trying to swap many parts - unfortunately it didn't work so I am going to close this task and open a new one to decommission this host.
This host is dead.
Wilco! Thanks!
Blocked on finishing dborch1003 and the completion of this task: T317179
@ssingh we'd need support from your team to move dborch1003 behind the CDN - is it possible to allocate some time for this with @FCeratto-WMF? Thanks!
@jcrespo how do you feel about getting db2160 and db1217 migrated to Trixie? MariaDB version won't change.
The issue is that these hosts have the backups for the rest of the sections (which are not yet scheduled to get migrated)
Thank you - let me know if I can help
Fri, Apr 10
Great!
I'm on call next week so we can force a lag page
@FCeratto-WMF has the parent's schema change progressed in any section that'd require recreating the views here?
@Ladsgroup what's the status of this? Should I start scheduling switchovers?
@Jclark-ctr I am not able to reimage the host, it is not rebooting, can you check onsite what's on the screen? I've tried several times to reboot it manually but there's no output at all.
Thu, Apr 9
Thanks John - let me reimage it now
Thank you @Jclark-ctr - let us know when we can take over.
Thankfully its replacement will arrive soon (famous last words) (T405296)
ops-eqiad can you check on site? The above errors seem HW related.
-------------------------------------------------------------------------------
Record: 8
Date/Time: 04/09/2026 12:05:09
Source: system
Severity: Critical
Description: CPU 2 MEM345 VDDQ PG voltage is outside of range.
-------------------------------------------------------------------------------
Record: 9
Date/Time: 04/09/2026 12:06:18
Source: system
Severity: Critical
Description: The system board Pfault fail-safe voltage is outside of range.
-------------------------------------------------------------------------------
We can probably start by running pt-show-grants on each host, comparing the outputs and making sure they are all the same. Once that is done, we can start fixing redundant stuff and escaping things.
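The comparison step could look roughly like this. It is only a sketch: it assumes the pt-show-grants output of each host has already been captured as text, and that comment lines start with `--` so they can be ignored (they may carry per-host details such as timestamps). The function names are hypothetical.

```python
def grant_set(output):
    """Return the set of non-empty, non-comment lines from one grants dump."""
    return {
        line.strip()
        for line in output.splitlines()
        if line.strip() and not line.lstrip().startswith("--")
    }


def compare_grants(outputs):
    """Map each host to the sorted grant lines that are not common to all hosts.

    `outputs` maps a host name to its captured grants dump. Hosts whose
    grants are all shared with every other host are left out of the result.
    """
    sets = {host: grant_set(text) for host, text in outputs.items()}
    common = set.intersection(*sets.values()) if sets else set()
    return {
        host: sorted(grants - common)
        for host, grants in sets.items()
        if grants - common
    }
```

An empty result means every host carries the same set of grant lines; anything else lists, per host, the lines worth a closer look.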
One thing I've always noticed was:
GRANT ALL PRIVILEGES ON '%\_p'.* already covers every _p database, making the following grants redundant:
- heartbeat\_p — subsumed by %\_p
- meta\_p — subsumed by %\_p
- centralauth\_p — subsumed by %\_p
- %wik%\_p — also subsumed by %\_p
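To illustrate why the escaped pattern covers those names, here is a rough sketch that translates a wildcard database pattern into a regular expression, treating `%` as "any sequence", `_` as "any single character", and a backslash as escaping the next character. The helpers `like_to_regex` and `matches` are hypothetical, case sensitivity is ignored, and `enwiki_p` is just an example name of the kind `%wik%\_p` would match.

```python
import re


def like_to_regex(pattern):
    """Translate a wildcard pattern (%, _, backslash escapes) into a regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            # Escaped character: match it literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, name):
    """True if the whole database name matches the wildcard pattern."""
    return like_to_regex(pattern).fullmatch(name) is not None


if __name__ == "__main__":
    for db in ["heartbeat_p", "meta_p", "centralauth_p", "enwiki_p"]:
        print(db, matches(r"%\_p", db))
```

The same helper also shows why the escaping matters: an unescaped `%_p` would match a name such as `enwikixp`, while `%\_p` would not.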
This is basically all done and waiting for the next HW to arrive (see subtasks) to get installed directly as Trixie
That works for me @jcrespo - thank you.
Also keep in mind those hosts aren't blocking us in any way as the version is the same.
Thanks @fnegri all clear now. I will check the task and the patch and see if I can think of something.
Thank you!
Not sure what this is for now, so protecting
If possible clouddb1015 would be nice T422777
@fnegri can we resume this and reimage one host with Trixie?
Wed, Apr 8
I can take a look from the DB side of things, but remember that being active-active we make no changes at all on the DB layer during switchovers (apart from circular replication, which only affects master for a few days).
Done and repooled
Tue, Apr 7
I think this is fixed. The new patched version (10.11.16u3) has been uploaded to bookworm and trixie repos. For more tracking of the bug itself: https://jira.mariadb.org/browse/MDEV-39209
@fnegri you may continue your updates
Works for me!