Since db1238 has logged some HW errors and has shown strange trends, let's prepare a different candidate master.
New candidate master: db1244
| Subject | Repo | Branch | Lines +/- |
|---|---|---|---|
| db1244: Make it candidate master | operations/puppet | production | +2 -2 |
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Resolved | Security | Ladsgroup | T370304 Bursts of occasional severe contention on s4 (commonswiki) primary mariadb causing recurrent user-facing outages on all wikis |
| Resolved | | VRiley-WMF | T371342 db1238 bus critical errors |
| Resolved | | Marostegui | T371343 Prepare new candidate master for s4 |
Right now db1238 is being recloned from db1244. db1244 is in a different row and has a clean history, so I'll probably pick that one.
Change #1058011 had a related patch set uploaded (by Marostegui; author: Marostegui):
[operations/puppet@production] db1244: Make it candidate master
Change #1058011 merged by Marostegui:
[operations/puppet@production] db1244: Make it candidate master
Mentioned in SAL (#wikimedia-operations) [2024-07-30T05:20:21Z] <marostegui> Change candidate master in s4 eqiad (this is a NOOP) T371343
db1238 has been recloned and db1244 is set up as candidate master. Both hosts are being repooled slowly and automatically.
db1238 HW issues can be tracked at T371342: db1238 bus critical errors
db1244 is in the same rack as the candidate master of s6: https://fault-tolerance.toolforge.org/map?cluster=db-master-candidates
We can leave it as is, one extra is not that bad.
Yeah, we can change the candidate master easily if needed. Let's leave it like that for now; we still don't know whether the s4 issue is fixed anyway.
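For future candidate master changes, the rack overlap noted above could be checked with a small script. This is only a rough sketch: the rack labels and every host name other than db1244 are hypothetical placeholders, and the inventory would have to come from a real source (Netbox, for example), which is not shown here.

```python
from collections import defaultdict


def shared_racks(candidates):
    """Return {rack: [(section, host), ...]} for racks holding more than one candidate master."""
    by_rack = defaultdict(list)
    for section, (host, rack) in candidates.items():
        by_rack[rack].append((section, host))
    # Keep only racks where two or more candidate masters are co-located.
    return {rack: members for rack, members in by_rack.items() if len(members) > 1}


# Hypothetical inventory: section -> (candidate master host, rack).
# Only db1244 is a real host name from this task; racks and other hosts are placeholders.
candidates = {
    "s4": ("db1244", "rack-A"),
    "s6": ("s6-candidate", "rack-A"),
    "s1": ("s1-candidate", "rack-B"),
}

for rack, members in shared_racks(candidates).items():
    print(f"{rack}: {members}")
```

With the placeholder data this reports only the rack shared by the s4 and s6 candidates, which matches the situation described in this task.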