
db1169 replication broken - enwiki.pagelinks corruption
Closed, Resolved, Public

Description

20:56 <+icinga-wm> PROBLEM - MariaDB Replica SQL: s1 on db1169 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table pagelinks is corrupt: try to repair it on query. Default database: enwiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
21:04 PROBLEM - MariaDB Replica Lag: s1 on db1169 is CRITICAL: CRITICAL slave_sql_lag Replication lag: 656.04 seconds https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica

I depooled it since it was causing issues with bots due to maxlag going up: P70590
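For context on why one lagging replica affects bots: clients that send the MediaWiki API `maxlag` parameter are refused while the reported replication lag is above the value they passed. The sketch below only illustrates that comparison and why depooling helps. It assumes the reported lag is the worst lag among pooled replicas. The function name, the 5-second threshold, and the second host name are hypothetical, not taken from this task.

```python
def request_rejected(pooled_lags, maxlag=5.0):
    """Return True if a client sending this maxlag would be refused.

    pooled_lags maps host name -> replication lag in seconds for the
    replicas that are currently pooled (assumed model, see lead-in).
    """
    if not pooled_lags:
        return False  # nothing pooled, so no lag to report in this model
    worst_lag = max(pooled_lags.values())
    return worst_lag > maxlag


# Lag value for db1169 is from the alert above; "db-other" is a made-up host.
before_depool = {"db1169": 656.04, "db-other": 0.3}
after_depool = {"db-other": 0.3}

print(request_rejected(before_depool))
print(request_rejected(after_depool))
```

With db1169 pooled, its 656-second lag dominates the maximum. Once it is depooled, only the healthy replica counts.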

Event Timeline

Mentioned in SAL (#wikimedia-operations) [2024-10-28T06:06:55Z] <taavi@cumin1002> START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: replication broken T378320

Mentioned in SAL (#wikimedia-operations) [2024-10-28T06:07:09Z] <taavi@cumin1002> END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: replication broken T378320

ABran-WMF triaged this task as High priority.
ABran-WMF moved this task from Triage to In progress on the DBA board.
ABran-WMF renamed this task from db1169 replication broken to db1169 replication broken - enwiki.pagelinks corruption. Oct 28 2024, 9:54 AM

index rebuilt: Query OK, 0 rows affected (2 hours 44 min 43.595 sec)
replication catching back up

Start pool of db1169 gradually with 4 steps - index rebuilt - arnaudb@cumin1002

Start pool of db1169 gradually with 4 steps - index rebuilt - arnaudb@cumin1002

Start pool of db1169 quickly with 2 steps - index rebuilt - arnaudb@cumin1002

Mentioned in SAL (#wikimedia-operations) [2024-10-29T08:09:52Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: T378320', diff saved to https://phabricator.wikimedia.org/P70599 and previous config saved to /var/cache/conftool/dbconfig/20241029-080951-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-10-29T08:24:57Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: T378320', diff saved to https://phabricator.wikimedia.org/P70600 and previous config saved to /var/cache/conftool/dbconfig/20241029-082456-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-10-29T08:40:06Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: T378320', diff saved to https://phabricator.wikimedia.org/P70603 and previous config saved to /var/cache/conftool/dbconfig/20241029-084002-arnaudb.json

Mentioned in SAL (#wikimedia-operations) [2024-10-29T08:55:08Z] <arnaudb@cumin1002> dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: T378320', diff saved to https://phabricator.wikimedia.org/P70604 and previous config saved to /var/cache/conftool/dbconfig/20241029-085507-arnaudb.json
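The four dbctl commits above raise the pooled percentage in even steps (25, 50, 75, 100) roughly 15 minutes apart. The sketch below reproduces that schedule and measures the gaps between the logged commits. It assumes evenly sized steps and does not call dbctl or any Wikimedia tooling. Both function names are hypothetical.

```python
from datetime import datetime


def repool_percentages(steps):
    """Evenly spaced pooling percentages ending at 100 (assumed schedule)."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    return [round(100 * i / steps) for i in range(1, steps + 1)]


def gaps_in_seconds(timestamps):
    """Seconds elapsed between consecutive UTC timestamps from the SAL."""
    parsed = [datetime.strptime(t, "%Y-%m-%dT%H:%M:%SZ") for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


# Timestamps copied from the four dbctl commit entries above.
commits = [
    "2024-10-29T08:09:52Z",
    "2024-10-29T08:24:57Z",
    "2024-10-29T08:40:06Z",
    "2024-10-29T08:55:08Z",
]

print(repool_percentages(4))     # the "gradually with 4 steps" case
print(repool_percentages(2))     # the "quickly with 2 steps" case
print(gaps_in_seconds(commits))  # spacing between the logged commits
```

Stepping the percentage up gives a replica that has just rebuilt an index time to warm up before it takes full traffic.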