db2124 depooled with index corruption
Closed, Resolved (Public)

Description

Hi DBA,

22:07:45 <+icinga-wm> PROBLEM - MariaDB Replica SQL: s6 #page on db2124 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table page_props is corrupt: try to repair it on query. Default database: frwiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
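For anyone triaging a similar page, the fields that matter in the alert above (section, host, error number, table, database) can be pulled out mechanically. This is a minimal sketch only: the regular expression and the `parse_alert` helper are hypothetical, written against this single alert line, and are not part of any Icinga or Wikimedia tooling.

```python
import re

# Alert text copied from the task description (snipped query and URL omitted).
ALERT = (
    "PROBLEM - MariaDB Replica SQL: s6 #page on db2124 is CRITICAL: "
    "CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, "
    "Errmsg: Error Index for table page_props is corrupt: try to repair it "
    "on query. Default database: frwiki."
)

# Hypothetical pattern: assumes the field order seen in this one alert.
ALERT_RE = re.compile(
    r"MariaDB Replica SQL: (?P<section>\S+) .*? on (?P<host>\S+) is (?P<state>\w+):"
    r".*?Slave_SQL_Running: (?P<sql_running>\w+), Errno: (?P<errno>\d+),"
    r".*?Index for table (?P<table>\S+) is corrupt"
    r".*?Default database: (?P<database>\w+)"
)


def parse_alert(line):
    """Return the alert fields as a dict, or None if the line does not match."""
    match = ALERT_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["errno"] = int(fields["errno"])  # numeric error code
    return fields


if __name__ == "__main__":
    print(parse_alert(ALERT))
```

A line in a different shape simply yields `None`, so the helper is safe to run over unrelated alerts.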

I depooled it at 22:10, and I'm about to downtime it for five days (through this time Tuesday), just in case SRE Summit travel means it takes longer to look at.

Event Timeline

Icinga downtime and Alertmanager silence (ID=17885a36-8547-4a13-afea-8d73c87e272d) set by rzl@cumin2002 for 5 days, 0:00:00 on 1 host(s) and their services with reason: index corruption

db2124.codfw.wmnet

I don't see anything obviously hardware-broken in logs. I notice it was just repooled yesterday after maintenance for T352010, but nothing jumps out as an obvious cause. Over to the DBAs from here, enjoy. :)

Marostegui subscribed.

I could probably fix it right away, but I think I am going to quickly reclone it instead.

Change 1009638 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2124: Disable notifications

https://gerrit.wikimedia.org/r/1009638

Change 1009638 merged by Marostegui:

[operations/puppet@production] db2124: Disable notifications

https://gerrit.wikimedia.org/r/1009638

Host recloned and being slowly repooled.
Thanks @RLazarus for addressing this incident!