
db2124 depooled with index corruption
Closed, Resolved, Public

Description

Hi DBA,

22:07:45 <+icinga-wm> PROBLEM - MariaDB Replica SQL: s6 #page on db2124 is CRITICAL: CRITICAL slave_sql_state Slave_SQL_Running: No, Errno: 1034, Errmsg: Error Index for table page_props is corrupt: try to repair it on query. Default database: frwiki. [Query snipped] https://wikitech.wikimedia.org/wiki/MariaDB/troubleshooting%23Depooling_a_replica
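For readers unfamiliar with this alert: the fields it prints (`Slave_SQL_Running`, the SQL error number and message) appear to come from MariaDB's `SHOW SLAVE STATUS` output. Below is a minimal, hypothetical Python sketch of how a check could turn such a status row into OK/CRITICAL. It assumes the row has already been fetched into a dict keyed by the column names `Slave_SQL_Running`, `Last_SQL_Errno` and `Last_SQL_Error`. The function name `sql_thread_state` is made up, and this is not the actual Wikimedia Icinga plugin.

```python
from typing import Dict, Tuple

# Hypothetical sketch, not Wikimedia's real check: classify the replica SQL
# thread from a SHOW SLAVE STATUS row that was already fetched as a dict.
def sql_thread_state(status: Dict[str, object]) -> Tuple[str, str]:
    """Return (severity, message) describing the replica SQL thread."""
    running = status.get("Slave_SQL_Running", "No")
    if running == "Yes":
        # SQL thread is applying events normally.
        return "OK", "Slave_SQL_Running: Yes"
    # SQL thread stopped: report the last error it recorded.
    errno = status.get("Last_SQL_Errno", 0)
    errmsg = status.get("Last_SQL_Error", "")
    return (
        "CRITICAL",
        f"Slave_SQL_Running: {running}, Errno: {errno}, Errmsg: {errmsg}",
    )


if __name__ == "__main__":
    # Example row shaped like the db2124 failure (values illustrative).
    row = {
        "Slave_SQL_Running": "No",
        "Last_SQL_Errno": 1034,
        "Last_SQL_Error": "Index for table 'page_props' is corrupt",
    }
    print(sql_thread_state(row))
```

A stopped SQL thread is treated as CRITICAL regardless of the error number, since the replica stops applying changes from the primary either way.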

I've depooled it at 22:10, and I'm about to downtime it for five days (through this time Tuesday), just in case SRE Summit travel means it takes longer to look at.


Event Timeline

Icinga downtime and Alertmanager silence (ID=17885a36-8547-4a13-afea-8d73c87e272d) set by rzl@cumin2002 for 5 days, 0:00:00 on 1 host(s) and their services with reason: index corruption

db2124.codfw.wmnet

I don't see anything obviously hardware-broken in logs. I notice it was just repooled yesterday after maintenance for T352010, but nothing jumps out as an obvious cause. Over to the DBAs from here, enjoy. :)

Marostegui subscribed.

I could probably fix it right away, but I think I am going to quickly reclone it instead.

Change 1009638 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2124: Disable notifications

https://gerrit.wikimedia.org/r/1009638

Change 1009638 merged by Marostegui:

[operations/puppet@production] db2124: Disable notifications

https://gerrit.wikimedia.org/r/1009638

Host recloned and being slowly repooled.
Thanks @RLazarus for addressing this incident!