
labsdb1009 crashed while doing an alter table on templatelinks
Closed, Resolved (Public)

Description

Creating this for the record.
labsdb1009 crashed while attempting to do the massive alters for: T166204

InnoDB: ###### Diagnostic info printed to the standard error stream
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2017-07-13 22:21:07 7eefc3bfd700  InnoDB: Assertion failure in thread 139568246413056 in file srv0srv.cc line 2418
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
170713 22:21:07 [ERROR] mysqld got signal 6 ;
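
For anyone who wants to check other hosts' error logs for the same signature, here is a minimal sketch in Python. The function name `summarize_crash` and the `SAMPLE` constant are hypothetical, and the patterns only assume the message wording shown in the excerpt above; other server versions may word these lines differently.

```python
import re

# Watchdog message InnoDB prints before it intentionally crashes the server.
SEMAPHORE_RE = re.compile(
    r"InnoDB: Error: semaphore wait has lasted > (\d+) seconds"
)
# Assertion line; captures thread id, source file and line number.
ASSERT_RE = re.compile(
    r"InnoDB: Assertion failure in thread (\d+) in file (\S+) line (\d+)"
)


def summarize_crash(log_text):
    """Return a dict describing a semaphore-wait crash, or None if absent."""
    sem = SEMAPHORE_RE.search(log_text)
    if sem is None:
        return None
    summary = {"wait_seconds": int(sem.group(1)), "file": None, "line": None}
    asrt = ASSERT_RE.search(log_text)
    if asrt is not None:
        summary["file"] = asrt.group(2)
        summary["line"] = int(asrt.group(3))
    return summary


# Lines copied from the log excerpt in this task.
SAMPLE = """\
InnoDB: Error: semaphore wait has lasted > 600 seconds
InnoDB: We intentionally crash the server, because it appears to be hung.
2017-07-13 22:21:07 7eefc3bfd700  InnoDB: Assertion failure in thread 139568246413056 in file srv0srv.cc line 2418
"""

if __name__ == "__main__":
    print(summarize_crash(SAMPLE))
```

The sketch only reports the first match in the text it is given; a log containing several crashes would need to be split per restart first.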

It crashed while altering templatelinks (61G).
labsdb1011 finished the alters with no issues a couple of days ago, so I have given labsdb1009 another go to see how it goes this time.
Replication was not broken after the restart.

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.

I will leave this open until the alters are done.

Should we copy 1009 from 1010?

I would say let's wait and see whether the alters finish cleanly this time.
The server recovered fine after the crash, and replication had no issues.

templatelinks went through fine; pagelinks (121G) is now being altered.

Mentioned in SAL (#wikimedia-operations) [2017-07-17T05:09:29Z] <marostegui> Restart MySQL on labsdb1009 for maintenance - T170657

Marostegui claimed this task.

The alters finished correctly. I have also stopped and restarted the server without any issues, so I am going to consider this resolved as a one-off case.