
db1046 innodb signal 6 abort and restart
Closed, Resolved, Public

Description

db1046 (m4 master, behind dbproxy1004) mysqld aborted:

2015-07-03 23:16:27 7f5a0fbff700  InnoDB: Assertion failure in thread 140024788023040 in file buf0flu.cc line 939
InnoDB: Failing assertion: page_zip_verify_checksum(frame, zip_size)
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
150703 23:16:27 [ERROR] mysqld got signal 6 ;
  • DBA: Figure it out. Compression related? But InnoDB compressed tables should have been replaced with TokuDB...
  • Analytics: Check out the eventlogging consumer with analytics. The haproxy on dbproxy1004 redirected traffic to db1047, but did the consumer like that? Do we (a DBA) need to sync events back to the master, or just do a backfill?

Event Timeline

Springle created this task. Jul 4 2015, 1:11 AM
Springle claimed this task.
Springle set the priority of this task to Needs Triage.
Springle updated the task description.
Springle added subscribers: Springle, jcrespo, mforns, Milimetric.
Restricted Application added a subscriber: Aklapper. Jul 4 2015, 1:11 AM
jcrespo moved this task from Triage to In progress on the DBA board. Jul 6 2015, 6:26 AM
jcrespo moved this task from In progress to Backlog on the DBA board. Sep 10 2015, 11:05 AM
jcrespo claimed this task. Oct 8 2015, 5:04 PM

Even if some lost events are not a huge problem for this schema, we should make sure that db1047 and db1046 contain the same data before doing a schema change on all tables to avoid having two inconsistent servers.

This must happen before T108850 and T108856.

The idea is to set up replication db1046 -> db1047, then backfill db1046 with the new records that are only available on db1047.
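A minimal sketch of the comparison behind the backfill step, not the procedure actually used here. It assumes each event row carries a unique `uuid` column (an assumption, not confirmed in this task) and that the rows of one table have already been fetched from both hosts. The function name `rows_to_backfill` and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List

Row = Dict[str, object]


def rows_to_backfill(
    master_rows: Iterable[Row],
    replica_rows: Iterable[Row],
    key: str = "uuid",  # assumed unique per event; adjust to the real key
) -> List[Row]:
    """Return rows present in replica_rows but absent from master_rows."""
    master_keys = {row[key] for row in master_rows}
    seen = set()
    missing: List[Row] = []
    for row in replica_rows:
        k = row[key]
        # Skip rows the master already has, and duplicates within the replica set.
        if k not in master_keys and k not in seen:
            seen.add(k)
            missing.append(row)
    return missing


# Hypothetical sample data standing in for one table on each host.
db1046_rows = [{"uuid": "a1", "timestamp": "20150703231500"}]
db1047_rows = [
    {"uuid": "a1", "timestamp": "20150703231500"},
    {"uuid": "b2", "timestamp": "20150703231700"},
]

backfill = rows_to_backfill(db1046_rows, db1047_rows)
print([r["uuid"] for r in backfill])  # ['b2']
```

For large tables a set of every key will not fit comfortably in memory, so in practice the comparison would more likely be done in chunks by timestamp range or with a checksum tool; the sketch only illustrates which rows count as "new records only available on db1047".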

Sorry, Jaime, I missed this problem when it happened. The project we all monitor is Analytics-Backlog, but we've been meaning to clean that up; there are too many confusing Analytics-* projects.

The idea is to set up replication db1046 -> db1047, then backfill db1046 with the new records that are only available on db1047.

This plan sounds good to me. Even if we lost data back in July, it would be too late for anyone to take any action based on that, though, so if it's too hard I wouldn't worry about it.

jcrespo triaged this task as Low priority. Nov 6 2015, 4:55 PM
jcrespo closed this task as Resolved. Jan 25 2016, 7:38 PM

Most, if not all, of the issues have been fixed with T120187 or are now irrelevant.