To avoid further crashes with the same table, we should alter the user_groups table across all the wikis to convert it to InnoDB.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Marostegui | T145077 mysqld process hang in db1069 - S2 mysql instance
Resolved | | Marostegui | T146121 db1069: convert user_groups table to InnoDB across all the wikis
Event Timeline
root@db1069:/srv# find . -name user_groups.frm | awk -F "." '{print $3}' | awk -F "/" '{print $1}' | uniq -c | sort -k2
1 s1
17 s2
830 s3
1 s4
2 s5
3 s6
12 s7
@jcrespo I assume we want these tables converted to InnoDB across all the shards, right? And also, replicate that alter downstream.
Yes, ideally everything would be on InnoDB, but we can probably only do it on a subset of tables for now. These should be ok in size (they should be small), but check available disk space both on db1069 and labsdb100[13]. Also check that replication filtering is not broken in the process (it should not be for a simple engine change). Just be careful about: the replication filtering; the triggers on sanitarium; the existing views on labs.
Do not worry about lag, the conversion should be fast and user queries usually create worse issues. Just do plain alters and let them replicate.
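As a minimal sketch of the "plain alters" approach, the statements could be generated from a list of database names. The `gen_alters` helper below is hypothetical and not taken from this task; it only prints SQL and does not connect to any server.

```shell
# Hypothetical helper (not from the task): read database names on stdin,
# one per line, and print a plain engine-change ALTER for each of them.
gen_alters() {
    while IFS= read -r db; do
        # Skip blank lines so stray newlines do not produce broken SQL.
        [ -n "$db" ] || continue
        printf 'ALTER TABLE `%s`.`user_groups` ENGINE=InnoDB;\n' "$db"
    done
}
```

For example, `printf 'enwiki\nurwiki\n' | gen_alters` prints one statement per wiki. The output could then be reviewed and fed to the `mysql` client of the relevant instance; how that client is invoked (socket, port, credentials) depends on the host and is left out here.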
So far I have only converted:
S1: enwiki/user_groups
It all looked fine but I do not want to do more tables at the same time at the end of the week, just in case we find something weird and given that we will be on an offsite next week.
The table is 12M, so disk space shouldn't be an issue if they are all around that size.
I think user_groups on s3 failed today:
*************************** 8. row ***************************
Id: 3472432
User: system user
Host:
db: urwiki
Command: Connect
Time: 11074
State: update
Info: INSERT /* User::addGroup */ IGNORE INTO `user_groups` (ug_user,ug_group) VALUES ('XXXXX
Progress: 0.000
I have just converted the S2 user_groups tables to InnoDB.
Note: Percona has not replied yet to the bug report after I sent the stacktraces 9 days ago.
Mentioned in SAL (#wikimedia-operations) [2016-10-03T06:30:21Z] <marostegui> altering S3,S4,S5,S6,S7 user_groups tables in sanitarium to avoid tokudb bug - T146121
S7 tables converted
All the user_groups tables across all the shards are now running InnoDB engine instead of tokudb.
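One way to double-check this state is to query `information_schema.tables` for any user_groups table whose engine is not InnoDB. This is a sketch only: the `engine_check_sql` helper is hypothetical, it just prints the query, and it assumes the query is run separately against each MySQL instance on the host.

```shell
# Hypothetical helper (not from the task): print a query that lists every
# user_groups table whose storage engine is something other than InnoDB.
engine_check_sql() {
    cat <<'SQL'
SELECT table_schema, engine
FROM information_schema.tables
WHERE table_name = 'user_groups'
  AND engine <> 'InnoDB';
SQL
}
```

An empty result from an instance would mean no user_groups table on it is still on another engine.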
It will be difficult to provide further stacktraces for the open Percona/MariaDB bug (https://phabricator.wikimedia.org/T145077#2651434), as the crash is likely to be resolved by converting everything to InnoDB.