Starting from ~13h UTC today (26 September), phabricator search started not responding, and later the website responded with error messages of phabricator cannot connect to mysql.
Upon investigation, we found that there was table contention:
https://grafana-admin.wikimedia.org/dashboard/db/mysql?panelId=19&fullscreen&from=1474876675832&to=1474899475832&var-dc=eqiad%20prometheus%2Fops&var-server=db1048
This was caused by lots of queries waiting for full table lock on search_documentfield, such as:
Waiting for table level lock Info: SELECT document.phid FROM `search_document` document JOIN `search_documentfield` field ON field.phid = document.phid JOIN `search_documentr WHERE MATCH(corpus) AGAINST ('mediawiki' IN BOOLEAN MODE) GROUP BY document.phid ORDER BY IF(documentType = 'USER', 0, 1) ASC, MAX(MATCH(corpus) AGAINST ('mediawiki')) DESC LIMIT 188, 1
The cause for the full table lock is because this table was in Aria (~MyISAM) format, in which writes lock the full table, and block reads. This caused search to not work, but after a while, brought phabricator down because pileup of these queries filled up the available connections, affecting all connections:
The cause is believed to be a large amount of DELETEs done on this table, which will cause issues because you cannot have a large amount of writes on a MyISAM tables (causes contention) because full table locks. This are the queries that we believe caused this issue, according to tendril:
Measures taken:
- MySQL was restarted- connections took too much time to get killed and we failoverd to a read only replica while maintenace was ongoing
- The table was converted to InnoDB, including the fulltext index- We are unaware of the consequeces of this for phabricator. We assume that the table was in Aria for a reason; We do not know if that will work; even if it will, it may need tuning:
CREATE TABLE `search_documentfield` ( `phid` varbinary(64) NOT NULL, `phidType` varchar(4) COLLATE utf8mb4_bin NOT NULL, `field` varchar(4) COLLATE utf8mb4_bin NOT NULL, `auxPHID` varbinary(64) DEFAULT NULL, `corpus` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci, KEY `phid` (`phid`), FULLTEXT KEY `corpus` (`corpus`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;
- As the table was truncated to be able to be converted; (a backup was taken beforhand), a reindexing process was initiated by chase; that may temporarily altered the workflow?
Followup with me to see how to permanently fix this, but that table had legitimate contention issues and probably search should be either on an external service or using InnoDB, not MyISAM.