This follows on from T195520, and from conversations with people and some investigation.
In #wikimedia-operations 28/05/2018
9:51 AM <addshore> marostegui: another thought was, is there a way to force certain queries to have a shorter timeout that whatever is configured on the db hosts?
9:52 AM <marostegui> addshore: we have a general query killer that kills queries that take more than 60 seconds in production, the problem is that it gets overloaded and cannot kill as fast as possible
9:52 AM addshore: It should be done from the code side
9:52 AM <addshore> ack
Both MySQL and MariaDB currently allow a per-query timeout to be passed to the server along with the query: since 10.1.1 for MariaDB (SET STATEMENT max_statement_time, in seconds) and since 5.7.8 for MySQL (the MAX_EXECUTION_TIME optimizer hint, in milliseconds, SELECT statements only).
In WMF production we appear to be running MariaDB 10.1.32 (according to db1109), which would allow us to use this feature.
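For comparison, the two servers use different syntax for this. A sketch (MariaDB takes seconds and works for any statement type; MySQL takes milliseconds via an optimizer hint and applies to SELECT only):

```sql
-- MariaDB (>= 10.1.1): per-statement limit in seconds.
SET STATEMENT max_statement_time=1 FOR
SELECT COUNT(*) FROM wbc_entity_usage;

-- MySQL (>= 5.7.8): per-query limit in milliseconds, SELECT only.
SELECT /*+ MAX_EXECUTION_TIME(1000) */ COUNT(*) FROM wbc_entity_usage;
```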
Example use on the command line:
mysql:wikiadmin@db1109 [wikidatawiki]> SET STATEMENT max_statement_time=1 FOR select count(*) from wbc_entity_usage;
ERROR 1969 (70100): Query execution was interrupted (max_statement_time exceeded)
mysql:wikiadmin@db1109 [wikidatawiki]> Ctrl-C -- exit!
Adding this to the MediaWiki DB abstraction would allow MediaWiki core and extensions to define optional per-query timeouts.
Relating to T195520, this could be used for selects against the wb_terms table, and would probably stop any sort of change in the query execution plan from killing the whole shard and causing various issues on other sites.
This would probably be implemented as a new option, for example in $options for the IDatabase::select method.
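To make the idea concrete, the SQL that such an option might cause the abstraction layer to emit on MariaDB could look like the following. This is only a sketch: the 1-second value and the wb_terms column names are illustrative, not a proposed implementation.

```sql
-- Hypothetical output of a wb_terms select issued through the
-- abstraction layer with a 1-second per-query timeout option set:
SET STATEMENT max_statement_time=1 FOR
SELECT term_text
FROM wb_terms
WHERE term_entity_id = 42
  AND term_language = 'en';
```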
Is this something that DBAs would approve of?
Is limiting queries in this way 'allowed'?
I also saw somewhere that either MySQL or MariaDB has a similar option for the maximum number of scanned rows. This could possibly also be added to the abstraction, but would likely not be as valuable as a general timeout.