Page MenuHomePhabricator

Slow query bringing down s7
Closed, ResolvedPublicSecurity

Description

Setting this as security protected as this could be used as a DDoS.
The following query was run many time on s7 replicas:

root@db1174.eqiad.wmnet[centralauth]> explain  SELECT /* IndexPager::buildQueryInfo (MediaWiki\Extension\CentralAuth\Special\GlobalUsersPager)  */  gu_name,gu_id,MAX(gu_locked) AS `gu_locked`,MAX(lu_attached_method) AS `lu_attached_method`,(SELECT  GROUP_CONCAT(G.gug_group SEPARATOR '|')  FROM `global_user_groups` `G`    WHERE (G.gug_user = gu_id) AND (G.gug_expiry IS NULL OR G.gug_expiry >= '20220331051842')  ) AS `gug_group`,(SELECT  GROUP_CONCAT(IFNULL(G.gug_expiry, "null") SEPARATOR '|')  FROM `global_user_groups` `G`    WHERE (G.gug_user = gu_id) AND (G.gug_expiry IS NULL OR G.gug_expiry >= '20220331051842')  ) AS `gug_expiry`  FROM `globaluser` LEFT JOIN `localuser` ON ((gu_name = lu_name) AND lu_wiki = 'metawiki')   WHERE gu_hidden_level = 0  GROUP BY gu_name, gu_id ORDER BY gu_name DESC LIMIT 51;
+------+--------------------+------------+--------+--------------------+---------+---------+--------------------------------------+----------+----------------------------------------------+
| id   | select_type        | table      | type   | possible_keys      | key     | key_len | ref                                  | rows     | Extra                                        |
+------+--------------------+------------+--------+--------------------+---------+---------+--------------------------------------+----------+----------------------------------------------+
|    1 | PRIMARY            | globaluser | ALL    | NULL               | NULL    | NULL    | NULL                                 | 64813976 | Using where; Using temporary; Using filesort |
|    1 | PRIMARY            | localuser  | eq_ref | PRIMARY,lu_name    | PRIMARY | 514     | const,centralauth.globaluser.gu_name | 1        | Using where                                  |
|    3 | DEPENDENT SUBQUERY | G          | ref    | PRIMARY,gug_expiry | PRIMARY | 4       | func                                 | 1        | Using where                                  |
|    2 | DEPENDENT SUBQUERY | G          | ref    | PRIMARY,gug_expiry | PRIMARY | 4       | func                                 | 1        | Using where                                  |
+------+--------------------+------------+--------+--------------------+---------+---------+--------------------------------------+----------+----------------------------------------------+
4 rows in set (0.001 sec)

There were two problems:

  1. the query is very slow (full scan - takes more than 10 minutes to run)
  2. it had hundreds of threads spawned

Details

Risk Rating
Low
Author Affiliation
WMF Technology Dept

Event Timeline

Forgot to mention that the work around was to kill it everywhere.

I don't have more time to look until later today/tomorrow, but this is a possible cause included in this weeks train: https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/extensions/CentralAuth/+/3601f29707f77c1eaa0f77edfe5b25607baa97c5%5E%21/#F0

And indeed the query plan for queries generated by the old code look much better:

wikiadmin@10.64.48.56(centralauth)> explain  SELECT /* IndexPager::buildQueryInfo (MediaWiki\Extension\CentralAuth\Special\GlobalUsersPager)  */  gu_name,MAX(gu_id) as `gu_id`,MAX(gu_locked) AS `gu_locked`,MAX(lu_attached_method) AS `lu_attached_method`,(SELECT  GROUP_CONCAT(G.gug_group SEPARATOR '|')  FROM `global_user_groups` `G`    WHERE (G.gug_user = gu_id) AND (G.gug_expiry IS NULL OR G.gug_expiry >= '20220331051842')  ) AS `gug_group`,(SELECT  GROUP_CONCAT(IFNULL(G.gug_expiry, "null") SEPARATOR '|')  FROM `global_user_groups` `G`    WHERE (G.gug_user = gu_id) AND (G.gug_expiry IS NULL OR G.gug_expiry >= '20220331051842')  ) AS `gug_expiry`  FROM `globaluser` LEFT JOIN `localuser` ON ((gu_name = lu_name) AND lu_wiki = 'metawiki')   WHERE gu_hidden_level = 0  GROUP BY gu_name ORDER BY gu_name DESC LIMIT 51;
+------+--------------------+------------+--------+--------------------+---------+---------+--------------------------------------+------+-------------+
| id   | select_type        | table      | type   | possible_keys      | key     | key_len | ref                                  | rows | Extra       |
+------+--------------------+------------+--------+--------------------+---------+---------+--------------------------------------+------+-------------+
|    1 | PRIMARY            | globaluser | index  | NULL               | gu_name | 258     | NULL                                 | 51   | Using where |
|    1 | PRIMARY            | localuser  | eq_ref | PRIMARY,lu_name    | PRIMARY | 514     | const,centralauth.globaluser.gu_name | 1    | Using where |
|    3 | DEPENDENT SUBQUERY | G          | ref    | PRIMARY,gug_expiry | PRIMARY | 4       | func                                 | 1    | Using where |
|    2 | DEPENDENT SUBQUERY | G          | ref    | PRIMARY,gug_expiry | PRIMARY | 4       | func                                 | 1    | Using where |
+------+--------------------+------------+--------+--------------------+---------+---------+--------------------------------------+------+-------------+
4 rows in set (0.001 sec)

Unless someone has a quick fix we can revert https://gerrit.wikimedia.org/r/c/mediawiki/core/+/773915 and https://gerrit.wikimedia.org/r/c/mediawiki/extensions/CentralAuth/+/773913

Let's please revert those if we cannot find a solution before the weekend.

Zabe closed this task as Resolved.EditedMar 31 2022, 1:29 PM
Zabe assigned this task to Lucas_Werkmeister_WMDE.

Patches got reverted. This can be made public now.

sbassett changed Author Affiliation from N/A to WMF Technology Dept.Mar 31 2022, 6:02 PM
sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)".
sbassett changed the edit policy from "Custom Policy" to "All Users".
sbassett changed Risk Rating from N/A to Low.
sbassett moved this task from Incoming to Our Part Is Done on the Security-Team board.