Page MenuHomePhabricator

Use rc_timestamp index when joining to ores_classification
Closed, DeclinedPublic

Description

We worked out performance issue when hidenondamaging=1 before T137895: hidenondamaging is slow but it still a little slow (2 secs worst case possible).
https://gerrit.wikimedia.org/r/#/c/295528/ makes it much better.

See related task T146111: hidenondamaging=1 query is extremely slow on enwiki

Event Timeline

Old query:

EXPLAIN SELECT /* SpecialRecentChanges::doMainQuery Ladsgroup */ 
rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,
rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,
rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest, 
(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,
oresc_probability,0.8 AS ores_threshold 
FROM `recentchanges` 
LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) 
LEFT JOIN `page` ON ((rc_cur_id=page_id)) 
INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1))
LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1))
WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160608000000') AND (rc_source != 'wb') AND 
((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8') 
AND rc_patrolled = '0' AND rc_new IN ('0','1') 
ORDER BY rc_timestamp DESC 
LIMIT 50\G

New query:

EXPLAIN SELECT /* SpecialRecentChanges::doMainQuery Ladsgroup */ 
rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,
rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,
rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest, 
(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,
oresc_probability,0.8 AS ores_threshold 
FROM `recentchanges` 
LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) 
LEFT JOIN `page` ON ((rc_cur_id=page_id)) 
INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1))
LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1))
INNER JOIN `revision` ON ((rc_timestamp = rev_timestamp  AND oresc_rev = rev_id))
WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160608000000') AND (rc_source != 'wb') AND 
((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8') 
AND rc_patrolled = '0' AND rc_new IN ('0','1') 
ORDER BY rc_timestamp DESC 
LIMIT 50\G

After checking the performance of both options, I can conclude that the second option, using the revision table is a bad idea, as it introduces a potential partial full scan of row, taking minutes to execute:

MariaDB  db1045 wikidatawiki > pager cat > /dev/null; FLUSH STATUS; SELECT   rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id, rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid, rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest,  (SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`, oresc_probability,0.8 AS ores_threshold  FROM `recentchanges`  LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace))  LEFT JOIN `page` ON ((rc_cur_id=page_id))  INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1)) LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1)) WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160624000000') AND (rc_source != 'wb') AND  ((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8')  AND rc_patrolled = '0' AND rc_new IN ('0','1')  ORDER BY rc_timestamp DESC  LIMIT 50; nopager; SHOW STATUS like 'Hand%';
PAGER set to 'cat > /dev/null'
Query OK, 0 rows affected (0.01 sec)

50 rows in set (0.16 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1169  |
| Handler_read_last          | 0     |
| Handler_read_next          | 562   |
| Handler_read_prev          | 14910 |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)

MariaDB  db1045 wikidatawiki > pager cat > /dev/null; FLUSH STATUS; 
PAGER set to 'cat > /dev/null'
Query OK, 0 rows affected (0.00 sec)

MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > SELECT /* SpecialRecentChanges::doMainQuery Ladsgroup */ 
    -> rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,
    -> rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,
    -> rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest, 
    -> (SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,
    -> oresc_probability,0.8 AS ores_threshold 
    -> FROM `recentchanges` 
    -> LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) 
    -> LEFT JOIN `page` ON ((rc_cur_id=page_id)) 
    -> INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1))
    -> LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1))
    -> INNER JOIN `revision` ON ((rc_timestamp = rev_timestamp  AND oresc_rev = rev_id))
    -> WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160624000000') AND (rc_source != 'wb') AND 
    -> ((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8') 
    -> AND rc_patrolled = '0' AND rc_new IN ('0','1') 
    -> ORDER BY rc_timestamp DESC 
    -> LIMIT 50;
50 rows in set (0.10 sec)

MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > SHOW STATUS like 'Hand%';
25 rows in set (0.00 sec)

MariaDB  db1045 wikidatawiki > pager cat > /dev/null; FLUSH STATUS; 
PAGER set to 'cat > /dev/null'
Query OK, 0 rows affected (0.00 sec)

MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > SELECT /* SpecialRecentChanges::doMainQuery Ladsgroup */ 
    -> rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,
    -> rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,
    -> rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest, 
    -> (SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,
    -> oresc_probability,0.8 AS ores_threshold 
    -> FROM `recentchanges` 
    -> LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) 
    -> LEFT JOIN `page` ON ((rc_cur_id=page_id)) 
    -> INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1))
    -> LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1))
    -> INNER JOIN `revision` ON ((rc_timestamp = rev_timestamp  AND oresc_rev = rev_id))
    -> WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160624000000') AND (rc_source != 'wb') AND 
    -> ((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8') 
    -> AND rc_patrolled = '0' AND rc_new IN ('0','1') 
    -> ORDER BY rc_timestamp DESC 
    -> LIMIT 50;
50 rows in set (0.10 sec)

MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > nopager;
PAGER set to stdout
MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > SHOW STATUS like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1232  |
| Handler_read_last          | 0     |
| Handler_read_next          | 567   |
| Handler_read_prev          | 15042 |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.01 sec)

MariaDB  db1045 wikidatawiki > pager cat > /dev/null; FLUSH STATUS; SELECT   rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id, rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid, rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest,  (SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`, oresc_probability,0.8 AS ores_threshold  FROM `recentchanges`  LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace))  LEFT JOIN `page` ON ((rc_cur_id=page_id))  INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1)) LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1)) WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160624000000') AND (rc_source != 'wb') AND  ((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8')  AND rc_patrolled = '0' AND rc_new IN ('0','1')  ORDER BY rc_timestamp DESC  LIMIT 50; nopager; SHOW STATUS like 'Hand%';
PAGER set to 'cat > /dev/null'
Query OK, 0 rows affected (0.00 sec)

50 rows in set (0.10 sec)

PAGER set to stdout
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1184  |
| Handler_read_last          | 0     |
| Handler_read_next          | 569   |
| Handler_read_prev          | 15101 |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)

MariaDB  db1045 wikidatawiki > pager cat > /dev/null; FLUSH STATUS; 
PAGER set to 'cat > /dev/null'
Query OK, 0 rows affected (0.00 sec)

MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > SELECT /* SpecialRecentChanges::doMainQuery Ladsgroup */ 
    -> rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,
    -> rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,
    -> rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest, 
    -> (SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,
    -> oresc_probability,0.8 AS ores_threshold 
    -> FROM `recentchanges` 
    -> LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) 
    -> LEFT JOIN `page` ON ((rc_cur_id=page_id)) 
    -> INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1))
    -> LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1))
    -> INNER JOIN `revision` ON ((rc_timestamp = rev_timestamp  AND oresc_rev = rev_id))
    -> WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160601000000') AND (rc_source != 'wb') AND 
    -> ((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8') 
    -> AND rc_patrolled = '0' AND rc_new IN ('0','1') 
    -> ORDER BY rc_timestamp DESC 
    -> LIMIT 50;
50 rows in set (33.74 sec)

MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > nopager;
PAGER set to stdout
MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > SHOW STATUS like 'Hand%';
+----------------------------+---------+
| Variable_name              | Value   |
+----------------------------+---------+
| Handler_commit             | 1       |
| Handler_delete             | 0       |
| Handler_discover           | 0       |
| Handler_external_lock      | 0       |
| Handler_icp_attempts       | 33831   |
| Handler_icp_match          | 33831   |
| Handler_mrr_init           | 0       |
| Handler_mrr_key_refills    | 0       |
| Handler_mrr_rowid_refills  | 0       |
| Handler_prepare            | 0       |
| Handler_read_first         | 0       |
| Handler_read_key           | 19934   |
| Handler_read_last          | 0       |
| Handler_read_next          | 35100   |
| Handler_read_prev          | 0       |
| Handler_read_rnd           | 50      |
| Handler_read_rnd_deleted   | 0       |
| Handler_read_rnd_next      | 1235842 |
| Handler_rollback           | 0       |
| Handler_savepoint          | 0       |
| Handler_savepoint_rollback | 0       |
| Handler_tmp_update         | 0       |
| Handler_tmp_write          | 3541    |
| Handler_update             | 0       |
| Handler_write              | 0       |
+----------------------------+---------+
25 rows in set (0.00 sec)



MariaDB  db1045 wikidatawiki > pager cat > /dev/null; FLUSH STATUS; 
PAGER set to 'cat > /dev/null'
Query OK, 0 rows affected (0.00 sec)

MariaDB  db1045 wikidatawiki > 
MariaDB  db1045 wikidatawiki > SELECT /* SpecialRecentChanges::doMainQuery Ladsgroup */ 
    -> rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,
    -> rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,
    -> rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest, 
    -> (SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,
    -> oresc_probability,0.8 AS ores_threshold 
    -> FROM `recentchanges` 
    -> LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace)) 
    -> LEFT JOIN `page` ON ((rc_cur_id=page_id)) 
    -> INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1))
    -> LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1))
    -> WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160601000000') AND (rc_source != 'wb') AND 
    -> ((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8') 
    -> AND rc_patrolled = '0' AND rc_new IN ('0','1') 
    -> ORDER BY rc_timestamp DESC 
    -> LIMIT 50\G
50 rows in set (0.10 sec)

MariaDB  db1045 wikidatawiki > SHOW STATUS like 'Hand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Handler_commit             | 1     |
| Handler_delete             | 0     |
| Handler_discover           | 0     |
| Handler_external_lock      | 0     |
| Handler_icp_attempts       | 0     |
| Handler_icp_match          | 0     |
| Handler_mrr_init           | 0     |
| Handler_mrr_key_refills    | 0     |
| Handler_mrr_rowid_refills  | 0     |
| Handler_prepare            | 0     |
| Handler_read_first         | 0     |
| Handler_read_key           | 1196  |
| Handler_read_last          | 0     |
| Handler_read_next          | 575   |
| Handler_read_prev          | 15270 |
| Handler_read_rnd           | 0     |
| Handler_read_rnd_deleted   | 0     |
| Handler_read_rnd_next      | 0     |
| Handler_rollback           | 0     |
| Handler_savepoint          | 0     |
| Handler_savepoint_rollback | 0     |
| Handler_tmp_update         | 0     |
| Handler_tmp_write          | 0     |
| Handler_update             | 0     |
| Handler_write              | 0     |
+----------------------------+-------+
25 rows in set (0.00 sec)

In general, please avoid touching the revision table except for individual (point SELECTs) queries.

@jcrespo it seems to me that The Roight Thing (tm) would be to add an index to rc_this_oldid. Looking up RC entries by their revision ID is something we need in several places. How hard would that be? Is there already a task for it?

rc_this_oldid which table?

recentchanges

The table needs some clean up, this is enwiki master:

PRIMARY KEY (`rc_id`),
KEY `rc_timestamp` (`rc_timestamp`),
KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
KEY `rc_cur_id` (`rc_cur_id`),
KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
KEY `rc_ip` (`rc_ip`),
KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`),
KEY `tmp_1` (`rc_this_oldid`)

Assuming this is fixed/mitigated: T132416 and the change (being core) has consensus of all developers (please seek it), the actual implementation, thanks to MariaDB 10 is easy-ish - around a week of work with little to no impact once everything is prepared.

Halfak triaged this task as Medium priority.Jul 11 2016, 5:08 PM
Halfak renamed this task from Make hidenondamaging=1 faster to Use rc_timestamp index when joining to ores_classification.Sep 22 2016, 2:42 PM
Halfak updated the task description. (Show Details)
Halfak updated the task description. (Show Details)

@Ladsgroup It looks like this task should be renamed or invalidated, now that the solution is to use the rc_this_oldid index?

Seeing it now, basically getting T139012 done makes this obsolete.