It seems hidenondamaging on wikidatawiki is slow, probably because there is no index on oresc_probability
Description
This is the query that activating ORES runs; it takes between 20 seconds and 1 minute to execute, involves a DEPENDENT SUBQUERY, and reads 5 million rows:
MariaDB db1026 wikidatawiki > EXPLAIN SELECT /* SpecialRecentChanges::doMainQuery Ladsgroup */
rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,
rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,
rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest,
(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,
oresc_probability,0.8 AS ores_threshold
FROM `recentchanges`
LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace))
LEFT JOIN `page` ON ((rc_cur_id=page_id))
LEFT JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1))
LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1))
WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160608000000') AND (rc_source != 'wb') AND
((oresm_name = 'damaging' OR oresm_name IS NULL)) AND oresc_is_predicted = '1' AND (oresc_probability > '0.8')
AND rc_patrolled = '0' AND rc_new IN ('0','1')
ORDER BY rc_timestamp DESC
LIMIT 50\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: recentchanges
type: ref
possible_keys: rc_timestamp,new_name_timestamp,tmp_2
key: tmp_2
key_len: 1
ref: const
rows: 5700598
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: watchlist
type: eq_ref
possible_keys: wl_user,namespace_title,wl_user_notificationtimestamp
key: wl_user
key_len: 265
ref: const,wikidatawiki.recentchanges.rc_namespace,wikidatawiki.recentchanges.rc_title
rows: 1
Extra:
*************************** 3. row ***************************
id: 1
select_type: PRIMARY
table: ores_classification
type: ref
possible_keys: oresc_rev,oresc_winner
key: oresc_rev
key_len: 4
ref: wikidatawiki.recentchanges.rc_this_oldid
rows: 1
Extra: Using where
*************************** 4. row ***************************
id: 1
select_type: PRIMARY
table: ores_model
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 2
ref: wikidatawiki.ores_classification.oresc_model
rows: 1
Extra: Using where
*************************** 5. row ***************************
id: 1
select_type: PRIMARY
table: page
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: wikidatawiki.recentchanges.rc_cur_id
rows: 1
Extra:
*************************** 6. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: change_tag
type: ref
possible_keys: change_tag_rc_tag
key: change_tag_rc_tag
key_len: 5
ref: wikidatawiki.recentchanges.rc_id
rows: 1
Extra: Using index
Creating a better query is a blocker to deploying this functionality. Your aim should be a few dozen milliseconds, not seconds. I would recommend checking the code of the other recentchanges filters.
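As the task title notes, part of the cost is that oresc_probability has no index, so the range filter cannot be satisfied by an index lookup. A quick way to see the effect of such an index locally, using SQLite as a stand-in for MariaDB (the index name is made up; only the relevant columns are modeled):

```python
import sqlite3

# Stand-in schema (SQLite instead of MariaDB; only the relevant columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ores_classification (oresc_rev INTEGER, oresc_probability REAL)"
)

query = "SELECT oresc_rev FROM ores_classification WHERE oresc_probability > 0.8"

# Without an index, the range predicate forces a full table scan.
plan_noindex = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_noindex[0][-1])  # e.g. "SCAN ores_classification"

# With an index (hypothetical name), the same filter becomes a range search
# on the index instead of a scan over every row.
conn.execute(
    "CREATE INDEX oresc_probability_idx ON ores_classification (oresc_probability)"
)
plan_indexed = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan_indexed[0][-1])  # e.g. "SEARCH ... USING INDEX oresc_probability_idx ..."
```

The same before/after comparison can be done on a production replica with MariaDB's EXPLAIN, as in the output above.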
Change 294693 had a related patch set uploaded (by Ladsgroup):
Performance boost on hidenondamaging
By using INNER JOIN instead of LEFT JOIN in the hidenondamaging case, everything becomes many times faster.
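The equivalence behind this change can be sanity-checked outside production. The sketch below (SQLite as a stand-in for MariaDB; a trimmed, assumed version of the real schema) shows that once the WHERE clause filters on oresc_probability, rows with no ores_classification match are discarded anyway, so LEFT JOIN and INNER JOIN return the same result set while the INNER JOIN lets the optimizer drive the join from the smaller, filtered table:

```python
import sqlite3

# Minimal stand-in schema (assumed, trimmed from the real wikidatawiki tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recentchanges (rc_id INTEGER, rc_this_oldid INTEGER);
CREATE TABLE ores_classification (oresc_rev INTEGER, oresc_probability REAL);
INSERT INTO recentchanges VALUES (1, 100), (2, 200), (3, 300);
INSERT INTO ores_classification VALUES (100, 0.95), (200, 0.10);
-- rev 300 has no ORES score at all
""")

left = conn.execute("""
    SELECT rc_id FROM recentchanges
    LEFT JOIN ores_classification ON rc_this_oldid = oresc_rev
    WHERE oresc_probability > 0.8
""").fetchall()

inner = conn.execute("""
    SELECT rc_id FROM recentchanges
    INNER JOIN ores_classification ON rc_this_oldid = oresc_rev
    WHERE oresc_probability > 0.8
""").fetchall()

# The probability filter rejects the NULL rows the LEFT JOIN would keep
# (NULL > 0.8 is not true), so both forms return exactly the same rows.
print(left == inner, left)  # True [(1,)]
```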
Change 294712 had a related patch set uploaded (by Ladsgroup):
Performance boost on hidenondamaging
Can you copy me the potential SQL queries generated by your code so I can run EXPLAIN on them on production to test them? Thank you.
EXPLAIN SELECT /* SpecialRecentChanges::doMainQuery Ladsgroup */
rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,
rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,
rc_log_type,rc_log_action,rc_params,wl_user,wl_notificationtimestamp,page_latest,
(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,
oresc_probability,0.8 AS ores_threshold
FROM `recentchanges`
LEFT JOIN `watchlist` ON (wl_user = '28859' AND (wl_title=rc_title) AND (wl_namespace=rc_namespace))
LEFT JOIN `page` ON ((rc_cur_id=page_id))
INNER JOIN `ores_classification` ON ((rc_this_oldid = oresc_rev AND oresc_is_predicted = 1 AND oresc_class = 1))
LEFT JOIN `ores_model` ON ((oresc_model = oresm_id AND oresm_is_current = 1))
WHERE rc_bot = '0' AND (rc_type != '6') AND (rc_timestamp >= '20160608000000') AND (rc_source != 'wb') AND
((oresm_name = 'damaging' OR oresm_name IS NULL)) AND (oresc_probability > '0.8')
AND rc_patrolled = '0' AND rc_new IN ('0','1')
ORDER BY rc_timestamp DESC
LIMIT 50\G
That's the query you copied, but with my change applied (using INNER JOIN instead of LEFT JOIN).
I think the changes look good: 30K-100K rows visited, using an index. It could be better, but for that I will have to talk to the tags maintainer (not an ORES issue).
Did you check what I briefly mentioned, which is using a specific recentchanges-role slave?
@jcrespo I wrote there; I hope I understood you correctly. We can't get that threshold from the DB directly; we need to fetch the user preferences and then determine the threshold based on that value.
No, I am talking about using a particular role when connecting to the database and selecting a DB_SLAVE. I get 5x better performance when using a recentchanges slave instead of a main-traffic one. Maybe your code already does this, but I can't tell.
Oh, this hook does not change the DB slave role; it only adds conditions and the like, then hands them back to mediawiki/core, which runs the DB query.
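For context on the mechanism discussed in the last few comments: MediaWiki lets callers pass a query group when requesting a replica connection (at the time, wfGetDB( DB_SLAVE, 'recentchanges' )), and the load balancer then prefers replicas assigned to that group. A toy Python sketch of the routing idea follows; the host lists are illustrative, except db1026, which appears in the EXPLAIN output above:

```python
# Toy sketch of query-group routing: reads tagged with the 'recentchanges'
# group go to a replica dedicated to that traffic, others to the main pool.
# Host lists are made up for illustration (db1026 is the RC replica from
# the EXPLAIN above).
REPLICAS = {
    "main": ["db1045", "db1049"],
    "recentchanges": ["db1026"],
}

def pick_replica(query_group="main"):
    """Return a replica host for the given query group, falling back to main."""
    hosts = REPLICAS.get(query_group) or REPLICAS["main"]
    return hosts[0]

print(pick_replica("recentchanges"))  # db1026
print(pick_replica())                 # db1045
```

The real LoadBalancer also weights hosts and handles lag, but the point here is only that the caller, not the hook, decides which replica group serves the query.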