Page MenuHomePhabricator

SpecialRecentChangesLinked::doMainQuery bad query bringing down database server
Closed, ResolvedPublic

Description

(SELECT /* SpecialRecentChangesLinked::doMainQuery */ /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,'0.73' AS `ores_damaging_threshold`,(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`,fp_stable,fp_pending_since FROM `recentchanges` LEFT JOIN `ores_model` `ores_damaging_mdl` ON (ores_damaging_mdl.oresm_is_current = '1' AND ores_damaging_mdl.oresm_name = 'damaging') LEFT JOIN `ores_classification` `ores_damaging_cls` ON ((ores_damaging_cls.oresc_model = ores_damaging_mdl.oresm_id) AND (rc_this_oldid = ores_damaging_cls.oresc_rev) AND ores_damaging_cls.oresc_class = '1') LEFT JOIN `ores_model` `ores_goodfaith_mdl` ON (ores_goodfaith_mdl.oresm_is_current = '1' AND ores_goodfaith_mdl.oresm_name = 'goodfaith') LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON ((ores_goodfaith_cls.oresc_model = ores_goodfaith_mdl.oresm_id) AND (rc_this_oldid = ores_goodfaith_cls.oresc_rev) AND ores_goodfaith_cls.oresc_class = '1') LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id)) INNER JOIN `pagelinks` ON ((rc_namespace = pl_namespace) AND (rc_title = pl_title)) WHERE (rc_bot = 0) AND (rc_type != '6') AND (rc_source != 'wb') AND (ores_damaging_cls.oresc_probability > '0.73') AND rc_patrolled = '0' AND (rc_timestamp >= '20170409000000') AND pl_from = '1341445' ORDER BY rc_timestamp DESC LIMIT 50 ) UNION (SELECT /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,'0.73' AS `ores_damaging_threshold`,(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`,fp_stable,fp_pending_since FROM `recentchanges` LEFT JOIN `ores_model` `ores_damaging_mdl` ON (ores_damaging_mdl.oresm_is_current = '1' AND ores_damaging_mdl.oresm_name = 'damaging') LEFT JOIN `ores_classification` `ores_damaging_cls` ON ((ores_damaging_cls.oresc_model = ores_damaging_mdl.oresm_id) AND (rc_this_oldid = ores_damaging_cls.oresc_rev) AND ores_damaging_cls.oresc_class = '1') LEFT JOIN `ores_model` `ores_goodfaith_mdl` ON (ores_goodfaith_mdl.oresm_is_current = '1' AND ores_goodfaith_mdl.oresm_name = 'goodfaith') LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON ((ores_goodfaith_cls.oresc_model = ores_goodfaith_mdl.oresm_id) AND (rc_this_oldid = ores_goodfaith_cls.oresc_rev) AND ores_goodfaith_cls.oresc_class = '1') LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id)) INNER JOIN `templatelinks` ON ((rc_namespace = tl_namespace) AND (rc_title = tl_title)) WHERE (rc_bot = 0) AND (rc_type != '6') AND (rc_source != 'wb') AND (ores_damaging_cls.oresc_probability > '0.73') AND rc_patrolled = '0' AND (rc_timestamp >= '20170409000000') AND tl_from = '1341445' ORDER BY rc_timestamp DESC LIMIT 50 ) UNION (SELECT /*! STRAIGHT_JOIN */ rc_id,rc_timestamp,rc_user,rc_user_text,rc_namespace,rc_title,rc_comment,rc_minor,rc_bot,rc_new,rc_cur_id,rc_this_oldid,rc_last_oldid,rc_type,rc_source,rc_patrolled,rc_ip,rc_old_len,rc_new_len,rc_deleted,rc_logid,rc_log_type,rc_log_action,rc_params,'0.73' AS `ores_damaging_threshold`,(SELECT GROUP_CONCAT(ct_tag SEPARATOR ',') FROM `change_tag` WHERE ct_rc_id=rc_id ) AS `ts_tags`,ores_damaging_cls.oresc_probability AS `ores_damaging_score`,ores_goodfaith_cls.oresc_probability AS `ores_goodfaith_score`,fp_stable,fp_pending_since FROM `recentchanges` LEFT JOIN `ores_model` `ores_damaging_mdl` ON (ores_damaging_mdl.oresm_is_current = '1' AND ores_damaging_mdl.oresm_name = 'damaging') LEFT JOIN `ores_classification` `ores_damaging_cls` ON ((ores_damaging_cls.oresc_model = ores_damaging_mdl.oresm_id) AND (rc_this_oldid = ores_damaging_cls.oresc_rev) AND ores_damaging_cls.oresc_class = '1') LEFT JOIN `ores_model` `ores_goodfaith_mdl` ON (ores_goodfaith_mdl.oresm_is_current = '1' AND ores_goodfaith_mdl.oresm_name = 'goodfaith') LEFT JOIN `ores_classification` `ores_goodfaith_cls` ON ((ores_goodfaith_cls.oresc_model = ores_goodfaith_mdl.oresm_id) AND (rc_this_oldid = ores_goodfaith_cls.oresc_rev) AND ores_goodfaith_cls.oresc_class = '1') LEFT JOIN `flaggedpages` ON ((fp_page_id = rc_cur_id)) INNER JOIN `imagelinks` ON ((rc_title = il_to)) WHERE (rc_bot = 0) AND (rc_type != '6') AND (rc_source != 'wb') AND (ores_damaging_cls.oresc_probability > '0.73') AND rc_patrolled = '0' AND (rc_timestamp >= '20170409000000') AND il_from = '1341445' AND rc_namespace = '6' ORDER BY rc_timestamp DESC LIMIT 50 ) ORDER BY rc_timestamp DESC LIMIT 50

Details

Related Gerrit Patches:
mediawiki/extensions/ORES : wmf/1.29.0-wmf.20Disable ORES in Recentchangeslinked
mediawiki/extensions/ORES : masterDisable ORES in Recentchangeslinked
operations/software : masterDatabase slave watchdog- kill all queries, not only the selects

Event Timeline

jcrespo created this task.Apr 16 2017, 8:07 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptApr 16 2017, 8:07 AM

Change 348349 had a related patch set uploaded (by Jcrespo):
[operations/software@master] Database slave watchdog- kill all queries, not only the selects

https://gerrit.wikimedia.org/r/348349

Change 348349 merged by Jcrespo:
[operations/software@master] Database slave watchdog- kill all queries, not only the selects

https://gerrit.wikimedia.org/r/348349

jcrespo triaged this task as Unbreak Now! priority.Apr 16 2017, 8:21 AM
Restricted Application added a project: Scoring-platform-team. · View Herald TranscriptApr 16 2017, 8:21 AM
Restricted Application added subscribers: Jay8g, TerraCodes. · View Herald Transcript

@Ladsgroup is already taking a look at this as per our chat on IRC, thanks!

SpecialRecentChangesLinked is super slow and it's extending ChangesList meaning lots of things will be hooked on it without anyone noticing. I will disable ORES in that special page to make it more performant but we need to seriously reconsider this special page structure and queries (and add proper indexes if needed and possible)

Gerrit-bot doesn't add comment when I make a patch (T161525)

Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptApr 16 2017, 10:03 AM

The reason that we recently found this issue is that no one use this special page.

Ladsgroup moved this task from Incoming to Blocked on others on the User-Ladsgroup board.

Gerrit-bot doesn't add comment when I make a patch (T161525)

Thanks for getting the patch ready so fast.
Pardon my ignorance, but I thought I would ask: Is there any side effect on disabling this apart from getting it off? ie: by disabling this, some other feature/query gets more load?

Is there any side effect on disabling this apart from getting it off? ie: by disabling this, some other feature/query gets more load?

No, the special page is barely used and even disabling the whole page doesn't affect much.

Thanks for clearing that up. I have +1'ed it. I guess this will be deployed after the switch DC right? As there are no deployments scheduled for this week.

Thanks for clearing that up. I have +1'ed it. I guess this will be deployed after the switch DC right? As there are no deployments scheduled for this week.

Yes, unless you can convince Greg that this is an emergency.

Change 348378 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Disable ORES in Recentchangeslinked

https://gerrit.wikimedia.org/r/348378

Thanks for clearing that up. I have +1'ed it. I guess this will be deployed after the switch DC right? As there are no deployments scheduled for this week.

Yes, unless you can convince Greg that this is an emergency.

I'm fine waiting till next week :-). This hasn't happened again since the query killer was changed. If this bite us again we might need to go ahead then

Thanks!

The reason that we recently found this issue is that no one use this special page.

Reported by me on May 11 2016, one year ago, ignored since: T134976

At that time, ORES wasn't deployed anywhere in production.

jcrespo added a comment.EditedApr 18 2017, 1:43 PM

Then the problem is not ORES. Although could it had made it worse?

Then the problem is not ORES. Although could it had made it worse?

Exactly, It joins with a huge table on columns that can't be indexed (how we can index ores_classification based on pagelinks columns?)

Ladsgroup closed this task as Resolved.Apr 20 2017, 12:31 PM
Ladsgroup moved this task from Review to Done on the Scoring-platform-team (Current) board.
Ladsgroup moved this task from Blocked on others to Done on the User-Ladsgroup board.
jcrespo reopened this task as Open.Apr 20 2017, 1:02 PM

This is not resolved, and it is still an unbreak now.

Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptApr 20 2017, 1:03 PM
Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptApr 20 2017, 1:04 PM

@jcrespo because it's not deployed yet. Do you want me to do an unscheduled deploy?

jcrespo added a subscriber: greg.Apr 20 2017, 3:51 PM

Ladsgroup, I belive this to be a very serious Wikimedia-Incident causing user visible outage, paging Ops people, so yes, I think we should deploy it ASAP, as otherwise it will happen again. @greg has the last word, I would assume, on this topic.

@jcrespo because it's not deployed yet. Do you want me to do an unscheduled deploy?

yes, if Ops are comfortable with the deploy right now (I think the dust has settled with the dc-switch over). do it asap.

And yeah, I agree with @jcrespo that this is a Wikimedia-Incident and should get a writeup and follow-ups filed.

Yes, let's deploy, and follow long term with lower priority on T134976.

Ok. I do it in half an hour.

Gerribot doesn't make a comment when I make a patch:

Change 349271 merged by jenkins-bot:
[mediawiki/extensions/ORES@wmf/1.29.0-wmf.20] Disable ORES in Recentchangeslinked

https://gerrit.wikimedia.org/r/349271

Mentioned in SAL (#wikimedia-operations) [2017-04-20T19:24:43Z] <Amir1> start of ladsgroup@naos:/srv/mediawiki-staging/php-1.29.0-wmf.20$ scap sync-file php-1.29.0-wmf.20/extensions/ORES/includes/Hooks.php '[[gerrit:349271|Disable ORES in Recentchangeslinked]] (T163063)'

The deployment is finished. I'm around if anything weird happened.

Ladsgroup closed this task as Resolved.Apr 20 2017, 7:37 PM

Just revisiting this issue: Do people think that the New Filters ORES implementation would still cause problems on Related Changes? Or is there any reason to believe that this issue has been cleared up by the many changes we've made since April?