Page MenuHomePhabricator

Very long search times on RC Page for "Very likely good faith" + "Likely have problems" (on en.wiki only?)
Closed, DeclinedPublic

Description

On en.wiki, using a 7-day search

  • add the following to the default filters: "Very likely good faith" + "Likely have problems."
  • Result: one time, after about 2.5 minutes, found some two-dozen results. Other times, the search never did finish.

Despite being sometimes rare, this is a perfectly valid search. In fact, looking for problems made by good faith editors is exactly what we hoped users of the New Filters beta would do.

The issue here would seem to be that this is a fairly rare event on en.wiki: I found only 23 in 7 days. And, of course, other wikis are not as big as en.wiki. On pl, cs and he, this search works fine and gets lots of results. On ru.wiki, it's slow but searching back 7 days takes about a minute. en.wiki is by far the worst, with the search taking many minutes, often finding nothing, and, one time I tried it, failing altogether ([WZ4J2wpAAEMAAEW2y20AAACK] 2017-08-23 23:04:55: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" )
'
(There are other searches that take a long time. E.g., "very likely good" + :Very likely bad faith." But that one is arguably invalid and could just be short circuited. See T164292 )

What can we do to make such searches —and this one in particular—faster?

(See related task T164621 )

Related Objects

Event Timeline

@Mattflaschen-WMF @SBisson @Catrope @Halfak @Jdforrester-WMF this is currently listed as a blocker for RC Filters beta graduation, in part because, as the task description says, "Very likely good faith" + "Likely have problems" is a perfectly valid search. Indeed, enabling reviewers to find users who are making mistakes but who are nonetheless in good faith was precisely one of the goals of ERI.

As the description says, on most smaller wikis, like pl, cs and he, this search works fine and gets lots of results. On ru.wiki, it's slow but searching back 7 days takes about a minute. en.wiki is by far the worst, with the search taking many minutes, often finding nothing, and, one time I tried it, failing altogether ([WZ4J2wpAAEMAAEW2y20AAACK] 2017-08-23 23:04:55: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" )

So, your opinions and ideas please: what can we do to make this work better on en.wiki? How time-consuming is that fix? And should we hold up beta graduation for this?

[...]
As the description says, on most smaller wikis, like pl, cs and he, this search works fine and gets lots of results. On ru.wiki, it's slow but searching back 7 days takes about a minute. en.wiki is by far the worst, with the search taking many minutes, often finding nothing, and, one time I tried it, failing altogether ([WZ4J2wpAAEMAAEW2y20AAACK] 2017-08-23 23:04:55: Fatal exception of type "Wikimedia\Rdbms\DBQueryError" )

So, your opinions and ideas please: what can we do to make this work better on en.wiki? How time-consuming is that fix? And should we hold up beta graduation for this?

Indexing ores_classification.oresc_probability was mentioned in the past as a possible solution to this but I wouldn't speculate about how much that would help. @Ladsgroup, @awight thoughts?

! In T164796#3553635, @SBisson wrote:

....Indexing ores_classification.oresc_probability was mentioned in the past as a possible solution to this but I wouldn't speculate about how much that would help. @Ladsgroup, @awight thoughts?

Hey guys, can you please weigh in on this? This ticket is a blocker for our quarterly goal. @Halfak?

Rare combos are likely not to be in cache which is probably causing the delays in getting results, perhaps we need to start caching more combos by default? Just my .02¢

jmatazzoni renamed this task from Reduce very long search times on RC Page when using ORES for rare combos to Very long search times on RC Page for "Very likely good faith" + "Likely have problems".Aug 28 2017, 8:36 PM
jmatazzoni updated the task description. (Show Details)

I just renamed this ticket to make the focus on "Very likely good faith" + "Likely have problems" explicit. That is the combo I'm interested in speeding up.

I think @Zppix is right that the problem happens only when filtering for rare cases, but I don't believe caching is involved, since no API calls are being made. My guess is that it's something about how the query is constructed, that either pulls a large number of records into memory to filter in PHP, or the query is missing indexes?

jmatazzoni renamed this task from Very long search times on RC Page for "Very likely good faith" + "Likely have problems" to Very long search times on RC Page for "Very likely good faith" + "Likely have problems" (on en.wiki only?).Aug 28 2017, 9:34 PM
jmatazzoni updated the task description. (Show Details)
jmatazzoni updated the task description. (Show Details)

Here's a query pulled from my development wiki, with filters "Very likely good faith" + "Likely have problems",

SELECT /*! STRAIGHT_JOIN */ rc_id, 
                            rc_timestamp, 
                            rc_user, 
                            rc_user_text, 
                            rc_namespace, 
                            rc_title, 
                            rc_comment, 
                            rc_minor, 
                            rc_bot, 
                            rc_new, 
                            rc_cur_id, 
                            rc_this_oldid, 
                            rc_last_oldid, 
                            rc_type, 
                            rc_source, 
                            rc_patrolled, 
                            rc_ip, 
                            rc_old_len, 
                            rc_new_len, 
                            rc_deleted, 
                            rc_logid, 
                            rc_log_type, 
                            rc_log_action, 
                            rc_params, 
                            wl_user, 
                            wl_notificationtimestamp, 
                            page_latest, 
                            (SELECT Group_concat(ct_tag SEPARATOR ',') 
                             FROM   `change_tag` 
                             WHERE  ct_rc_id = rc_id)            AS `ts_tags`, 
                            ores_damaging_cls.oresc_probability  AS 
                            `ores_damaging_score`, 
                            ores_goodfaith_cls.oresc_probability AS 
                            `ores_goodfaith_score` 
FROM   `recentchanges` 
       LEFT JOIN `watchlist` 
              ON ( wl_user = '1' 
                   AND ( wl_title = rc_title ) 
                   AND ( wl_namespace = rc_namespace ) ) 
       LEFT JOIN `page` 
              ON (( rc_cur_id = page_id )) 
       LEFT JOIN `ores_model` `ores_damaging_mdl` 
              ON ( ores_damaging_mdl.oresm_is_current = '1' 
                   AND ores_damaging_mdl.oresm_name = 'damaging' ) 
       LEFT JOIN `ores_classification` `ores_damaging_cls` 
              ON ( ( ores_damaging_cls.oresc_model = 
                   ores_damaging_mdl.oresm_id ) 
                   AND ( rc_this_oldid = ores_damaging_cls.oresc_rev ) 
                   AND ores_damaging_cls.oresc_class = '1' ) 
       LEFT JOIN `ores_model` `ores_goodfaith_mdl` 
              ON ( ores_goodfaith_mdl.oresm_is_current = '1' 
                   AND ores_goodfaith_mdl.oresm_name = 'goodfaith' ) 
       LEFT JOIN `ores_classification` `ores_goodfaith_cls` 
              ON ( ( ores_goodfaith_cls.oresc_model = 
                     ores_goodfaith_mdl.oresm_id ) 
                   AND ( rc_this_oldid = ores_goodfaith_cls.oresc_rev ) 
                   AND ores_goodfaith_cls.oresc_class = '1' ) 
WHERE  (( ores_damaging_cls.oresc_probability BETWEEN 0.75 AND 1 )) 
       AND (( ores_goodfaith_cls.oresc_probability BETWEEN 0.35 AND 1 )) 
       AND ( rc_timestamp >= '20170821214842' ) 
       AND ( rc_namespace NOT IN ( '1198', '1199' ) ) 
       AND rc_new IN ( '0', '1' ) 
ORDER  BY rc_timestamp DESC 
LIMIT  50

This is the explain on enwiki,

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: recentchanges
         type: range
possible_keys: rc_timestamp,rc_namespace_title,new_name_timestamp,rc_ns_usertext,tmp_1,tmp_3,rc_name_type_patrolled_timestamp
          key: rc_timestamp
      key_len: 16
          ref: NULL
         rows: 4807253
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: watchlist
         type: eq_ref
possible_keys: wl_user,namespace_title
          key: wl_user
      key_len: 265
          ref: const,enwiki.recentchanges.rc_namespace,enwiki.recentchanges.rc_title
         rows: 1
        Extra: 
*************************** 3. row ***************************
           id: 1
  select_type: PRIMARY
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: enwiki.recentchanges.rc_cur_id
         rows: 1
        Extra: 
*************************** 4. row ***************************
           id: 1
  select_type: PRIMARY
        table: ores_damaging_mdl
         type: ref
possible_keys: PRIMARY,oresm_version,ores_model_status
          key: ores_model_status
      key_len: 35
          ref: const,const
         rows: 1
        Extra: Using index condition
*************************** 5. row ***************************
           id: 1
  select_type: PRIMARY
        table: ores_damaging_cls
         type: eq_ref
possible_keys: oresc_rev_model_class,oresc_rev_predicted_model
          key: oresc_rev_model_class
      key_len: 7
          ref: enwiki.recentchanges.rc_this_oldid,enwiki.ores_damaging_mdl.oresm_id,const
         rows: 1
        Extra: Using index condition; Using where
*************************** 6. row ***************************
           id: 1
  select_type: PRIMARY
        table: ores_goodfaith_mdl
         type: ref
possible_keys: PRIMARY,oresm_version,ores_model_status
          key: ores_model_status
      key_len: 35
          ref: const,const
         rows: 1
        Extra: Using index condition
*************************** 7. row ***************************
           id: 1
  select_type: PRIMARY
        table: ores_goodfaith_cls
         type: eq_ref
possible_keys: oresc_rev_model_class,oresc_rev_predicted_model
          key: oresc_rev_model_class
      key_len: 7
          ref: enwiki.recentchanges.rc_this_oldid,enwiki.ores_goodfaith_mdl.oresm_id,const
         rows: 1
        Extra: Using index condition; Using where
*************************** 8. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: change_tag
         type: ref
possible_keys: ct_rc_id,change_tag_rc_tag
          key: ct_rc_id
      key_len: 5
          ref: enwiki.recentchanges.rc_id
         rows: 338895
        Extra: Using index

The query takes 29 seconds to execute.

Taking out the unused subquery to get ts_tags, query time drops down to 1.35s. I think we got lucky this time, and the problem is not in ORES tables...

@awight notes:

Taking out the unused subquery to get ts_tags, query time drops down to 1.35s.

Thanks for having a look Adam Sounds promising. Can you please translate that for the uninitiated like myself. Are you recommending a course of action that might fix this issue? What would the ticket say?

Adam also notes:

The query takes 29 seconds to execute.

How long the search takes obviously depends on many factors, including the number of results requested, the number of days searched, the other filters selected at the same time, etc. For the record, I just made three test searches with settings of 14 days/100 results. They were much faster than when I wrote this ticket, though not what I'd call fast—about 50 seconds. But on the downside, the searches retrieved no results and twice crashed the system with the following error messages:

[WaX0PApAAEYAAD1Bjx8AAAAB] 2017-08-29 23:10:48: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"
[WaX2CQpAAEIAAAQ9NBUAAABH] 2017-08-29 23:18:29: Fatal exception of type "Wikimedia\Rdbms\DBQueryError"

Here are the search parameters I used:

https://en.wikipedia.org/w/index.php?title=Special:RecentChanges&hidebots=1&hidecategorization=1&hideWikibase=1&hidelog=1&highlight=1&limit=100&days=14&urlversion=2&damaging=likelybad&goodfaith=likelygood

@jmatazzoni ah sorry, I wasn't suggesting that the query is any faster than you said. I was running on a dedicated analytics box, so the timings I gave only matter on a relative scale. The takeaway is that the query runs 30x faster when the specific unused clause I quoted is removed.

The bad news is that I didn't understand the change_tag clause when I looked at this problem yesterday. It's:

(SELECT Group_concat(ct_tag SEPARATOR ',') 
                             FROM   `change_tag` 
                             WHERE  ct_rc_id = rc_id)            AS `ts_tags`,

This isn't actually unused, it's returning tags which I assume are rendered in the results, somehow. The only quick fix that comes to mind is that we can fetch the tags in a second query, after limiting to a small number of revisions when paging the view.

In T164796#3564570, @awight wrote:

@jmatazzoni ah sorry, I wasn't suggesting that the query is any faster than you said. I was running on a dedicated analytics box, so the timings I gave only matter on a relative scale. The takeaway is that the query runs 30x faster when the specific unused clause I quoted is removed.

The bad news is that I didn't understand the change_tag clause when I looked at this problem yesterday. It's:

(SELECT Group_concat(ct_tag SEPARATOR ',') 
                             FROM   `change_tag` 
                             WHERE  ct_rc_id = rc_id)            AS `ts_tags`,

This isn't actually unused, it's returning tags which I assume are rendered in the results, somehow. The only quick fix that comes to mind is that we can fetch the tags in a second query, after limiting to a small number of revisions when paging the view.

@Mattflaschen-WMF, @SBisson, @Catrope, do you guys think that approach will work? This is a blocker for beta graduation, and the db query errors noted above would seem to justify that. Should we move Adam's plan forward, or do you have other ideas?

@Mattflaschen-WMF did you make the subtask for this about the error message? I want to add it to the blockers list.

@jmatazzoni ah sorry, I wasn't suggesting that the query is any faster than you said. I was running on a dedicated analytics box, so the timings I gave only matter on a relative scale. The takeaway is that the query runs 30x faster when the specific unused clause I quoted is removed.

The bad news is that I didn't understand the change_tag clause when I looked at this problem yesterday. It's:

(SELECT Group_concat(ct_tag SEPARATOR ',') 
                             FROM   `change_tag` 
                             WHERE  ct_rc_id = rc_id)            AS `ts_tags`,

This isn't actually unused, it's returning tags which I assume are rendered in the results, somehow. The only quick fix that comes to mind is that we can fetch the tags in a second query, after limiting to a small number of revisions when paging the view.

Is it consistently faster? If you ran the original query first, and then the modified one second, a more likely explanation is that the relevent stuff is in memory the second time around. It really doesn't make sense for that to make a significant difference, since the worst case for change_tags on enwiki is 8 rows, and the common case is 0.

Copying my comment from T171027#3677195 over here as I commented on the wrong bug, and its good to keep things together for reference:

For the enwiki case, and talking about watchlist (Not Special:RecentChanges) all the timeouts I could find - https://logstash.wikimedia.org/goto/3183d6a077082f6b4469255e51e60105 (This are just the severe read timeouts. I had trouble making a logstash query for the query longer than 5 seconds warning) had STRAIGHT_JOIN specified, including people with small watchlists. This probably makes sense for Special:Recentchanges case, but for the watchlist case, when the user has a small watchlist (unclear what "small" means here, but lets say < 50,000 entries), this will cause the query to be much slower then needed. I'd reccomend removing the STRAIGHT_JOIN in the watchlist case for ORES


Other ideas to throw out there (Less sure about these): Extending the index oresc_rev_model_class for ores_classification to have oresc_probability at the end may improve speed mildly due to using a covering index.

I wonder if we knew that a particular filter was very selective and likely to have only a few results, and we know a specific range of revisions we're looking at (Since rev_id is monotonically increasing, I guess we can assume a specific range of rev_id based on date), and we had an index on (oresc_model, oresc_class, oresc_rev, oresc_probability) if a query like SELECT <stuff> from ores_classification inner join <all watchlist/rc/etc> WHERE oresc_model = whatever AND oresc_class = 1 AND ores_rev BETWEEN 1234 AND 4321 AND ores_probability BETWEEN 0.7 AND 0.8 would be faster (On the basis of the idea it uses a range to get to just the revisions in question, and it can filter out the non-matching probabilities using the covering index right away before joining with everything all over the database). I don't know, that might not make sense.

@Bawolff Thanks for taking a look! I think you're right that my initial analysis was wrong, and I was just seeing cached/paged-in improvements. I made a pretty random guess in related bug T176456#3661739, that the multi-column indexes were forcing a suboptimal ordering of the conditions, but I'll also experiment with your suggestions.

Change 383868 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/ORES@master] Don't use STRAIGHT_JOIN for Watchlist queries

https://gerrit.wikimedia.org/r/383868

Change 383868 merged by jenkins-bot:
[mediawiki/extensions/ORES@master] Don't use STRAIGHT_JOIN for Watchlist queries

https://gerrit.wikimedia.org/r/383868

Change 383885 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/ORES@wmf/1.31.0-wmf.3] Don't use STRAIGHT_JOIN for Watchlist queries

https://gerrit.wikimedia.org/r/383885

Change 383885 merged by jenkins-bot:
[mediawiki/extensions/ORES@wmf/1.31.0-wmf.3] Don't use STRAIGHT_JOIN for Watchlist queries

https://gerrit.wikimedia.org/r/383885

Re-checked the issue on wmf.10. There are some improvements, but those specific searches - 'Likely have problems' and 'Very likely good faith' and
'Very likely have problems' and 'Very likely good faith' are still much slower than expected.

To summarize the improvements (for 250 changes in 30 days):

  • 'Likely have problems' and 'Very likely good faith' + the default filter: Human (not bot), Page edits, Page creations, Logged actions - no failures and it does not take minutes; max time ~10s, avg. 6s
  • Watchlist (~9000 pages, 250 changes in 30 days) - max 6.5 sec; avg.7.5 s
  • 'Very likely have problems' and 'Very likely good faith' + the default filter: Human (not bot), Page edits, Page creations, Logged actions - the same results as the above; occasionally timing out: "This search has timed out. You may wish to try different search parameters."
MMiller_WMF subscribed.

Declining because of the rarity of this use case and the fact that results of this search are not very useful.