Page MenuHomePhabricator

hidebyothers extremely slow for logged in users
Closed, ResolvedPublic

Description

hidebyothers is slow for logged in users due to the limited number of rows that match and the lack of an index on rc_user.

Event Timeline

Restricted Application added a project: Collaboration-Team-Triage. · View Herald TranscriptMar 27 2017, 8:31 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
Mattflaschen-WMF renamed this task from hidebyothers extremely slow to hidebyothers extremely slow for logged in users.Mar 27 2017, 8:32 PM
Mattflaschen-WMF updated the task description. (Show Details)
Mattflaschen-WMF added a subscriber: jcrespo.

Change 345014 had a related patch set uploaded (by Mattflaschen):
[mediawiki/core@master] hidemyself/hidebyothers: Use rc_user_text since there is an index

https://gerrit.wikimedia.org/r/345014

Mattflaschen-WMF added a subscriber: Catrope.

@jcrespo What do you think about adding an index to recentchanges on rc_user? That would allow us to use it for hidebyothers, while still having that be performant.

I'm not sure if there's a specific reason why it previously (before above patch) used rc_user. However, I also prefer to use the ID where possible (though I think rc_user_text is OK in theory, e.g. Renameuser handles it).

What do you think about adding an index to recentchanges on rc_user?

There are currently 11 indexes on recentchanges:

PRIMARY KEY (`rc_id`),
KEY `rc_timestamp` (`rc_timestamp`),
KEY `rc_namespace_title` (`rc_namespace`,`rc_title`),
KEY `rc_cur_id` (`rc_cur_id`),
KEY `new_name_timestamp` (`rc_new`,`rc_namespace`,`rc_timestamp`),
KEY `rc_ip` (`rc_ip`),
KEY `rc_ns_usertext` (`rc_namespace`,`rc_user_text`),
KEY `rc_user_text` (`rc_user_text`,`rc_timestamp`),
KEY `tmp_2` (`rc_bot`,`rc_timestamp`),
KEY `tmp_3` (`rc_namespace`,`rc_timestamp`),
KEY `rc_name_type_patrolled_timestamp` (`rc_namespace`,`rc_type`,`rc_patrolled`,`rc_timestamp`)

If the current issue has been solved by using rc_user_text, I would not invest on extra indexes on rc_user, because the model is already thought for change soon (actor_id instead of rc_user + rc_user_text): https://www.mediawiki.org/wiki/User:Brion_VIBBER/Compacting_the_revision_table_round_2 . I get the idea, and kind of share the sentiment, but the large refactoring will solve that- have a unique identifier for all editors so if a not-so perfect patch works, I would do that for now. I would like you onboard with the proposal there, but of course that is offtopic.

Also note that this only works because the index is on (rc_user_text, rc_timestamp), because the query is something like WHERE rc_user_text='X" ORDER BY rc_timestamp DESC. I agree with @jcrespo that it'd be best to wait for the actor schema change and use rc_user_text in the meantime.

Change 345014 merged by jenkins-bot:
[mediawiki/core@master] hidemyself/hidebyothers: Use rc_user_text since there is an index

https://gerrit.wikimedia.org/r/345014

QA Recommendation: Resolve

Checked the fix in betalabs - with the max (500 changes in 30 days) the difference comparing with production (ptwiki) is quite noticeable - around 1-2 s in betalabs vs. 4 min in ptwiki.

@Mattflaschen-WMF strangely I do not see the issue T161562: Conflicting filters with 'Logged actions' load results extremely slow in betalabs; it's still present in ptwiki.

@Mattflaschen-WMF strangely I do not see the issue T161562: Conflicting filters with 'Logged actions' load results extremely slow in betalabs; it's still present in ptwiki.

It depends on the size of the recent changes table.

Also note that this only works because the index is on (rc_user_text, rc_timestamp), because the query is something like WHERE rc_user_text='X" ORDER BY rc_timestamp DESC. I agree with @jcrespo that it'd be best to wait for the actor schema change and use rc_user_text in the meantime.

My understanding is it can use that index for any query on rc_user_text, since it's the left-most part: "MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on." (https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html)

My understanding is it can use that index for any query on rc_user_text, since it's the left-most part: "MySQL can use multiple-column indexes for queries that test all the columns in the index, or queries that test just the first column, the first two columns, the first three columns, and so on." (https://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html)

Yes, that's correct. However, all queries from Special:Recentchanges use ORDER BY rc_timestamp, so the index being used needs to also be appropriate to satisfy that (unless there's more multiple index / index-merge magic that's been added to MySQL that I don't know about).

jmatazzoni closed this task as Resolved.Mar 29 2017, 10:27 PM