Page MenuHomePhabricator

Querying the first edit of a user takes longer than usual
Closed, ResolvedPublic

Description

In the stimmberechtigung tool on Toolforge, we are using this query to determine the timestamp of the first edit of a user: SELECT rev_timestamp FROM revision_userindex WHERE rev_user=? ORDER BY rev_timestamp ASC LIMIT 1;

This query used to be really fast, but currently it takes up to several minutes to execute. What causes the longer execution time, and how can I speed up the query?

Example:

MariaDB [dewiki_p]> SELECT rev_timestamp FROM revision_userindex WHERE rev_user=409049 ORDER BY rev_timestamp ASC LIMIT 1;
+----------------+
| rev_timestamp  |
+----------------+
| 20070610153454 |
+----------------+
1 row in set (2 min 8.86 sec)

Event Timeline

Ireas created this task.May 30 2019, 12:04 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 30 2019, 12:04 PM
Marostegui added subscribers: Bstorm, Marostegui.

This might be related to the changes done on the views a couple of days ago, @Bstorm, any ideas if this could be related to your maintenance and the views/indexes removal?

This is what quarry reports as EXPLAIN (https://quarry.wmflabs.org/query/36518)

id	select_type	table	type	possible_keys	key	key_len	ref	rows	Extra
1	SIMPLE	revision	ref	user_timestamp	user_timestamp	4	const	43690	Using where; Using temporary; Using filesort
1	SIMPLE	revision_actor_temp	eq_ref	PRIMARY,revactor_rev	revactor_rev	4	dewiki.revision.rev_id	1

What causes the longer execution time

Maybe https://lists.wikimedia.org/pipermail/cloud/2019-May/000653.html is related.

This is related to T221339 - specifically https://gerrit.wikimedia.org/r/#/c/511910/, and to the changes coming Monday when the rev_user field will no longer exist on the replicas. We shifted index use to favor rev_actor instead and related fields by aliasing to a temp table (that you probably don't want to worry about). In short, the fix is to stop using rev_user and rev_user_text asap because after Monday they are being dropped for T223406

We are doing everything we can to speed up queries that use actor fields instead of the deprecated ones to unblock people who have been migrating tools to the new schema.

Try using where rev_actor = <some_actor_id> instead.

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.May 30 2019, 7:02 PM
Xqt added a subscriber: Xqt.Jun 2 2019, 10:28 AM
Ireas closed this task as Resolved.Jun 2 2019, 10:32 AM
Ireas claimed this task.

Thanks for the explanation! Changing the query to use the actor indeed seems to fix the problem.