[Bug] Wikidata action=query&list=tags should not take 15-25 seconds to respond
Closed, ResolvedPublic

Description

The RTRC[1] tool queries the tags during startup to populate a filter dropdown in the UI.

On most wikis this has no notable impact on load time as it runs in parallel with other requests (such as the list=recentchanges query). But on Wikidata this thing takes well over 10 seconds to load.

I suspect this is because of all the "OAuth CID: ##" entries.

Example:
https://www.wikidata.org/w/api.php?format=json&action=query&list=tags&tgprop=displayname&continue=

[1]
https://github.com/Krinkle/mw-gadget-rtrc/issues/46
https://www.wikidata.org/wiki/Special:BlankPage/RTRC
https://www.mediawiki.org/wiki/Special:BlankPage/RTRC?withModule=ext.gadget.rtrc&kickstart=1

Related Objects

StatusAssignedTask
ResolvedNone
ResolvedLadsgroup
OpenLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
OpenNone
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedMarostegui
ResolvedBstorm
ResolvedLadsgroup
ResolvedLadsgroup
OpenMarostegui
ResolvedLadsgroup
OpenNone
OpenLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
ResolvedLadsgroup
OpenNone
ResolvedLadsgroup
ResolvedLadsgroup
Krinkle created this task.Jul 8 2015, 5:05 PM
Krinkle updated the task description. (Show Details)
Krinkle raised the priority of this task from to Needs Triage.
Krinkle added a subscriber: Krinkle.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 8 2015, 5:05 PM
Krinkle updated the task description. (Show Details)Jul 8 2015, 5:06 PM
Krinkle set Security to None.
Anomie added a subscriber: Anomie.Jul 8 2015, 5:59 PM

The slowness in the module is due to the database query to fetch the list of used tags:

SELECT ct_tag, 0 AS hitcount FROM change_tag GROUP BY ct_tag ORDER BY ct_tag LIMIT 11;

Strangely, increasing the limit seems to make it much faster. The query plans for the low-limit case versus the high-limit case look like:

> explain SELECT ct_tag, 0 AS hitcount FROM change_tag GROUP BY ct_tag ORDER BY ct_tag LIMIT 101;
+------+-------------+------------+-------+---------------+-------------------+---------+------+------+--------------------------+
| id   | select_type | table      | type  | possible_keys | key               | key_len | ref  | rows | Extra                    |
+------+-------------+------------+-------+---------------+-------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | change_tag | range | NULL          | change_tag_tag_id | 257     | NULL |   35 | Using index for group-by |
+------+-------------+------------+-------+---------------+-------------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

> explain SELECT ct_tag, 0 AS hitcount FROM change_tag GROUP BY ct_tag ORDER BY ct_tag LIMIT 11;
+------+-------------+------------+-------+---------------+-------------------+---------+------+----------+-------------+
| id   | select_type | table      | type  | possible_keys | key               | key_len | ref  | rows     | Extra       |
+------+-------------+------------+-------+---------------+-------------------+---------+------+----------+-------------+
|    1 | SIMPLE      | change_tag | index | NULL          | change_tag_tag_id | 272     | NULL | 31483031 | Using index |
+------+-------------+------------+-------+---------------+-------------------+---------+------+----------+-------------+
1 row in set (0.00 sec)

There's also another mode ( https://www.wikidata.org/w/api.php?format=json&action=query&list=tags&tgprop=displayname|hitcount&continue=) that does COUNT(*) AS hitcount instead, BTW. That one uses the slower plan in all cases.

BTW, I note that the module only returns a 7 OAuth entries, although one of them (cid 93) accounts for 83% of the 31176994 rows in the change_tags table and the seven together account for almost 94%.

Sjoerddebruin added a comment.EditedJul 27 2015, 7:59 PM

Can this be given some priority? I would love to work faster again.

Krinkle updated the task description. (Show Details)Sep 12 2015, 5:59 PM
Krinkle removed a subscriber: Krinkle.Mar 22 2016, 10:44 PM
Lydia_Pintscher renamed this task from Wikidata action=query&list=tags should not take 15-25 seconds to respond to [Bug] Wikidata action=query&list=tags should not take 15-25 seconds to respond.Apr 21 2016, 1:52 PM
Lydia_Pintscher triaged this task as Normal priority.

No other wiki has 25M+ hitcount tags, and querying for defined tags requires a query for used tags in OAuth (since it only defines tags with hits). The query for tag stats is systematically slow and has effectively no cache since each tag addition purges it.
https://gerrit.wikimedia.org/r/#/c/218265 aims to partially solve this issue by providing a cache invalidated only when a new tag is created or a tag is deleted. If one doesn't need precise hitcounts, as is the case for a drop down menu, or for OAuth, this is enough.

For me, it now responded in 3.93 seconds. I think this is resolved.

Restricted Application added a subscriber: TerraCodes. · View Herald TranscriptApr 24 2017, 10:27 AM

Just over 4 seconds for me

Ladsgroup closed this task as Resolved.Apr 25 2017, 5:34 AM

I resolved it, feel free to repoen.