Page MenuHomePhabricator

wikidatawiki_p.wb_items_per_site query is slower than expected on Wiki Replicas
Closed, ResolvedPublic

Description

Wikidata on ToolForge is too slow. This query probably should execute under a second, but is currently taking 58.01 sec. The optmizer indicates the wb_ips_item_site_page index is being used.

SELECT 1
FROM wikidatawiki_p.wb_items_per_site
WHERE ips_site_page='Harrington Place, Stellenbosch' AND ips_site_id != 'enwiki';

Event Timeline

This is what the query looks like in production, I killed the SELECT after waiting 5 seconds, so "under a second" does not seem reasonable for Toolforge.

mysql:wikiadmin@db1110 [wikidatawiki]> EXPLAIN SELECT 1 FROM wb_items_per_site WHERE ips_site_page='Harrington Place, Stellenbosch' AND ips_site_id != 'enwiki'; 
+------+-------------+-------------------+-------+-----------------------+-----------------------+---------+------+----------+--------------------------+
| id   | select_type | table             | type  | possible_keys         | key                   | key_len | ref  | rows     | Extra                    |
+------+-------------+-------------------+-------+-----------------------+-----------------------+---------+------+----------+--------------------------+
|    1 | SIMPLE      | wb_items_per_site | index | wb_ips_item_site_page | wb_ips_item_site_page | 346     | NULL | 59534791 | Using where; Using index |
+------+-------------+-------------------+-------+-----------------------+-----------------------+---------+------+----------+--------------------------+
1 row in set (0.00 sec)
zhuyifei1999 renamed this task from Wikidata is slow on ToolForge to Wikidata is slow on Wiki Replicas.Nov 28 2017, 1:47 AM
zhuyifei1999 edited projects, added Data-Services; removed Toolforge.
zhuyifei1999 subscribed.
bd808 renamed this task from Wikidata is slow on Wiki Replicas to wikidatawiki_p.wb_items_per_site query is slower than expected on Wiki Replicas.Dec 1 2017, 6:53 AM

I forgot to mention that this is a regression. That code was executing under 1 second 6+ months ago and only in late November I noticed queries failing constantly.

Query just now on ToolForge's analytics cluster (labsdb1010): 13 min 24.37 sec. It seems to be getting worse.

jcrespo subscribed.

The query is unoptimally written- it scans every single wikidata item to return none or a few results. Change it to

SELECT 1 FROM wikidatawiki_p.wb_items_per_site WHERE ips_site_page='Harrington Place, Stellenbosch' AND ips_site_id IN ('frwiki', 'eswiki', ...);

The problem with your query is the !=, mysql (or any BTREE-based database) cannnot handle that efficiently. Use OR or IN instead from a list of explicitly defined wikis.

To get advantage of the indexes the table has. This is not a bug with wikidata or labs- those are currently working as intended.

However-
Feel free to propose additional indexes if you think they can be useful to you and are reasonable proposals- we are open to that with no problem.

Dispenser changed the task status from Invalid to Resolved.Mar 4 2018, 10:06 PM

I talked with @jcrespo on IRC and with some digging (P6782) to eventually find out the index I was using wb_ips_site_page was dropped without updating the documentation (T179793#4018800).

For reference, the query to search article titles for all wikis. This is useful in determining if a red link was untranslated from transwiking or if an article already exists in another language. Here's the updated version using the wb_ips_item_site_page index:

SELECT 1
FROM wikidatawiki_p.wb_items_per_site
WHERE ips_site_page='Harrington Place, Stellenbosch'
AND ips_site_id IN (
  SELECT dbname
  FROM meta_p.wiki
  WHERE dbname != 'enwiki'
);