Page MenuHomePhabricator

action=query&list=pagepropnames really slow on a big wiki, got error with ppnlimit=500 function: /* ApiQueryPagePropNames::execute */
Closed, ResolvedPublic

Description

Querying the API on enwiki for "all page property names in use on the wiki" is really slow the first time. Then I tried asking for the maximum with ppnlimit=500, and got

{
    "servedby": "mw1235",
    "error": {
        "code": "internal_api_error_DBQueryError",
        "info": "[a93a2aeb] Database query error"
    }
}

Maybe this is just a fact of life doing a DISTINCT query on the page_props table on a huge wiki (includes/api/ApiQueryPagePropNames.php); if so perhaps the doc could note the performance issue.

Event Timeline

Spage raised the priority of this task from to Needs Triage.
Spage updated the task description. (Show Details)
Spage added a subscriber: Spage.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptOct 19 2015, 12:06 AM

It seems there are only 30 pageprops on enwiki, I get the same results for ppnlimit=50 or ppnlimit=100. The query with ppnlimit=500 failed twice with DB errors, but now works and returns in a few seconds. Hmm.

Anomie added a subscriber: Anomie.Oct 19 2015, 2:01 PM

The relevant query is

SELECT DISTINCT pp_propname FROM page_props ORDER BY pp_propname LIMIT 501;

The plan for this query (on db1073) doesn't seem problematic, it's using what should be a reasonably efficient loose index scan:

+------+-------------+------------+-------+---------------+------------------+---------+------+------+--------------------------+
| id   | select_type | table      | type  | possible_keys | key              | key_len | ref  | rows | Extra                    |
+------+-------------+------------+-------+---------------+------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | page_props | range | NULL          | pp_propname_page | 62      | NULL |   22 | Using index for group-by |
+------+-------------+------------+-------+---------------+------------------+---------+------+------+--------------------------+

But the exception reported was on db1065. Trying that one gives a worse plan:

+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
| id   | select_type | table      | type  | possible_keys | key              | key_len | ref  | rows     | Extra       |
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
|    1 | SIMPLE      | page_props | index | NULL          | pp_propname_page | 66      | NULL | 24486527 | Using index |
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+

Further checking shows that enwiki on db1052, db1053, db1057, db1072, and db1073 use the good plan, while db1051, db1055, db1065, and db1066 use the bad plan.

BTW, this discrepancy in queries reminds me of T105189, which is another situation that's relying on loose index scans for group-by but the database isn't always cooperating.

Further checking shows that enwiki on db1052, db1053, db1057, db1072, and db1073 use the good plan, while db1051, db1055, db1065, and db1066 use the bad plan.

Might be worth poking @jcrespo about this... I had a quick look, they all seem to be using 5.5.5-10.0.16-MariaDB-log, except 5.5.34-MariaDB-1~precise-log on db1052

I was wondering if it was some issue due to version differences, but seemingly not. Maybe config?

Anomie added a comment.Jul 3 2016, 8:23 PM

Looking at this again, I see an interesting correlation. It uses the loose index scan without a limit at all:

mysql:wikiadmin@db1065 [enwiki]> explain SELECT pp_propname  FROM `page_props`  GROUP BY pp_propname ORDER BY pp_propname;
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
| id   | select_type | table      | type  | possible_keys | key                      | key_len | ref  | rows | Extra                    |
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | page_props | range | NULL          | pp_propname_sortkey_page | 62      | NULL | 6467 | Using index for group-by |
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

The switch in plan when a limit is added seems to directly correspond to that predicted number of rows:

mysql:wikiadmin@db1065 [enwiki]> explain SELECT pp_propname  FROM `page_props`  GROUP BY pp_propname ORDER BY pp_propname LIMIT 6467;
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
| id   | select_type | table      | type  | possible_keys | key                      | key_len | ref  | rows | Extra                    |
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | page_props | range | NULL          | pp_propname_sortkey_page | 62      | NULL | 6467 | Using index for group-by |
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql:wikiadmin@db1065 [enwiki]> explain SELECT pp_propname  FROM `page_props`  GROUP BY pp_propname ORDER BY pp_propname LIMIT 6466;
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
| id   | select_type | table      | type  | possible_keys | key              | key_len | ref  | rows     | Extra       |
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
|    1 | SIMPLE      | page_props | index | NULL          | pp_propname_page | 66      | NULL | 24451631 | Using index |
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
1 row in set (0.00 sec)

I note here it's switching indexes too, but that doesn't seem to really be relevant:

mysql:wikiadmin@db1065 [enwiki]> explain SELECT pp_propname  FROM `page_props` use index (pp_propname_page) GROUP BY pp_propname ORDER BY pp_propname limit 7719;
+------+-------------+------------+-------+---------------+------------------+---------+------+------+--------------------------+
| id   | select_type | table      | type  | possible_keys | key              | key_len | ref  | rows | Extra                    |
+------+-------------+------------+-------+---------------+------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | page_props | range | NULL          | pp_propname_page | 62      | NULL | 7719 | Using index for group-by |
+------+-------------+------------+-------+---------------+------------------+---------+------+------+--------------------------+
1 row in set (0.00 sec)

mysql:wikiadmin@db1065 [enwiki]> explain SELECT pp_propname  FROM `page_props` use index (pp_propname_page) GROUP BY pp_propname ORDER BY pp_propname limit 7718;
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
| id   | select_type | table      | type  | possible_keys | key              | key_len | ref  | rows     | Extra       |
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
|    1 | SIMPLE      | page_props | index | NULL          | pp_propname_page | 66      | NULL | 24451633 | Using index |
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+

So apparently it's deciding that the much slower tight index scan over 3700 times as many rows is preferable when the limit might result in a few rows from the loose scan being thrown away?

Tgr added a subscriber: Tgr.Sep 1 2016, 9:11 AM

On enwiki it times out even with limit=50.

Dalba added a subscriber: Dalba.EditedApr 23 2017, 9:28 AM

Pywikibot has a similar problem (T163635) with the following API query on Wikidata.

It takes nearly 40 seconds to get a response.

Pywikibot has a similar problem (T163635) with the following API query on Wikidata.

It takes nearly 40 seconds to get a response.

What's the SQL query? We are having some issues with API slaves lately since we switched to codfw, not saying this is it, but it could be related to it, this is the main tracking ticket with its subtasks: T163495
Thanks!

jcrespo renamed this task from action=query&list=pagepropnames really slow on a big wiki, got error with ppnlimit=500 to action=query&list=pagepropnames really slow on a big wiki, got error with ppnlimit=500 function: /* ApiQueryPagePropNames::execute */.Apr 24 2017, 8:18 AM

Adding the function name to avoid filing duplicate tickets ApiQueryPagePropNames::execute (correct me if I am mistaken).

From what I see in the query plan it looks like a full table scan on (for example) 59,183,780 rows for wikidata /o\
Types of page_props should be stored in an intermediary cache (memcached, cdb, whatever) or the query should be changed in a way that it only reads several rows instead of millions. I'm on it.

It seems it's a known bug in mysql < 5.6 (WMF is 5.5): https://bugs.mysql.com/bug.php?id=61517

Here are some query plans:

mysql:wikiadmin@db2045 [wikidatawiki]> EXPLAIN SELECT /* ApiQueryPagePropNames::execute */ DISTINCT pp_propname FROM `page_props` ORDER BY pp_propname limit 1;
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
| id   | select_type | table      | type  | possible_keys | key              | key_len | ref  | rows     | Extra       |
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
|    1 | SIMPLE      | page_props | index | NULL          | pp_propname_page | 66      | NULL | 63586088 | Using index |
+------+-------------+------------+-------+---------------+------------------+---------+------+----------+-------------+
1 row in set (0.02 sec)

mysql:wikiadmin@db2045 [wikidatawiki]> EXPLAIN SELECT /* ApiQueryPagePropNames::execute */ DISTINCT pp_propname FROM `page_props` ORDER BY pp_propname;
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
| id   | select_type | table      | type  | possible_keys | key                      | key_len | ref  | rows | Extra                    |
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
|    1 | SIMPLE      | page_props | range | NULL          | pp_propname_sortkey_page | 62      | NULL | 3827 | Using index for group-by |
+------+-------------+------------+-------+---------------+--------------------------+---------+------+------+--------------------------+
1 row in set (0.02 sec)

So I suggest mediawiki core drop the limit when the database type is mysql until WMF database upgrades to 5.6:

Given that the total number of property types won't be more than 50 (worst case is wikidata with 23 results) this won't become problematic for any database in wikimedia.

ladsgroup@naos:~$ mysql --help
mysql  Ver 14.14 Distrib 5.5.54, for debian-linux-gnu (x86_64) using readline 6.3

I mistook Distrib and Ver. Now, I'm confused which version is affected :/

@Ladsgroup - there is no mysql installed on naos, what you are seeing is a client version, which BTW, shouldn't be there.

ladsgroup@naos:~$ mysql --help
mysql  Ver 14.14 Distrib 5.5.54, for debian-linux-gnu (x86_64) using readline 6.3

I mistook Distrib and Ver. Now, I'm confused which version is affected :/

mysql:wikiadmin@db1080 [enwiki]> select @@version\G
*************************** 1. row ***************************
@@version: 10.0.30-MariaDB

See also https://en.wikipedia.org/wiki/MariaDB.

Change 349952 merged by jenkins-bot:
[mediawiki/core@master] Do not add limit to ApiQueryPagePropNames when database type is mysql

https://gerrit.wikimedia.org/r/349952

Change 350585 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/core@wmf/1.29.0-wmf.21] Do not add limit to ApiQueryPagePropNames when database type is mysql

https://gerrit.wikimedia.org/r/350585

Change 350585 merged by jenkins-bot:
[mediawiki/core@wmf/1.29.0-wmf.21] Do not add limit to ApiQueryPagePropNames when database type is mysql

https://gerrit.wikimedia.org/r/350585

Pywikibot has a similar problem (T163635) with the following API query on Wikidata.

It takes nearly 40 seconds to get a response.

This now responds in less than a second. Since it's backported to wmf.21 only, Wikipedia will get to it tomorrow but I think it's safe to close this

Ladsgroup closed this task as Resolved.Apr 27 2017, 6:24 PM

I'm not so sure about closing it. Is that patch really the solution, or just a workaround?

Yeah, I don't know. My reason is that this is an upstream bug and should not be handled in the mediawiki itself (or we can not find proper solution in our environment) if you disagree, feel free to reopen :)