Page MenuHomePhabricator

Certain prop=revisions API queries timeout with "internal_api_error_DBQueryError"
Closed, ResolvedPublicPRODUCTION ERROR

Description

So far I got 50+ error messages identical to this while executing make on labs:

(most recent call last):
  File "/home/he7d3r/projects/editquality/editquality/utilities/label_reverted.py", line 94, in run
    rvprop=["user"])
  File "/home/he7d3r/env/3.4/lib/python3.4/site-packages/mwreverts/api.py", line 160, in check
    rvprop={'ids', 'timestamp', 'sha1'} | rvprop
  File "/home/he7d3r/env/3.4/lib/python3.4/site-packages/mwreverts/api.py", line 20, in n_edits_after
    rvlimit=n, rvprop=rvprop)
  File "/home/he7d3r/env/3.4/lib/python3.4/site-packages/mwapi/session.py", line 157, in get
    return self._request('GET', self._normalize_params(params))
  File "/home/he7d3r/env/3.4/lib/python3.4/site-packages/mwapi/session.py", line 101, in _request
    raise APIError.from_doc(doc['error'])
mwapi.errors.APIError: internal_api_error_DBQueryError: [c1e8812e] Database query error -- None

The only difference is the hash inside the brakets:

mwapi.errors.APIError: internal_api_error_DBQueryError: [7984dda5] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [d3c7baae] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [c1e8812e] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [d443a7d3] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [1a3bfb15] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [e1f85c0e] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [f8757f85] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [daf65612] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [20cd7dec] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [b035a08f] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [41384d41] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [0ea9e046] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [8d15b7d3] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [56b474d9] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [93022fcf] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [b81ddf5a] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [4a01b08b] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [ef7c1751] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [f24f77f4] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [44a53ab2] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [a6a7240a] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [44448a30] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [335552f0] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [e7864b43] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [5dd99079] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [c3252604] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [85098dbd] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [bc414820] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [f4c540aa] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [16a0708b] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [cdd08f7a] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [cd6dca62] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [c2ed66c1] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [6905f7b9] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [d761d75b] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [25452071] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [080a8ace] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [acd11318] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [b6a4c032] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [36e8dc1e] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [903ecf2c] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [1ba6a187] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [47156dd9] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [93265c8e] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [c045cdb0] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [2e66a446] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [6dab3a7b] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [4d3d4f54] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [2b655867] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [796c37a6] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [0d57f166] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [b2dc0173] Database query error -- None
mwapi.errors.APIError: internal_api_error_DBQueryError: [0554b1b4] Database query error -- None

Event Timeline

He7d3r raised the priority of this task from to Needs Triage.
He7d3r updated the task description. (Show Details)
He7d3r subscribed.
Legoktm subscribed.

Here's the first traceback:

2015-12-12 21:04:15 mw1140 enwiki exception ERROR: [7984dda5] /w/api.php?pageids=5137507&action=query&rvstartid=654799748&rvprop=timestamp%7Csha1%7Cuser%7Cids&rvlimit=3&prop=revisions&format=json&rvend=2015-04-05T16%3A58%3A16Z&rvdir=newer   DBQueryError from line 1090 of /srv/mediawiki/php-1.27.0-wmf.8/includes/db/Database.php: A connection error occured. 
Query: SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name  FROM `revision` INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))  WHERE (rev_timestamp<='20150405165816') AND (rev_id>='654799748') AND rev_page = '5137507'  ORDER BY rev_timestamp,rev_id LIMIT 4  
Function: ApiQueryRevisions::run
Error: 2013 Lost connection to MySQL server during query (10.64.48.20)
 {"exception_id":"7984dda5"} 
[Exception DBQueryError] (/srv/mediawiki/php-1.27.0-wmf.8/includes/db/Database.php:1090) A connection error occured. 
Query: SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name  FROM `revision` INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))  WHERE (rev_timestamp<='20150405165816') AND (rev_id>='654799748') AND rev_page = '5137507'  ORDER BY rev_timestamp,rev_id LIMIT 4  
Function: ApiQueryRevisions::run
Error: 2013 Lost connection to MySQL server during query (10.64.48.20)

  #0 /srv/mediawiki/php-1.27.0-wmf.8/includes/db/Database.php(1030): DatabaseBase->reportQueryError(string, integer, string, string, boolean)
  #1 /srv/mediawiki/php-1.27.0-wmf.8/includes/db/Database.php(1572): DatabaseBase->query(string, string)
  #2 /srv/mediawiki/php-1.27.0-wmf.8/includes/api/ApiQueryBase.php(376): DatabaseBase->select(array, array, array, string, array, array)
  #3 /srv/mediawiki/php-1.27.0-wmf.8/includes/api/ApiQueryRevisions.php(317): ApiQueryBase->select(string)
  #4 /srv/mediawiki/php-1.27.0-wmf.8/includes/api/ApiQueryRevisionsBase.php(43): ApiQueryRevisions->run()
  #5 /srv/mediawiki/php-1.27.0-wmf.8/includes/api/ApiQuery.php(287): ApiQueryRevisionsBase->execute()
  #6 /srv/mediawiki/php-1.27.0-wmf.8/includes/api/ApiMain.php(1306): ApiQuery->execute()
  #7 /srv/mediawiki/php-1.27.0-wmf.8/includes/api/ApiMain.php(460): ApiMain->executeAction()
  #8 /srv/mediawiki/php-1.27.0-wmf.8/includes/api/ApiMain.php(432): ApiMain->executeActionWithErrorHandling()
  #9 /srv/mediawiki/php-1.27.0-wmf.8/api.php(83): ApiMain->execute()
  #10 /srv/mediawiki/w/api.php(3): include(string)
  #11 {main}
Anomie subscribed.

Error: 2013 Lost connection to MySQL server during query (10.64.48.20)

This generally means the query was being slow and got killed.

My best guess at what's happening here is that the database is deciding to do a range on the page_timestamp index (from the start-of-time until the rvend date) and then apply the rvstartid limitation on rev_id as a filter, and since Wikipedia:Administrators'_noticeboard/Incidents likely has a crazy-huge number of revisions before that rvstartid it's timing out.

I can't think of anything to do at the MediaWiki level about this, besides disallowing combination of rvstart/rvend/rvuser with rvstartid/rvendid (which would break this particular query more explicitly) and adjusting the sorting to target the rev_page_id index when rvstartid/rvendid are used. Doing that would take some analysis to see how often these combinations are used in practice to know how many other clients would be broken.

At the client level, if you rework your code to use rvstart instead of rvstartid you might have better luck.

He7d3r added a subscriber: Halfak.

OK. So I thought we were getting a minor speedup here by setting an rvend and only returning edits that occur within our time-window, but we always set a low limit anyway -- so this will save us sending a few extra revisions we don't need over the wire.

This function is the culprit: https://github.com/mediawiki-utilities/python-mwreverts/blob/master/mwreverts/api.py#L17

So, I did some experimenting and it looks like we still get the errors even when I don't filter by both rvstart and rvstartid.

Here's what I get from the query optimizer when I use both:

> explain SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name  FROM `revision` INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))  WHERE (rev_timestamp<='20150405165816') AND (rev_id>='654799748') AND rev_page = '5137507'  ORDER BY rev_timestamp,rev_id LIMIT 4
    -> ;
+------+-------------+----------+--------+----------------------------------------------------------------+---------------+---------+--------------------------+-----------+-------------+
| id   | select_type | table    | type   | possible_keys                                                  | key           | key_len | ref                      | rows      | Extra       |
+------+-------------+----------+--------+----------------------------------------------------------------+---------------+---------+--------------------------+-----------+-------------+
|    1 | SIMPLE      | page     | const  | PRIMARY                                                        | PRIMARY       | 4       | const                    |         1 |             |
|    1 | SIMPLE      | revision | range  | PRIMARY,rev_id,rev_timestamp,page_timestamp,rev_timestamp_user | rev_timestamp | 20      | NULL                     | 571304301 | Using where |
|    1 | SIMPLE      | user     | eq_ref | PRIMARY                                                        | PRIMARY       | 4       | enwiki.revision.rev_user |         1 | Using where |
+------+-------------+----------+--------+----------------------------------------------------------------+---------------+---------+--------------------------+-----------+-------------+
3 rows in set (0.47 sec)

Here's what I get when I just use rev_id:

explain SELECT  rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len,user_name  FROM `revision` INNER JOIN `page` ON ((page_id = rev_page)) LEFT JOIN `user` ON ((rev_user != 0) AND (user_id = rev_user))  WHERE (rev_id>=65479974) AND rev_page = '5137507'  ORDER BY rev_timestamp,rev_id LIMIT 4;
+------+-------------+----------+--------+-------------------------------+---------------+---------+--------------------------+------+-------------+
| id   | select_type | table    | type   | possible_keys                 | key           | key_len | ref                      | rows | Extra       |
+------+-------------+----------+--------+-------------------------------+---------------+---------+--------------------------+------+-------------+
|    1 | SIMPLE      | page     | const  | PRIMARY                       | PRIMARY       | 4       | const                    |    1 |             |
|    1 | SIMPLE      | revision | index  | PRIMARY,rev_id,page_timestamp | rev_timestamp | 16      | NULL                     | 2564 | Using where |
|    1 | SIMPLE      | user     | eq_ref | PRIMARY                       | PRIMARY       | 4       | enwiki.revision.rev_user |    1 | Using where |
+------+-------------+----------+--------+-------------------------------+---------------+---------+--------------------------+------+-------------+
3 rows in set (0.05 sec)

I think that the ordering is the real issue here. Regretfully, selecting by timestamp is very painful since more than one revision can have the same timestamp. We'd need to implement a lot of code to handle the various situations that could arise. I'll look into it.

Halfak reopened this task as Open.

I just moved this out of the #revision-scoring-as-a-service backlog because it's not really an issue there. It's really an issue with MediaWiki API performance or *maybe* mwreverts. If you can sort use rvstartid and rvendid along with rvdir, it should be somewhat efficient.

Krinkle renamed this task from mwapi.errors.APIError: internal_api_error_DBQueryError to Certain prop=revisions API queries timeout with "internal_api_error_DBQueryError".Sep 3 2018, 9:47 PM

Change 458133 had a related patch set uploaded (by Jcrespo; owner: Jcrespo):
[operations/mediawiki-config@master] mariadb: Repool db1114 but with low api load due to ongoing errors

https://gerrit.wikimedia.org/r/458133

Change 458133 merged by jenkins-bot:
[operations/mediawiki-config@master] mariadb: Repool db1114 but with low api load due to ongoing errors

https://gerrit.wikimedia.org/r/458133

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:11 PM

We think this isn't something we can be helpful with, so I'm untagging CPT.

Just for clarity, who would be helpful with database performance issues manifest through the MW action API?

Anomie claimed this task.

For core modules, probably CPT despite the comment in T121333#5484861. Really, it's usually a collaboration between me and the DBAs, with the DBAs helping to identify the issues and evaluate query plans and me writing the MW API code change.

The query plans in T121333#1881242 show it using the rev_timestamp index, which was later fixed as T224017. The other potential issue, where it may have been having to fetch many rows because it couldn't effectively use an index to apply the rvstartid limitation, was probably fixed by rMWe72256a6d970: API: Convert rvstartid/rvendid to timestamps for query.

Unfortunately the actual queries at issue here were never copied into this task, but based on the SQL queries in T121333#1881242 and the link to what was presumably this rev of python-mwreverts api.py in T121333#1880803 it was most likely https://en.wikipedia.org/w/api.php?action=query&prop=revisions&pageids=5137507&rvstartid=654799748&rvend=20150405165816&rvdir=newer&rvlimit=3. That query is now very fast, so I'm going to close this task as Resolved. If someone can come up with a version of that query still exhibiting the bug, feel free to reopen.