Page MenuHomePhabricator

Database query error (internal_api_error_DBQueryError) while getting list=allrevisions
Closed, ResolvedPublic

Description

https://en.wikipedia.org/w/api.php?action=query&list=allrevisions&arvprop=ids|flags|timestamp&arvlimit=1

The request hangs for a while, then returns a message:

{
    "servedby": "mw1132",
    "error": {
        "code": "internal_api_error_DBQueryError",
        "info": "[01260d98] Database query error"
    }
}

Event Timeline

UniCollab raised the priority of this task from to Needs Triage.
UniCollab updated the task description. (Show Details)
UniCollab added a subscriber: UniCollab.
Anomie added a subscriber: Anomie.
2016-01-13 19:58:47 mw1132 ruwiki exception ERROR: [01260d98] /w/api.php?action=query&list=allrevisions&format=jsonfm&&arvprop=ids%7Cuser%7Cuserid&arvlimit=1   DBQueryError from line 1090 of /srv/mediawiki/php-1.27.0-wmf.9/includes/db/Database.php: A connection error occured.
Query: SELECT  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len  FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))   ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 2
Function: ApiQueryAllRevisions::run
Error: 2013 Lost connection to MySQL server during query (10.64.48.14)
 {"exception_id":"01260d98"}
[Exception DBQueryError] (/srv/mediawiki/php-1.27.0-wmf.9/includes/db/Database.php:1090) A connection error occured.
Query: SELECT  /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len  FROM `revision` INNER JOIN `page` ON ((rev_page = page_id))   ORDER BY rev_timestamp  DESC,rev_id  DESC LIMIT 2
Function: ApiQueryAllRevisions::run
Error: 2013 Lost connection to MySQL server during query (10.64.48.14)

  #0 /srv/mediawiki/php-1.27.0-wmf.9/includes/db/Database.php(1030): DatabaseBase->reportQueryError(string, integer, string, string, boolean)
  #1 /srv/mediawiki/php-1.27.0-wmf.9/includes/db/Database.php(1572): DatabaseBase->query(string, string)
  #2 /srv/mediawiki/php-1.27.0-wmf.9/includes/api/ApiQueryBase.php(376): DatabaseBase->select(array, array, array, string, array, array)
  #3 /srv/mediawiki/php-1.27.0-wmf.9/includes/api/ApiQueryAllRevisions.php(169): ApiQueryBase->select(string)
  #4 /srv/mediawiki/php-1.27.0-wmf.9/includes/api/ApiQueryRevisionsBase.php(43): ApiQueryAllRevisions->run()
  #5 /srv/mediawiki/php-1.27.0-wmf.9/includes/api/ApiQuery.php(287): ApiQueryRevisionsBase->execute()
  #6 /srv/mediawiki/php-1.27.0-wmf.9/includes/api/ApiMain.php(1306): ApiQuery->execute()
  #7 /srv/mediawiki/php-1.27.0-wmf.9/includes/api/ApiMain.php(460): ApiMain->executeAction()
  #8 /srv/mediawiki/php-1.27.0-wmf.9/includes/api/ApiMain.php(432): ApiMain->executeActionWithErrorHandling()
  #9 /srv/mediawiki/php-1.27.0-wmf.9/api.php(83): ApiMain->execute()
  #10 /srv/mediawiki/w/api.php(3): include(string)
  #11 {main}

What seems to be going on here is that ruwiki's revision table has PRIMARY KEY (rev_page,rev_id) rather than just rev_id as is specified by maintenance/tables.sql, which is causing issues for this query that's expecting InnoDB's index clustering to be making the rev_timestamp index effectively (rev_timestamp,rev_id) (here it's presumably (rev_timestamp,rev_page,rev_id)).

jcrespo added a subscriber: jcrespo.

At some selected servers, PRIMARY KEY (rev_page,rev_id) is created to serve recentchanges/log for logged users in a partitioned way, but those should not be used for other roles (API, for example).

I will investigate why this is the case, and if there is any issue with the roles/load balancing.

Is there another way to request the latest revisions without error?

https://en.wikipedia.org/w/api.php?action=query&list=allrevisions&format=json

{"servedby":"mw1194","error":{"code":"internal_api_error_DBQueryError","info":"[d4a40b5b] Database query error"}}

https://ru.wikipedia.org/w/api.php?action=query&list=allrevisions&format=json

{"servedby":"mw1115","error":{"code":"internal_api_error_DBQueryError","info":"[60d5b113] Database query error"}}

https://meta.wikimedia.org/w/api.php?action=query&list=allrevisions&format=json

{"servedby":"mw1117","error":{"code":"internal_api_error_DBQueryError","info":"[bde20b18] Database query error"}}

When I execute @Antigng_'s call, I get:

SELECT /*! STRAIGHT_JOIN */ rev_id,rev_page,rev_text_id,rev_timestamp,rev_comment,rev_user_text,rev_user,rev_minor_edit,rev_deleted,rev_len,rev_parent_id,rev_sha1,rev_content_format,rev_content_model,page_namespace,page_title,page_id,page_latest,page_is_redirect,page_len FROM `revision` INNER JOIN `page` ON ((rev_page = page_id)) ORDER BY rev_timestamp ,rev_id LIMIT 51

Which despite the STRAIGHT_JOIN, it gives me the following query:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: revision
         type: ALL
possible_keys: PRIMARY,page_timestamp
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 37165495
        Extra: Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: page
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: zhwiki.revision.rev_page
         rows: 1
        Extra: 
2 rows in set (0.00 sec)

Which means that it is doing a full table scan on the revision table. Ordering by rev_timestamp or rev_id alone gives me a fast query (using the index for ordering). The problem with this approach is that it can return results for pages edited at the same time in different order each time.

So the alternatives are:

  • Create an index on (rev_timestamp, rev_id)- has to be tested, but probably mariadb cannot use the extended primary key (despite being enabled) for sorting automatically
  • order by rev_id or other combination that is deterministic (rev_page, etc.)

@Krassotkin the Database is a bad place for reading *ALL* edits- this particular API call has been optimized for checking changes on a particular page, or user, etc. (with parameters). There are other calls for streaming all changes, in a faster way: RCStream https://www.mediawiki.org/wiki/API:Recent_changes_stream

  • Create an index on (rev_timestamp, rev_id)- has to be tested, but probably mariadb cannot use the extended primary key (despite being enabled) for sorting automatically

That's what's intended to be going on now implicitly, per code review on I4c9f8c0: the existing index on rev_timestamp is supposed to be combined with the primary key on rev_id thanks to InnoDB's clustering. As noted above, though, this doesn't work if the primary key is instead rev_page, rev_id, as seems to be the case on most of the ruwiki and zhwiki DBs for example.

  • order by rev_id or other combination that is deterministic (rev_page, etc.)

That would reopen T91883: prop=revisions sorts by rev_id, not by rev_timestamp.

@Anomie, so if I have understood your take well, this ticket should be blocked on T132416 OR (rev_page, rev_id) should be created explicitly, right?

Preferably the first, if it works (which may need testing because we do not want to suppose query optimization is actually doing the right thing)?

@Anomie, so if I have understood your take well, this ticket should be blocked on T132416 OR (rev_page, rev_id) should be created explicitly, right?

If you meant "(rev_timestamp, rev_id)" rather than "(rev_page, rev_id)", yes.

@Jdlrobson I do not know the details- but the original bug is a production-only problem, so most probably your problem is unrelated (database query error is a very generic error) an probably should be tracked separatelly.

Jdlrobson lowered the priority of this task from Unbreak Now! to Needs Triage.Feb 9 2017, 6:13 PM

Tracking this now in T157711

I guess this was solved when we finished: T132416?

Yes resolved from on our side.

Anomie claimed this task.

Let's close this then. If similar queries are still timing out, it'll be a different cause than was discussed for this example.

Anomie removed Anomie as the assignee of this task.Apr 6 2018, 2:13 PM