Page MenuHomePhabricator

Frequent 504s while using logevents API on Commons
Closed, ResolvedPublic

Description

I and others have observed frequent 504 Gateway Timeout errors from the MediaWiki API on Commons while trying to use VisualFileChange. The first report was around 2021-05-06 09:30 UTC in #wikimedia-commons and at https://commons.wikimedia.org/wiki/MediaWiki_talk:Gadget-VisualFileChange.js#not_loading_with_user's_contributions_as_a_generator.

Example request:

https://commons.wikimedia.org/w/api.php
{
	"format": "json",
	"action": "query",
	"list": "logevents",
	"rawcontinue": "1",
	"leprop": "title|timestamp|type",
	"leaction": "upload/upload",
	"leuser": "Imsamrat392",
	"lelimit": "50",
	"ledir": "older",
	"lecontinue": "20201023131429|300687794"
}

https://commons.wikimedia.org/w/api.php?format=json&action=query&list=logevents&rawcontinue=1&leprop=title%7Ctimestamp%7Ctype&leaction=upload%2Fupload&leuser=Imsamrat392&lelimit=50&ledir=older&lecontinue=20201023131429%7C300687794

POST /w/api.php HTTP/2
Host: commons.wikimedia.org
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0
Accept: application/json, text/javascript, */*; q=0.01
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Referer: https://commons.wikimedia.org/wiki/Special:ListFiles/Imsamrat392
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Content-Length: 190
Origin: https://commons.wikimedia.org
Connection: keep-alive
Cookie: ***
DNT: 1
Sec-GPC: 1
Pragma: no-cache
Cache-Control: no-cache
TE: Trailers

HTTP/2 504 Gateway Timeout
content-length: 24
content-type: text/plain
date: Thu, 06 May 2021 13:05:21 GMT
server: envoy
age: 65
x-cache: cp1085 miss, cp1079 pass
x-cache-status: pass
Error message from Logstash
2021-05-06 22:31:03 [caef8fe6-1da3-4c11-a8ea-e0553978eec0] mw1281 commonswiki 1.37.0-wmf.4 exception ERROR: [caef8fe6-1da3-4c11-a8ea-e0553978eec0] /w/api.php   Wikimedia\Rdbms\DBQueryDisconnectedError: A connection error occurred during a query.
Query: SELECT  /*! STRAIGHT_JOIN */ log_id,log_type,log_action,log_timestamp,log_deleted,log_namespace,log_title  FROM `logging` IGNORE INDEX (times) JOIN `actor` ON ((actor_id=log_actor)) LEFT JOIN `page` ON ((log_namespace=page_namespace) AND (log_title=page_title))   WHERE (log_type != 'urlshortener') AND log_type = 'upload' AND log_action = 'upload' AND actor_name = '…'  ORDER BY log_timestamp DESC,log_id DESC LIMIT 31
Function: ApiQueryLogEvents::execute
Error: 2006 MySQL server has gone away (10.64.0.220)

  #0 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/database/Database.php(1722): Wikimedia\Rdbms\Database->getQueryException(string, integer, string, string)
  #1 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/database/Database.php(1697): Wikimedia\Rdbms\Database->getQueryExceptionAndLog(string, integer, string, string)
  #2 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/database/Database.php(1260): Wikimedia\Rdbms\Database->reportQueryError(string, integer, string, string, boolean)
  #3 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/database/Database.php(1948): Wikimedia\Rdbms\Database->query(string, string, integer)
  #4 /srv/mediawiki/php-1.37.0-wmf.4/includes/libs/rdbms/querybuilder/SelectQueryBuilder.php(614): Wikimedia\Rdbms\Database->select(array, array, array, string, array, array)
  #5 /srv/mediawiki/php-1.37.0-wmf.4/includes/api/ApiQueryBase.php(427): Wikimedia\Rdbms\SelectQueryBuilder->fetchResultSet()
  #6 /srv/mediawiki/php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php(253): ApiQueryBase->select(string)
  #7 /srv/mediawiki/php-1.37.0-wmf.4/includes/api/ApiQuery.php(326): ApiQueryLogEvents->execute()
  #8 /srv/mediawiki/php-1.37.0-wmf.4/includes/api/ApiMain.php(1668): ApiQuery->execute()
  #9 /srv/mediawiki/php-1.37.0-wmf.4/includes/api/ApiMain.php(638): ApiMain->executeAction()
  #10 /srv/mediawiki/php-1.37.0-wmf.4/includes/api/ApiMain.php(609): ApiMain->executeActionWithErrorHandling()
  #11 /srv/mediawiki/php-1.37.0-wmf.4/api.php(90): ApiMain->execute()
  #12 /srv/mediawiki/php-1.37.0-wmf.4/api.php(45): wfApiMain()
  #13 /srv/mediawiki/w/api.php(3): require(string)
  #14 {main}

Event Timeline

Rubin16 triaged this task as High priority.May 6 2021, 1:46 PM
AntiCompositeNumber raised the priority of this task from High to Needs Triage.May 6 2021, 7:01 PM

Please don't set priorities on tasks unless you plan to work on them.

Urbanecm added subscribers: BBlack, Urbanecm.

I can see the change in 504 trends at commonswiki (link):

image.png (491×1 px, 77 KB)

From SAL, those are entries from around 18:00 UTC on May 5:

17:59 <bblack@cumin1001> conftool action : set/weight=100; selector: name=cp501[3456].eqsin.wmnet,service=ats-be
17:59 <bblack@cumin1001> conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=ats-tls
17:59 <bblack@cumin1001> 
conftool action : set/weight=1; selector: name=cp501[3456].eqsin.wmnet,service=varnish-fe

cc @BBlack and tagging Traffic.

AntiCompositeNumber triaged this task as Unbreak Now! priority.May 7 2021, 12:03 AM
Krinkle lowered the priority of this task from Unbreak Now! to Needs Triage.May 7 2021, 12:04 AM
Krinkle triaged this task as Unbreak Now! priority.
Krinkle updated the task description. (Show Details)

The problem is the STRAIGHT_JOIN:

MariaDB [commonswiki]> explain SELECT  /*! STRAIGHT_JOIN */ log_id,log_type,log_action,log_timestamp,log_deleted,log_namespace,log_title  FROM `logging` IGNORE INDEX (times) JOIN `actor` ON ((actor_id=log_actor)) LEFT JOIN `page` ON ((log_namespace=page_namespace) AND (log_title=page_title))   WHERE (log_type != 'urlshortener') AND log_type = 'upload' AND log_action = 'upload' AND actor_name = 'Veronica111a'  ORDER BY log_timestamp DESC,log_id DESC LIMIT 31\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: logging
         type: ref
possible_keys: type_time,actor_time,log_actor_type_time,log_type_action
          key: type_time
      key_len: 34
          ref: const
         rows: 59282459
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: const
possible_keys: PRIMARY,actor_name
          key: actor_name
      key_len: 257
          ref: const
         rows: 1
        Extra: Using where; Using index
2 rows in set (0.001 sec)

MariaDB [commonswiki]> explain SELECT log_id,log_type,log_action,log_timestamp,log_deleted,log_namespace,log_title  FROM `logging` IGNORE INDEX (times) JOIN `actor` ON ((actor_id=log_actor)) LEFT JOIN `page` ON ((log_namespace=page_namespace) AND (log_title=page_title))   WHERE (log_type != 'urlshortener') AND log_type = 'upload' AND log_action = 'upload' AND actor_name = 'Veronica111a'  ORDER BY log_timestamp DESC,log_id DESC LIMIT 31\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: const
possible_keys: PRIMARY,actor_name
          key: actor_name
      key_len: 257
          ref: const
         rows: 1
        Extra: Using index
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: logging
         type: ref
possible_keys: type_time,actor_time,log_actor_type_time,log_type_action
          key: log_actor_type_time
      key_len: 42
          ref: const,const
         rows: 2
        Extra: Using where
2 rows in set (0.001 sec)

Change 686110 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@master] ApiQueryLogEvents: when user is specified, omit STRAIGHT_JOIN

https://gerrit.wikimedia.org/r/686110

Change 685896 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@wmf/1.37.0-wmf.3] ApiQueryLogEvents: when user is specified, omit STRAIGHT_JOIN

https://gerrit.wikimedia.org/r/685896

Change 685897 had a related patch set uploaded (by Tim Starling; author: Tim Starling):

[mediawiki/core@wmf/1.37.0-wmf.4] ApiQueryLogEvents: when user is specified, omit STRAIGHT_JOIN

https://gerrit.wikimedia.org/r/685897

Change 686110 merged by jenkins-bot:

[mediawiki/core@master] ApiQueryLogEvents: when user is specified, omit STRAIGHT_JOIN

https://gerrit.wikimedia.org/r/686110

Change 685896 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.3] ApiQueryLogEvents: when user is specified, omit STRAIGHT_JOIN

https://gerrit.wikimedia.org/r/685896

Change 685897 merged by jenkins-bot:

[mediawiki/core@wmf/1.37.0-wmf.4] ApiQueryLogEvents: when user is specified, omit STRAIGHT_JOIN

https://gerrit.wikimedia.org/r/685897

Mentioned in SAL (#wikimedia-operations) [2021-05-07T06:09:11Z] <tstarling@deploy1002> Synchronized php-1.37.0-wmf.3/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 06s)

Mentioned in SAL (#wikimedia-operations) [2021-05-07T06:11:01Z] <tstarling@deploy1002> Synchronized php-1.37.0-wmf.4/includes/api/ApiQueryLogEvents.php: fix UBN T282122 (duration: 01m 10s)

tstarling claimed this task.

Should be fixed now.