Error: 2013 Lost connection to MySQL server during query on NewFilesPager (again)
Open, HighPublic
Actions

Assigned To

None

Authored By

	jayvdb
	Aug 1 2016, 1:23 PM

Description

https://commons.wikimedia.org/wiki/Special:NewFiles?hidepatrolled=1&limit=50&offset=20160601 and https://commons.wikimedia.org/wiki/Special:NewFiles?hidepatrolled=1&limit=50&offset=20160601&dir=prev are producing the same error as
https://commons.wikimedia.org/wiki/Special:NewFiles?hidepatrolled=1&limit=50&offset= did in T124205.

Specifically, twice now consecutively I had received

A database query error has occurred. This may indicate a bug in the software.

    Function: IndexPager::buildQueryInfo (NewFilesPager)
    Error: 2013 Lost connection to MySQL server during query (10.64.16.102)

@jcrespo mentions causes and potential fixes: T141765#2511539

Related Objects
Search...

Status	Subtype	Assigned	Task
Open		None	T134802 Improve the curator workflow for reviewing new files
Open		None	T120453 Copyright violation detection tool for Commons
Resolved		None	T121870 Improve Special:NewFiles
Resolved		matthiasmullie	T121865 Allow the same filtering on Special:NewFiles as on Special:NewPages (by tags, username)
Resolved		matmarex	T124205 Error: 2013 Lost connection to MySQL server during query on NewFilesPager
Open		None	T141765 Error: 2013 Lost connection to MySQL server during query on NewFilesPager (again)
Duplicate	Feature	None	T31793 Check uploaded images with Google image search to find copyright violations
Duplicate		None	T123517 Automatically check Commons uploads for possible copyright violations
Duplicate		None	T230561 Create a ML model to score new files in commons for copyvio issues
Resolved		Samwilson	T145165 Investigation: Copyvio tools for Commons
Open		None	T132650 Copyright detection (acoustic fingerprint matching) for audio files

Event Timeline

jayvdb created this task.Aug 1 2016, 1:23 PM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 1 2016, 1:23 PM

10.64.16.102 is db1084. I do not see anything wrong with the server itself, although I have seen since 6:47 today triple the bytes_sent from the db server, with no significat increase in traffic (number of requests). That, however, is probably irrelevant.

This is most likely a query optimization issue:

EXPLAIN SELECT /*! STRAIGHT_JOIN */ img_name,img_user,img_timestamp FROM `image` LEFT JOIN `user_groups` ON (ug_group = 'bot' AND (ug_user = img_user)) INNER JOIN `recentchanges` ON ((rc_title = img_name) AND (rc_user = img_user) AND (rc_timestamp = img_timestamp)) WHERE (ug_group IS NULL) AND rc_type = '3' AND rc_log_type = 'upload' AND rc_patrolled = '0' AND rc_namespace = '6' AND (img_timestamp<'20160601') ORDER BY img_timestamp DESC LIMIT 51\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: image
         type: range
possible_keys: PRIMARY,img_timestamp
          key: img_timestamp
      key_len: 16
          ref: NULL
         rows: 27983996
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: user_groups
         type: eq_ref
possible_keys: PRIMARY,ug_group
          key: PRIMARY
      key_len: 261
          ref: commonswiki.image.img_user,const
         rows: 1
        Extra: Using where; Using index; Not exists
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: ref
possible_keys: rc_timestamp,rc_namespace_title,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp
          key: rc_name_type_patrolled_timestamp
      key_len: 22
          ref: const,const,const,commonswiki.image.img_timestamp
         rows: 1
        Extra: Using where
3 rows in set (0.00 sec)

Indeed, AND (img_timestamp<'20160601') makes the query go from 0 seconds to infinity. The impact is low in numbers (not many requests/hour), but it should be fast to fix and it makes the query consistently fail, so I am marking it as high.

The reason being that results with rc_patrolled = '0' are very rare on old results, so the query ends up reading all (or a huge number) of rows in the images table.

One solution is to limit the amount of time/rows that can be queried like this, e.g. "New files with the given properties, but only withing the last 6 months/100000 last images".

SELECT /*! STRAIGHT_JOIN */ img_name,img_user,img_timestamp FROM `image` LEFT JOIN `user_groups` ON (ug_group = 'bot' AND (ug_user = img_user)) INNER JOIN `recentchanges` ON ((rc_title = img_name) AND (rc_user = img_user) AND (rc_timestamp = img_timestamp)) WHERE (ug_group IS NULL) AND rc_type = '3' AND rc_log_type = 'upload' AND rc_patrolled = '0' AND rc_namespace = '6' AND (img_timestamp<'20160601') AND (img_timestamp>'20160501') ORDER BY img_timestamp DESC LIMIT 51;

takes 3 seconds, probably still too much, but at least is finishes.

A better solution is to drop the STRAIGHT join in this case:

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: recentchanges
         type: range
possible_keys: rc_timestamp,rc_namespace_title,rc_ns_usertext,tmp_3,rc_name_type_patrolled_timestamp
          key: rc_timestamp
      key_len: 16
          ref: NULL
         rows: 1
        Extra: Using index condition; Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: image
         type: eq_ref
possible_keys: PRIMARY,img_timestamp
          key: PRIMARY
      key_len: 257
          ref: commonswiki.recentchanges.rc_title
         rows: 1
        Extra: Using where
*************************** 3. row ***************************
           id: 1
  select_type: SIMPLE
        table: user_groups
         type: eq_ref
possible_keys: PRIMARY,ug_group
          key: PRIMARY
      key_len: 261
          ref: commonswiki.recentchanges.rc_user,const
         rows: 1
        Extra: Using where; Using index; Not exists
3 rows in set (0.00 sec)

This takes more time to order, but it returns in 0 seconds.

A third option is to disallow hidepatrolled=1 + offset=* - you can hide patrolled edits on the latest changes, but not on old ones that would be mostly all patrolled.

jcrespo added a subscriber: Anomie.Aug 1 2016, 2:07 PM

Josve05a removed a project: User-Josve05a.Aug 1 2016, 2:11 PM

A third option is to disallow hidepatrolled=1 + offset=* - you can hide patrolled edits on the latest changes, but not on old ones that would be mostly all patrolled.

It is a common usage pattern to patrol from the back of the queue, to pick up the ones that have slipped through the cracks of patrollers working on the front of the queue.

I'll note that there will not even be any results older than $wgRCMaxAge seconds, because recentchanges table entries are deleted after this time. I think this defaults to 30 days in WMF config.

In T141765#2511546, @jayvdb wrote:

A third option is to disallow hidepatrolled=1 + offset=* - you can hide patrolled edits on the latest changes, but not on old ones that would be mostly all patrolled.

It is a common usage pattern to patrol from the back of the queue, to pick up the ones that have slipped through the cracks of patrollers working on the front of the queue.

It is also how paging is implemented, so definitely not an option.

I want to hear @Anomie's take on this, he is aware of many issues regarding STRAIGHT_JOINs on mediawiki (when it is needed and when it is not, special cases found, etc.).

In T141765#2511547, @matmarex wrote:

I'll note that there will not even be any results older than $wgRCMaxAge seconds, because recentchanges table entries are deleted after this time. I think this defaults to 30 days in WMF config.

Indeed. I was using &dir=prev initially, but forgot to mention that.

jcrespo moved this task from Triage to Blocked external/Not db team on the DBA board.Aug 1 2016, 3:04 PM

jcrespo updated the task description. (Show Details)

I'm on vacation at the moment, but off the top of my head I note the straight join here was just added for T124205: Error: 2013 Lost connection to MySQL server during query on NewFilesPager.

The thing to do might be to add a more targeted index (maybe rc_namespace,rc_timestamp or rc_type,rc_namespace,rc_timestamp, or maybe something even more targeted) to cut down the number of matching recentchanges rows, change it to order by rc_timestamp, and switch the table order so straight-join always does recentchanges first. That should at least fix both of these bugs.

Josve05a awarded a token.Aug 5 2016, 1:38 AM

jayvdb mentioned this in T142818: Assess pywikibot-catfiles MVP.Aug 12 2016, 9:44 AM

Krinkle moved this task from Untriaged to Usage problem on the MediaWiki-libs-Rdbms board.May 8 2017, 1:27 AM

After all the MySQL upgrades and all that, this bug is still valid and the two reported URLs are still failing.

Restricted Application added a project: Growth-Team. · View Herald TranscriptSep 7 2018, 6:33 PM

JTannerWMF moved this task from Inbox to Needs Discussion on the Growth-Team board.Sep 9 2018, 7:55 AM

Liuxinyu970226 awarded a token.Sep 14 2018, 12:52 PM

Liuxinyu970226 subscribed.

Growth-Team discussed in triage this week and we won't have time to prioritize this in the near-to-medium term.

• Marostegui removed a project: DBA.Dec 7 2018, 2:25 PM

Krinkle edited projects, added Wikimedia-database-issue, MediaWiki-File-management; removed MediaWiki-libs-Rdbms.Jul 18 2019, 8:03 PM

Restricted Application added projects: Commons, Multimedia. · View Herald TranscriptJul 18 2019, 8:03 PM

Aklapper removed a subscriber: Anomie.Oct 16 2020, 5:01 PM

MBinder_WMF added a project: Growth-Team-Filtering.Apr 15 2021, 7:00 PM

Krinkle edited projects, added Wikimedia-Slow-DB-Query; removed Wikimedia-database-issue.Apr 9 2022, 8:34 PM

Error: 2013 Lost connection to MySQL server during query on NewFilesPager (again)Open, HighPublicActions

Description

Related ObjectsSearch...

Event Timeline

Error: 2013 Lost connection to MySQL server during query on NewFilesPager (again)
Open, HighPublic
Actions

Related Objects
Search...