Page MenuHomePhabricator

Queries of commonswiki_p.filearchive for fa_sha1 are slow
Open, MediumPublic

Description

Cloned from bug #57697, comment #10:

MariaDB [commonswiki_p]> SELECT * FROM image WHERE img_sha1 = 'qtexhtbcwt0tnkuxb2wf3xs7d7j761u' LIMIT 0, 1000;
+-----------------------------+----------+-----------+------------+------------------------------------------+----------+----------------+----------------+----------------+------------------------------------------------------+----------+---------------+----------------+---------------------------------+
| img_name                    | img_size | img_width | img_height | img_metadata                             | img_bits | img_media_type | img_major_mime | img_minor_mime | img_description                                      | img_user | img_user_text | img_timestamp  | img_sha1                        |
+-----------------------------+----------+-----------+------------+------------------------------------------+----------+----------------+----------------+----------------+------------------------------------------------------+----------+---------------+----------------+---------------------------------+
| !!!!古市公威像0103.JPG      |   492465 |      1536 |       1024 | a:1:{s:22:"MEDIAWIKI_EXIF_VERSION";i:2;} |        8 | BITMAP         | image          | jpeg           | Ich machte dieses Foto von mir in 2006.{{GFDL-self}} |    81053 | Tokyo Watcher | 20070305091405 | qtexhtbcwt0tnkuxb2wf3xs7d7j761u |
+-----------------------------+----------+-----------+------------+------------------------------------------+----------+----------------+----------------+----------------+------------------------------------------------------+----------+---------------+----------------+---------------------------------+
1 row in set (0.00 sec)

MariaDB [commonswiki_p]> SELECT * FROM oldimage WHERE oi_sha1 = '0mpoldytyxspxrdbf44r1kc7m8vtbq67' LIMIT 0, 1000;
Empty set (0.00 sec)

MariaDB [commonswiki_p]> SELECT * FROM filearchive WHERE fa_sha1 = '0mpoldytyxspxrdbf44r1kc7m8vtbq67' LIMIT 0, 1000;
+-------+-----------------------+-----------------+------------------+----------------+-----------------+----------------------+-------------------+---------+----------+-----------+-------------+---------+---------------+---------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------+----------------+------------+----------------------------------+
| fa_id | fa_name               | fa_archive_name | fa_storage_group | fa_storage_key | fa_deleted_user | fa_deleted_timestamp | fa_deleted_reason | fa_size | fa_width | fa_height | fa_metadata | fa_bits | fa_media_type | fa_major_mime | fa_minor_mime | fa_description                                                                                                                                                      | fa_user | fa_user_text | fa_timestamp   | fa_deleted | fa_sha1                          |
+-------+-----------------------+-----------------+------------------+----------------+-----------------+----------------------+-------------------+---------+----------+-----------+-------------+---------+---------------+---------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------+----------------+------------+----------------------------------+
|     8 | Perreo-logo-black.jpg | NULL            | deleted          |           NULL |             123 | 20060616034833       | logo              |   26447 |      400 |       194 | 0           |       8 | BITMAP        | image         | jpeg          | {{Information
|Description= Logo for PerreoRadio.com
|Source= http://www.PerreoRadio.com
|Date= 6/12/2006
|Author= PerreoRadio.com
|Permission=
|other_versions=
}} |   47161 | Ameno13      | 20060612221349 |          0 | 0mpoldytyxspxrdbf44r1kc7m8vtbq67 |
+-------+-----------------------+-----------------+------------------+----------------+-----------------+----------------------+-------------------+---------+----------+-----------+-------------+---------+---------------+---------------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------+----------------+------------+----------------------------------+
1 row in set (4.94 sec)

MariaDB [commonswiki_p]>

Version: unspecified
Severity: normal

Details

Reference
bz69088

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:33 AM
bzimport added a project: Cloud-VPS.
bzimport set Reference to bz69088.
scfc created this task.Aug 4 2014, 1:35 AM
Krenair added a subscriber: Krenair.
jcrespo triaged this task as Low priority.Jun 18 2015, 10:19 AM
jcrespo moved this task from Triage to Backlog on the DBA board.
jcrespo raised the priority of this task from Low to Needs Triage.Jun 18 2015, 10:23 AM
jcrespo added a subscriber: jcrespo.
jcrespo triaged this task as Medium priority.Jun 26 2015, 11:21 AM

I can confirm the issue:

MariaDB LABS localhost commonswiki_p > SELECT * FROM filearchive WHERE fa_sha1 = '0mpoldytyxspxrdbf44r1kc7m8vtbq67' LIMIT 0, 1000;
1 row in set (24.45 sec)
Restricted Application added subscribers: Steinsplitter, Aklapper. · View Herald TranscriptJul 16 2015, 9:58 PM
Volans added a subscriber: Volans.May 30 2016, 11:24 AM

The query is not using any index:

MariaDB LABS localhost (none) > explain SELECT * FROM commonswiki_p.filearchive WHERE fa_sha1 = '0mpoldytyxspxrdbf44r1kc7m8vtbq67' LIMIT 0, 1000;
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table       | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | filearchive | ALL  | NULL          | NULL | NULL    | NULL | 3736798 | Using where |
+------+-------------+-------------+------+---------------+------+---------+------+---------+-------------+

While the same query on the original table is using it:

MariaDB LABS localhost (none) > explain SELECT * FROM commonswiki.filearchive WHERE fa_sha1 = '0mpoldytyxspxrdbf44r1kc7m8vtbq67' LIMIT 0, 1000;
+------+-------------+-------------+------+---------------+---------+---------+-------+------+-------------+
| id   | select_type | table       | type | possible_keys | key     | key_len | ref   | rows | Extra       |
+------+-------------+-------------+------+---------------+---------+---------+-------+------+-------------+
|    1 | SIMPLE      | filearchive | ref  | fa_sha1       | fa_sha1 | 12      | const |    1 | Using where |
+------+-------------+-------------+------+---------------+---------+---------+-------+------+-------------+

CREATE VIEW

The CREATE VIEW for the sha1 field is different in the 3 mentioned tables, while the other two have a direct mapping of the sha1 field, the filearchive has a condition that I think doesn't allow MariaDB to use the underlying index for this kind of query:

if((`commonswiki`.`filearchive`.`fa_deleted` & 1),NULL,`commonswiki`.`filearchive`.`fa_sha1`) AS `fa_sha1`

INDEX

In addition the index on on fa_sha1 in the original table commonswiki.filearchive is limited to a length of 10 (KEY fa_sha1 (fa_sha1(10))) and is different from the other two tables mentioned above. But it should still be used for this kind of query on the VIEW like it is on the original table, if the condition wasn't there.

One option might be to create a filearchive_notdeleted view (analogous to the _userindex one), with a WHERE fa_deleted&1 = 0.

(https://git.wikimedia.org/blob/operations%2Fsoftware/master/maintain-replicas%2Fmaintain-replicas.pl#L151)

jcrespo updated the task description. (Show Details)May 30 2016, 11:55 AM
jcrespo set Security to None.
Aklapper removed Springle as the assignee of this task.Jul 10 2017, 8:09 AM
Aklapper added a subscriber: Springle.

[ Resetting assignee as assignee account is not active anymore ]

bd808 moved this task from Backlog to Wiki replicas on the Data-Services board.Oct 26 2017, 3:31 PM
Don-vip added a project: Tool-spacemedia.
Don-vip added a comment.EditedAug 17 2019, 10:40 AM

Is there any way to quickly detect if a SHA-1 has been previously deleted? It's currently possible through the view replica, but it's painfully slow (I guess this is a lot worse today than 5 years ago). I see with T60993 that this is not possible using the API. Is there a workaround? I don't need any details from the filearchive table, just to know if a SHA-1 is in there. (currently working on this).

Thanks! It indeed requires deletedhistory (the documentation is incomplete on this, I have created T230686).
How can I request this right on Commons API? Through a new Phabricator ticket?