Page MenuHomePhabricator

Full table scan query on wikishared
Closed, ResolvedPublic

Assigned To
Authored By
Marostegui
Jan 6 2025, 1:37 PM
Referenced Files
F58134165: image.png
Jan 6 2025, 2:21 PM
F58134111: image.png
Jan 6 2025, 2:13 PM
F58134104: image.png
Jan 6 2025, 2:13 PM
F58134102: image.png
Jan 6 2025, 2:13 PM
F58134084: Captura de pantalla 2025-01-06 a las 15.06.56.png
Jan 6 2025, 2:07 PM
F58133993: image.png
Jan 6 2025, 1:38 PM

Description

This query, which runs very often on the wikishared database in x1, is scanning 3M rows each time as it is not using an index.

mysql:root@localhost [wikishared]> explain SELECT /* ContentTranslation\Store\TranslationStore::findByPublishedTitle  */  *  FROM `cx_translations`    WHERE translation_target_language = 'es' AND translation_target_title = 'Copa Mundial de Clubes de la FIFA 2025' AND ((translation_status = 'published' OR translation_target_url IS NOT NULL))  LIMIT 1;;
+------+-------------+-----------------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table           | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-----------------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | cx_translations | ALL  | NULL          | NULL | NULL    | NULL | 2662363 | Using where |
+------+-------------+-----------------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.001 sec)

The query itself isn't super slow (around 3 seconds) but we should take a look and see if we can avoid all this full scans as if the table keeps growing, it will keep being slower.

Because of this, Currently, rows read in x1 is higher than every other section in production combined:

image.png (921×1 px, 702 KB)

Details

Related Changes in GitLab:
TitleReferenceAuthorSource BranchDest Branch
add_cx_translation_target_title_T383052.py: New indexrepos/sre/schema-changes!30marosteguiT383052main
Customize query in GitLab

Event Timeline

Ladsgroup triaged this task as Unbreak Now! priority.Jan 6 2025, 1:44 PM
Ladsgroup added a subscriber: Nikerabbit.

The cx_translation_target_title index from T351999 (gerrit change) looks like it should help, but appears to be missing in production?

mysql:research@dbstore1009.eqiad.wmnet [wikishared]> SHOW INDEX FROM cx_translations;
+-----------------+------------+--------------------------+--------------+-----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table           | Non_unique | Key_name                 | Seq_in_index | Column_name                 | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+-----------------+------------+--------------------------+--------------+-----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| cx_translations |          0 | PRIMARY                  |            1 | translation_id              | A         |     2796423 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          0 | cx_translation_ref       |            1 | translation_source_title    | A         |     2796423 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          0 | cx_translation_ref       |            2 | translation_source_language | A         |     2796423 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          0 | cx_translation_ref       |            3 | translation_target_language | A         |     2796423 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          0 | cx_translation_ref       |            4 | translation_started_by      | A         |     2796423 |     NULL | NULL   | YES  | BTREE      |         |               | NO      |
| cx_translations |          1 | cx_translation_languages |            1 | translation_source_language | A         |         570 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          1 | cx_translation_languages |            2 | translation_target_language | A         |       13509 |     NULL | NULL   |      | BTREE      |         |               | NO      |
+-----------------+------------+--------------------------+--------------+-----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
7 rows in set (0.001 sec)

(This is the analytics replicas – I’m not sure how to access x1/wikishared for the real production DBs. Edit: sql wikishared on deploy2002 shows basically the same result except for slightly different Cardinality numbers.)

The cx_translation_target_title index from T351999 (gerrit change) looks like it should help, but appears to be missing in production?

mysql:research@dbstore1009.eqiad.wmnet [wikishared]> SHOW INDEX FROM cx_translations;
+-----------------+------------+--------------------------+--------------+-----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| Table           | Non_unique | Key_name                 | Seq_in_index | Column_name                 | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Ignored |
+-----------------+------------+--------------------------+--------------+-----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
| cx_translations |          0 | PRIMARY                  |            1 | translation_id              | A         |     2796423 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          0 | cx_translation_ref       |            1 | translation_source_title    | A         |     2796423 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          0 | cx_translation_ref       |            2 | translation_source_language | A         |     2796423 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          0 | cx_translation_ref       |            3 | translation_target_language | A         |     2796423 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          0 | cx_translation_ref       |            4 | translation_started_by      | A         |     2796423 |     NULL | NULL   | YES  | BTREE      |         |               | NO      |
| cx_translations |          1 | cx_translation_languages |            1 | translation_source_language | A         |         570 |     NULL | NULL   |      | BTREE      |         |               | NO      |
| cx_translations |          1 | cx_translation_languages |            2 | translation_target_language | A         |       13509 |     NULL | NULL   |      | BTREE      |         |               | NO      |
+-----------------+------------+--------------------------+--------------+-----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+
7 rows in set (0.001 sec)

(This is the analytics replicas – I’m not sure how to access x1/wikishared for the real production DBs.)

Do you know if there was a Schema-change-in-production task to add that index? I can't find it.

That index isn't in production:

cumin2024@db1179.eqiad.wmnet[wikishared]> show create table cx_translations;
+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------>
| Table           | Create Table                                                                                                                                                                                                                                               >
+-----------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------>
| cx_translations | CREATE TABLE `cx_translations` (
  `translation_id` int(11) NOT NULL AUTO_INCREMENT,
  `translation_source_title` varbinary(512) NOT NULL,
  `translation_target_title` varbinary(512) NOT NULL,
  `translation_source_language` varbinary(36) NOT NULL,
  `translation_target_language` varbinary(36) NOT NULL,
  `translation_source_url` blob NOT NULL,
  `translation_target_url` blob DEFAULT NULL,
  `translation_status` enum('draft','published','deleted') DEFAULT NULL,
  `translation_start_timestamp` binary(14) NOT NULL,
  `translation_last_updated_timestamp` binary(14) NOT NULL,
  `translation_progress` tinyblob NOT NULL,
  `translation_started_by` int(11) DEFAULT NULL,
  `translation_last_update_by` int(11) DEFAULT NULL,
  `translation_source_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_target_revision_id` int(10) unsigned DEFAULT NULL,
  `translation_cx_version` tinyint(3) unsigned DEFAULT 1,
  PRIMARY KEY (`translation_id`),
  UNIQUE KEY `cx_translation_ref` (`translation_source_title`,`translation_source_language`,`translation_target_language`,`translation_started_by`),
  KEY `cx_translation_languages` (`translation_source_language`,`translation_target_language`)
) ENGINE=InnoDB AUTO_INCREMENT=2887303 DEFAULT CHARSET=binary

Do you know if there was a Schema-change-in-production task to add that index? I can't find it.

I guess it was probably missed (or the schema change was considered not ready yet, as the task is still open?). I don’t know anything about this extension, I just looked at the SQL files in Git to see if there was anything relevant-looking in there.

Do you know if there was a Schema-change-in-production task to add that index? I can't find it.

I guess it was probably missed (or the schema change was considered not ready yet, as the task is still open?). I don’t know anything about this extension, I just looked at the SQL files in Git to see if there was anything relevant-looking in there.

It is strange, as if the change is merged (which is) we normally go ahead with it (if there's a task of course - otherwise we are not even aware). I am going to add the index and see if it helps and if it does, we can go for it.

We probably can run the alter table with replication. Given that it's RBR.

It does!

cumin2024@dbstore1009.eqiad.wmnet[wikishared]> explain SELECT /* ContentTranslation\Store\TranslationStore::findByPublishedTitle  */  *  FROM `cx_translations`    WHERE translation_target_language = 'es' AND translation_target_title = 'Copa Mundial de Clubes de la FIFA 2025' AND ((translation_status = 'published' OR translation_target_url IS NOT NULL))  LIMIT 1;
+------+-------------+-----------------+------+---------------+------+---------+------+---------+-------------+
| id   | select_type | table           | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+------+-------------+-----------------+------+---------------+------+---------+------+---------+-------------+
|    1 | SIMPLE      | cx_translations | ALL  | NULL          | NULL | NULL    | NULL | 2796437 | Using where |
+------+-------------+-----------------+------+---------------+------+---------+------+---------+-------------+
1 row in set (0.003 sec)

cumin2024@dbstore1009.eqiad.wmnet[wikishared]> CREATE INDEX cx_translation_target_title ON /*_*/cx_translations (translation_target_title);
Query OK, 0 rows affected (7.518 sec)
Records: 0  Duplicates: 0  Warnings: 0

cumin2024@dbstore1009.eqiad.wmnet[wikishared]> explain SELECT /* ContentTranslation\Store\TranslationStore::findByPublishedTitle  */  *  FROM `cx_translations`    WHERE translation_target_language = 'es' AND translation_target_title = 'Copa Mundial de Clubes de la FIFA 2025' AND ((translation_status = 'published' OR translation_target_url IS NOT NULL))  LIMIT 1;
+------+-------------+-----------------+------+-----------------------------+-----------------------------+---------+-------+------+------------------------------------+
| id   | select_type | table           | type | possible_keys               | key                         | key_len | ref   | rows | Extra                              |
+------+-------------+-----------------+------+-----------------------------+-----------------------------+---------+-------+------+------------------------------------+
|    1 | SIMPLE      | cx_translations | ref  | cx_translation_target_title | cx_translation_target_title | 514     | const | 1    | Using index condition; Using where |
+------+-------------+-----------------+------+-----------------------------+-----------------------------+---------+-------+------+------------------------------------+
1 row in set (0.001 sec)

cumin2024@dbstore1009.eqiad.wmnet[wikishared]> SELECT /* ContentTranslation\Store\TranslationStore::findByPublishedTitle  */  *  FROM `cx_translations`    WHERE translation_target_language = 'es' AND translation_target_title = 'Copa Mundial de Clubes de la FIFA 2025' AND ((translation_status = 'published' OR translation_target_url IS NOT NULL))  LIMIT 1;
Empty set (0.002 sec)
Marostegui added a project: DBA.

We probably can run the alter table with replication. Given that it's RBR.

I think I can run it manually just to be safe, there are not many hosts, to be on the safe side. I will send the gitlab merge just to keep track of this.

Mentioned in SAL (#wikimedia-operations) [2025-01-06T14:04:15Z] <marostegui> Deploy schema change on x1 dbmaint eqiad T383052

Mentioned in SAL (#wikimedia-operations) [2025-01-06T14:10:09Z] <marostegui> Deploy schema change on x1 dbmaint codfw T383052

This has been deployed everywhere.

The impact on production:

image.png (921×1 px, 284 KB)