Page MenuHomePhabricator

Clean up indexes of wb_terms table
Closed, DeclinedPublic

Description

In production tables for wikidatawiki (and testwikidatawiki) these three indexes should stay and everything else must go:

test_user@db2083.codfw.wmnet(wikidatawiki)>show index from wb_terms;
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wb_terms |          0 | PRIMARY          |            1 | term_row_id         | A         |  1585960379 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_text    |            1 | term_text           | A         |   528653459 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_full_entity |            1 | term_full_entity_id | A         |   144178216 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.03 sec)

This means dropping index both from the code base and production (they have diverged for reasons that is unknown to me)

Details

Related Gerrit Patches:
mediawiki/extensions/Wikibase : masterDrop search-related indexes when not needed
operations/mediawiki-config : masterdb-eqiad.php: Depool db1087
operations/mediawiki-config : masterdb-eqiad.php: Depool db1099:3318
operations/mediawiki-config : masterdb-eqiad.php: Depool db1101:3318
operations/mediawiki-config : masterdb-eqiad.php: Depool db1092
operations/puppet : productionmediawiki: Stop replacing term_search_key with empty string
operations/mediawiki-config : masterdb-eqiad.php: Repool db1092
operations/mediawiki-config : masterMake clients read from term_text instead of term_search_key
operations/mediawiki-config : masterdb-eqiad.php: Depool db1092
operations/mediawiki-config : masterdb-eqiad.php: Depool db1104
operations/mediawiki-config : masterdb-eqiad.php: Depool db1092
mediawiki/extensions/Wikibase : masterDrop term_entity_id indexes

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 433674 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Drop term_entity_id indexes

https://gerrit.wikimedia.org/r/433674

Change 433674 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Drop term_entity_id indexes

https://gerrit.wikimedia.org/r/433674

Change 433747 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[mediawiki/extensions/Wikibase@master] Drop search-related indexes on wb_terms when backend of search is different

https://gerrit.wikimedia.org/r/433747

Marostegui moved this task from Next to In progress on the DBA board.

I need to deploy other schema changes on s8, so I will include this as it is a pretty straightforward one and has been merged already.

Marostegui added a comment.EditedMay 22 2018, 5:48 AM

@Ladsgroup can you confirm which indexes we have to drop? There are two different patches and the syntax doesn't match what we have in production:

These are the patches:

DROP INDEX /*i*/term_search_key ON /*_*/wb_terms;
DROP INDEX /*i*/term_search_full ON /*_*/wb_terms;

DROP INDEX /*i*/term_search ON /*_*/wb_terms;
DROP INDEX /*i*/term_entity ON /*_*/wb_terms;

These are the ones we have in core.

->   KEY `wb_terms_entity_id` (`term_entity_id`),
->   KEY `wb_terms_text` (`term_text`),
->   KEY `wb_terms_search_key` (`term_search_key`),
->   KEY `tmp1` (`term_language`,`term_type`,`term_entity_type`,`term_search_key`),
->   KEY `term_full_entity` (`term_full_entity_id`),
->   KEY `term_search_full` (`term_language`,`term_full_entity_id`,`term_type`,`term_search_key`(16))

Can you confirm which ones we have to delete as part of this task? (I know we are handling tmp1 on a different task).

This two index should stay and everything else needs to go (T191391#4191406)

test_user@db2083.codfw.wmnet(wikidatawiki)>show index from wb_terms;
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wb_terms |          0 | PRIMARY          |            1 | term_row_id         | A         |  1585960379 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_text    |            1 | term_text           | A         |   528653459 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_full_entity |            1 | term_full_entity_id | A         |   144178216 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.03 sec)

Thanks for the clarification!
At some point, it would be nice to update the indexes names on Wikibase.sql as on core the index is called wb_terms_text and on the file it is called terms_text

Mentioned in SAL (#wikimedia-operations) [2018-05-23T05:12:05Z] <marostegui> Deploy schema change on s8 codfw primary master (db2045), this will generate lag on codfw - T194273

Marostegui added a comment.EditedMay 23 2018, 5:17 AM

Deletion progress:

  • codfw

eqiad:

  • dbstore1002
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • db1095
  • db1109
  • db1104
  • db1116
  • db1101
  • db1099
  • db1092
  • db1087
  • db1071

Change 434624 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/434624

Change 434624 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/434624

Mentioned in SAL (#wikimedia-operations) [2018-05-23T05:43:48Z] <marostegui> Deploy schema change on db1092 - T191519 T188299 T190148 T194273 T194270

This is how the table looks like after removing all the indexes that are not needed:

       Table: wb_terms
Create Table: CREATE TABLE `wb_terms` (
  `term_row_id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `term_entity_id` int(10) unsigned NOT NULL,
  `term_full_entity_id` varbinary(32) DEFAULT NULL,
  `term_entity_type` varbinary(32) NOT NULL,
  `term_language` varbinary(32) NOT NULL,
  `term_type` varbinary(32) NOT NULL,
  `term_text` varbinary(255) NOT NULL,
  `term_search_key` varbinary(255) NOT NULL,
  `term_weight` float unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`term_row_id`),
  KEY `wb_terms_text` (`term_text`),
  KEY `term_full_entity` (`term_full_entity_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2259289467 DEFAULT CHARSET=binary

Change 434631 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1104

https://gerrit.wikimedia.org/r/434631

Marostegui added a comment.EditedMay 23 2018, 6:30 AM

@Ladsgroup I am not sure we should keep on with this task - looks like removing wb_terms_search_key isn't a good idea:
Look at this:

db1092 https://grafana.wikimedia.org/dashboard/db/mysql?panelId=3&fullscreen&orgId=1&from=1527056093345&to=1527056993345&var-dc=eqiad%20prometheus%2Fops&var-server=db1092&var-port=9104

root@db1092[wikidatawiki]> show indexes from wb_terms;
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wb_terms |          0 | PRIMARY          |            1 | term_row_id         | A         |  1652608049 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_text    |            1 | term_text           | A         |   275434674 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_full_entity |            1 | term_full_entity_id | A         |   118043432 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

root@db1092[wikidatawiki]> explain SELECT /* Wikibase\Lib\Store\Sql\TermSqlIndex::getMatchingTerms  */  term_entity_type,term_type,term_language,term_text,term_weight,term_full_entity_id  FROM `wb_terms`    WHERE ((term_language = 'bn' AND term_search_key = '' AND term_type = 'label' AND term_entity_type = 'item') OR (term_language = 'bn' AND term_search_key = '' AND term_type = 'alias' AND term_entity_type = 'item'))  LIMIT 2500;
+------+-------------+----------+------+---------------+------+---------+------+------------+-------------+
| id   | select_type | table    | type | possible_keys | key  | key_len | ref  | rows       | Extra       |
+------+-------------+----------+------+---------------+------+---------+------+------------+-------------+
|    1 | SIMPLE      | wb_terms | ALL  | NULL          | NULL | NULL    | NULL | 1652608451 | Using where |
+------+-------------+----------+------+---------------+------+---------+------+------------+-------------+
1 row in set (0.00 sec)

db1104: https://grafana.wikimedia.org/dashboard/db/mysql?panelId=3&fullscreen&orgId=1&from=1527056062121&to=1527056962121&var-dc=eqiad%20prometheus%2Fops&var-server=db1104&var-port=9104

root@db1104[wikidatawiki]> show indexes from wb_terms;
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name            | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wb_terms |          0 | PRIMARY             |            1 | term_row_id         | A         |  1683951612 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_entity_id  |            1 | term_entity_id      | A         |    99055977 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_text       |            1 | term_text           | A         |   240564516 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_search_key |            1 | term_search_key     | A         |    64767369 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            1 | term_language       | A         |     6378604 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            2 | term_type           | A         |     8547977 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            3 | term_entity_type    | A         |     6330645 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            4 | term_search_key     | A         |   420987903 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_full_entity    |            1 | term_full_entity_id | A         |   112263440 |     NULL | NULL   | YES  | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            1 | term_language       | A         |     7075426 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            2 | term_full_entity_id | A         |  1683951612 |     NULL | NULL   | YES  | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            3 | term_type           | A         |  1683951612 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            4 | term_search_key     | A         |  1683951612 |       16 | NULL   |      | BTREE      |         |               |
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
13 rows in set (0.00 sec)

root@db1104[wikidatawiki]> explain SELECT /* Wikibase\Lib\Store\Sql\TermSqlIndex::getMatchingTerms ... */  term_entity_type,term_type,term_language,term_text,term_weight,term_full_entity_id  FROM `wb_terms`    WHERE ((term_language = 'bn' AND term_search_key = '' AND term_type = 'label' AND term_entity_type = 'item') OR (term_language = 'bn' AND term_search_key = '' AND term_type = 'alias' AND term_entity_type = 'item'))  LIMIT 2500;
+------+-------------+----------+------+-------------------------------------------+---------------------+---------+-------+------+------------------------------------+
| id   | select_type | table    | type | possible_keys                             | key                 | key_len | ref   | rows | Extra                              |
+------+-------------+----------+------+-------------------------------------------+---------------------+---------+-------+------+------------------------------------+
|    1 | SIMPLE      | wb_terms | ref  | wb_terms_search_key,tmp1,term_search_full | wb_terms_search_key | 257     | const |    1 | Using index condition; Using where |
+------+-------------+----------+------+-------------------------------------------+---------------------+---------+-------+------+------------------------------------+
1 row in set (0.00 sec)

Change 434631 abandoned by Marostegui:
db-eqiad.php: Depool db1104

https://gerrit.wikimedia.org/r/434631

Change 434632 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/434632

Change 434632 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/434632

That query takes 0.00 on db1104 (which has the indexes) and it is taking more than 5 minutes (still running) on db1092 (without the indexes) so I am going to read those indexes to db1092 till this is investigated.

Mentioned in SAL (#wikimedia-operations) [2018-05-23T06:49:25Z] <marostegui> Re-add indexes on wb_terms on db1092 - T194273

Change 434658 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/mediawiki-config@master] Make clients read from term_text instead of term_search_key

https://gerrit.wikimedia.org/r/434658

Change 434658 merged by jenkins-bot:
[operations/mediawiki-config@master] Make clients read from term_text instead of term_search_key

https://gerrit.wikimedia.org/r/434658

Mentioned in SAL (#wikimedia-operations) [2018-05-23T17:14:27Z] <twentyafterfour@tin> Synchronized wmf-config/Wikibase-production.php: SWAT deploying https://gerrit.wikimedia.org/r/#/c/434658/ refs T194273 (duration: 01m 20s)

Okay, with the patch deployed, I think we are good to go here, @Marostegui : let me know if anything happens

Great, I will repool db1092 tomorrow and check how it goes

Change 434860 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Repool db1092

https://gerrit.wikimedia.org/r/434860

Change 434860 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Repool db1092

https://gerrit.wikimedia.org/r/434860

I have repooled db1092 and it looks better, but we still have quite some different things and one of the indexes we are removing it is actually being used (term_search_full ):

Look at this:
db1092

root@db1092[wikidatawiki]> show indexes from wb_terms;
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wb_terms |          0 | PRIMARY          |            1 | term_row_id         | A         |  1671005435 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_text    |            1 | term_text           | A         |   278500905 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_full_entity |            1 | term_full_entity_id | A         |   119357531 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.00 sec)

root@db1092[wikidatawiki]> explain SELECT /* Wikibase\Lib\Store\Sql\TermSqlIndex::fetchTerms  */  term_type,term_language,term_text,term_full_entity_id  FROM `wb_terms`    WHERE term_entity_type = 'item' AND term_full_entity_id IN (xx)  AND term_language = 'en' AND term_type = 'description';
+------+-------------+----------+-------+------------------+------------------+---------+------+------+------------------------------------+
| id   | select_type | table    | type  | possible_keys    | key              | key_len | ref  | rows | Extra                              |
+------+-------------+----------+-------+------------------+------------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | wb_terms | range | term_full_entity | term_full_entity | 35      | NULL |  799 | Using index condition; Using where |
+------+-------------+----------+-------+------------------+------------------+---------+------+------+------------------------------------+
1 row in set (0.00 sec)

https://grafana.wikimedia.org/dashboard/db/mysql?panelId=3&fullscreen&orgId=1&var-dc=eqiad%20prometheus%2Fops&var-server=db1092&var-port=9104&from=1527130879532&to=1527141679533&refresh=30s

db1104:

root@db1104[wikidatawiki]> show indexes from wb_terms;
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name            | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wb_terms |          0 | PRIMARY             |            1 | term_row_id         | A         |  1702346260 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_entity_id  |            1 | term_entity_id      | A         |   100138015 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_text       |            1 | term_text           | A         |   243192322 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_search_key |            1 | term_search_key     | A         |    65474856 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            1 | term_language       | A         |     6375828 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            2 | term_type           | A         |     8554503 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            3 | term_entity_type    | A         |     6328424 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            4 | term_search_key     | A         |   425586565 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_full_entity    |            1 | term_full_entity_id | A         |   113489750 |     NULL | NULL   | YES  | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            1 | term_language       | A         |     7063677 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            2 | term_full_entity_id | A         |  1702346260 |     NULL | NULL   | YES  | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            3 | term_type           | A         |  1702346260 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            4 | term_search_key     | A         |  1702346260 |       16 | NULL   |      | BTREE      |         |               |
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
13 rows in set (0.00 sec)

root@db1104[wikidatawiki]> explain SELECT /* Wikibase\Lib\Store\Sql\TermSqlIndex::fetchTerms  */  term_type,term_language,term_text,term_full_entity_id  FROM `wb_terms`    WHERE term_entity_type = 'item' AND term_full_entity_id IN (xx)  AND term_language = 'en' AND term_type = 'description';
+------+-------------+----------+-------+----------------------------------------+------------------+---------+------+------+------------------------------------+
| id   | select_type | table    | type  | possible_keys                          | key              | key_len | ref  | rows | Extra                              |
+------+-------------+----------+-------+----------------------------------------+------------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | wb_terms | range | tmp1,term_full_entity,term_search_full | term_search_full | 103     | NULL |   15 | Using index condition; Using where |
+------+-------------+----------+-------+----------------------------------------+------------------+---------+------+------+------------------------------------+
1 row in set (0.00 sec)

https://grafana.wikimedia.org/dashboard/db/mysql?panelId=3&fullscreen&orgId=1&from=1527130904091&to=1527141704092&refresh=30s&var-dc=eqiad%20prometheus%2Fops&var-server=db1104&var-port=9104

FYI: https://dev.mysql.com/doc/refman/5.5/en/server-status-variables.html#statvar_Handler_read_next

This is forcing and index and not doing it:

root@db1104[wikidatawiki]> explain SELECT  /* Wikibase\Lib\Store\Sql\TermSqlIndex::fetchTerms  */  term_type,term_language,term_text,term_full_entity_id  FROM `wb_terms` USE INDEX (term_full_entity)    WHERE term_entity_type = 'item' AND term_full_entity_id IN (xx)  AND term_language = 'en' AND term_type = 'description';
+------+-------------+----------+-------+------------------+------------------+---------+------+------+------------------------------------+
| id   | select_type | table    | type  | possible_keys    | key              | key_len | ref  | rows | Extra                              |
+------+-------------+----------+-------+------------------+------------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | wb_terms | range | term_full_entity | term_full_entity | 35      | NULL |  803 | Using index condition; Using where |
+------+-------------+----------+-------+------------------+------------------+---------+------+------+------------------------------------+
1 row in set (0.00 sec)

root@db1104[wikidatawiki]> explain SELECT  /* Wikibase\Lib\Store\Sql\TermSqlIndex::fetchTerms  */  term_type,term_language,term_text,term_full_entity_id  FROM `wb_terms`    WHERE term_entity_type = 'item' AND term_full_entity_id IN (xx)  AND term_language = 'en' AND term_type = 'description';
+------+-------------+----------+-------+----------------------------------------+------------------+---------+------+------+------------------------------------+
| id   | select_type | table    | type  | possible_keys                          | key              | key_len | ref  | rows | Extra                              |
+------+-------------+----------+-------+----------------------------------------+------------------+---------+------+------+------------------------------------+
|    1 | SIMPLE      | wb_terms | range | tmp1,term_full_entity,term_search_full | term_search_full | 103     | NULL |   15 | Using index condition; Using where |
+------+-------------+----------+-------+----------------------------------------+------------------+---------+------+------+------------------------------------+
1 row in set (0.00 sec)
jcrespo added a comment.EditedMay 24 2018, 12:44 PM

TermSqlIndex::getMatchingTerms is still failing, it is right now the top failing query among all mediawiki databases:

Wikibase\Lib\Store\Sql\TermSqlIndex::getMatchingTerms	10.64.16.85	2062	Read timeout is reached (10.64.16.85)	SELECT  term_entity_type,term_type,term_language,term_text,term_weight,term_full_entity_id  FROM `wb_terms`    WHERE ((term_language = 'en' AND (term_text LIKE 'b%' ESCAPE '`' ) AND term_type IN ('label','alias')  AND term_entity_type = 'property'))  LIMIT 2500

Idenpendently of the indexes added or removed, term_text LIKE 'b%' seems to me like a really bad idea- an index will likely be ignored for that filter.

jcrespo raised the priority of this task from Normal to High.May 24 2018, 12:45 PM

Wait a second, I thought it's depooled again. if that's not right, I get to it right now.

TermSqlIndex::getMatchingTerms is still failing, it is right now the top failing query among all mediawiki databases:

Wikibase\Lib\Store\Sql\TermSqlIndex::getMatchingTerms	10.64.16.85	2062	Read timeout is reached (10.64.16.85)	SELECT  term_entity_type,term_type,term_language,term_text,term_weight,term_full_entity_id  FROM `wb_terms`    WHERE ((term_language = 'en' AND (term_text LIKE 'b%' ESCAPE '`' ) AND term_type IN ('label','alias')  AND term_entity_type = 'property'))  LIMIT 2500

Idenpendently of the indexes added or removed, term_text LIKE 'b%' seems to me like a really bad idea- an index will likely be ignored for that filter.

We missed that PropertySuggester still uses wb_terms to search for properties. See T195490: Use CirrusSearch for PropertySuggester.

Mentioned in SAL (#wikimedia-operations) [2018-05-25T05:05:13Z] <marostegui> Add tmp1 index back to db1109 - T194273

Marostegui added a comment.EditedMay 25 2018, 5:06 AM

Progress of the re-additon of the tmp1 index:

  • codfw
  • dbstore1002
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • db1095 sanitarium
  • db1109 main
  • db1104 main, api
  • db1116 temporary sanitarium
  • db1124 temporary sanitarium
  • db1101 rc
  • db1099 rc
  • db1092 api
  • db1087 vslow (master for sanitariums)
  • db1071 (master) (will not be done - it is not needed there)

We should get the index on db1109, db1104, db1101, db1099, db1092 and db1087 before enabling anything back.
Problably it is not needed on the master, anyways.

Change 435113 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] mediawiki: Stop replacing term_search_key with empty string

https://gerrit.wikimedia.org/r/435113

Change 435113 merged by Marostegui:
[operations/puppet@production] mediawiki: Stop replacing term_search_key with empty string

https://gerrit.wikimedia.org/r/435113

Mentioned in SAL (#wikimedia-operations) [2018-05-25T05:57:24Z] <marostegui> Add tmp1 index back on dbstore1002 - T194273

Change 435128 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/435128

Change 435128 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1092

https://gerrit.wikimedia.org/r/435128

Mentioned in SAL (#wikimedia-operations) [2018-05-25T13:15:57Z] <marostegui> Add indexes back on s8 codfw primary master (db2045) this will generate lag on codfw - T194273

Change 435492 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1099:3318

https://gerrit.wikimedia.org/r/435492

Change 435492 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1099:3318

https://gerrit.wikimedia.org/r/435492

Mentioned in SAL (#wikimedia-operations) [2018-05-26T05:21:54Z] <marostegui> Add tmp1 index back on db1099:3318 - T194273

Change 435624 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1087

https://gerrit.wikimedia.org/r/435624

Change 435624 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1087

https://gerrit.wikimedia.org/r/435624

Mentioned in SAL (#wikimedia-operations) [2018-05-26T09:56:15Z] <marostegui> Add tmp1 index back on db1087 (sanitarium master), this will generate lag on labsdb hosts - T194273

Change 435636 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1101:3318

https://gerrit.wikimedia.org/r/435636

Change 435636 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1101:3318

https://gerrit.wikimedia.org/r/435636

Mentioned in SAL (#wikimedia-operations) [2018-05-26T14:25:37Z] <marostegui> Add tmp1 index back on db1101:3318 - T194273

Marostegui moved this task from In progress to Blocked external/Not db team on the DBA board.

All indexes have been added back to the slaves: T194273#4230677
Re-assigning this back @Ladsgroup to reflect that this is now blocked on Wikidata's Team to see how we proceed next.

Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptMay 26 2018, 7:00 PM

Still ongoing:

Wikibase\Lib\Store\Sql\TermSqlIndex::getMatchingTerms	10.64.32.113	2062	Read timeout is reached (10.64.32.113)	SELECT  term_entity_type,term_type,term_language,term_text,term_weight,term_full_entity_id  FROM `wb_terms`    WHERE ((term_language = 'en' AND (term_text LIKE 'ص%' ESCAPE '`' ) AND term_type = 'label' AND term_entity_type = 'item') OR (term_language = 'en' AND (term_text LIKE 'ص%' ESCAPE '`' ) AND term_type = 'alias' AND term_entity_type = 'item'))  LIMIT 2500

Still ongoing:

Wikibase\Lib\Store\Sql\TermSqlIndex::getMatchingTerms	10.64.32.113	2062	Read timeout is reached (10.64.32.113)	SELECT  term_entity_type,term_type,term_language,term_text,term_weight,term_full_entity_id  FROM `wb_terms`    WHERE ((term_language = 'en' AND (term_text LIKE 'ص%' ESCAPE '`' ) AND term_type = 'label' AND term_entity_type = 'item') OR (term_language = 'en' AND (term_text LIKE 'ص%' ESCAPE '`' ) AND term_type = 'alias' AND term_entity_type = 'item'))  LIMIT 2500

We stopped enabling one client feature which would increase in number of such queries by factor of five. Is it too bad? how urgent this needs to be fixed? cc. @Lydia_Pintscher

I thought that was the query that broke things, am I wrong?

Any ideas about a related ticket? T197854

Vvjjkkii renamed this task from Clean up indexes of wb_terms table to uadaaaaaaa.Jul 1 2018, 1:11 AM
Vvjjkkii removed Ladsgroup as the assignee of this task.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
Marostegui renamed this task from uadaaaaaaa to Clean up indexes of wb_terms table.Jul 1 2018, 6:35 PM
Marostegui assigned this task to Ladsgroup.
Marostegui lowered the priority of this task from High to Normal.
Marostegui updated the task description. (Show Details)
Addshore changed the task status from Open to Stalled.Aug 20 2018, 9:37 AM
Addshore added a subscriber: Addshore.

Marking as stalled as this is on Hold

Is there any point on having this task if at some point wb_terms is going to be killed?
Is it worth spending the time to research+alter this huge table if it will eventually be killed?

We cannot alter wb_terms anymore.
The table is almost 1TB on the master, and the master "only" has 1.1TB free on the master which makes it very high risky and almost impossible to run.

We cannot alter wb_terms anymore.
The table is almost 1TB on the master, and the master "only" has 1.1TB free on the master which makes it very high risky and almost impossible to run.

With normalizing of change_tag and dropping of tag_summary it would free up 50-70 GB of space which gives a little bit flexibility but overall I don't see any point of alters since the plan is to just kill the thing.

Marostegui closed this task as Declined.Dec 4 2018, 6:16 AM

Let's decline this then (feel free to reopen if you think it should be kept open)

Change 433747 abandoned by Thiemo Kreuz (WMDE):
Drop search-related indexes when not needed

Reason:
I'm told it was decided to never alter the wb_terms table again, but replace and drop it some day. The linked task got declined. So I assume it's safe to abandon this proposal for now. Please restore it if I got that wrong.

https://gerrit.wikimedia.org/r/433747