See T188279#4101129 and onwards
Description
| Status | Subtype | Assigned | Task | ||
|---|---|---|---|---|---|
| Resolved | Addshore | T188992 Make wb_terms table fancy | |||
| Resolved | Ladsgroup | T191391 Apply schema changes to an isolated database and examine the results | |||
| Resolved | jcrespo | T196047 reimage db2083 back into wikidata (s8) |
Event Timeline
I found T86530, which may be outdated, but may help with giving more options.
As a note to @Ladsgroup, and the other people working on this- after dropping a column, replication stopped working on the test host (as expected). This is not a problem, just a remark so you know why no more updates are arriving to that host (and of course, that in production, the change should be done in a different compatible way first).
It would be nice to report size changes here, so we are all aware.
Yeah, Some of my changes are done now. what do you think of this data?
test_user@db2083.codfw.wmnet(wikidatawiki)>SELECT * from information_schema.TABLES where TABLE_NAME = 'wb_terms'\G;
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: wikidatawiki
TABLE_NAME: wb_terms
TABLE_TYPE: BASE TABLE
ENGINE: InnoDB
VERSION: 10
ROW_FORMAT: Compressed
TABLE_ROWS: 1585960379
AVG_ROW_LENGTH: 57
DATA_LENGTH: 91478294528
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 227068411904
DATA_FREE: 47484239872
AUTO_INCREMENT: 2130542996
CREATE_TIME: 2018-04-10 16:07:05
UPDATE_TIME: NULL
CHECK_TIME: NULL
TABLE_COLLATION: binary
CHECKSUM: NULL
CREATE_OPTIONS: row_format=COMPRESSED key_block_size=8
TABLE_COMMENT:
1 row in set (0.03 sec)The indexes seem disproportionally large compared to the data. Could the table be split somehow in a 1:1 relationship that could make sense (e.g. index_table with just columns to search and data_table with almost just the primary index).
That is very valid. I looked into indexes:
test_user@db2083.codfw.wmnet(wikidatawiki)>show index from wb_terms; +----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | wb_terms | 0 | PRIMARY | 1 | term_row_id | A | 1585960379 | NULL | NULL | | BTREE | | | | wb_terms | 1 | wb_terms_text | 1 | term_text | A | 528653459 | NULL | NULL | | BTREE | | | | wb_terms | 1 | wb_terms_search_key | 1 | term_search_key | A | 28835643 | NULL | NULL | | BTREE | | | | wb_terms | 1 | tmp1 | 1 | term_language | A | 4217979 | NULL | NULL | | BTREE | | | | wb_terms | 1 | tmp1 | 2 | term_type | A | 2969963 | NULL | NULL | | BTREE | | | | wb_terms | 1 | tmp1 | 3 | term_entity_type | A | 6446993 | NULL | NULL | | BTREE | | | | wb_terms | 1 | tmp1 | 4 | term_search_key | A | 577132 | NULL | NULL | | BTREE | | | | wb_terms | 1 | term_full_entity | 1 | term_full_entity_id | A | 144178216 | NULL | NULL | YES | BTREE | | | | wb_terms | 1 | term_search_full | 1 | term_language | A | 7552192 | NULL | NULL | | BTREE | | | | wb_terms | 1 | term_search_full | 2 | term_full_entity_id | A | 1585960379 | NULL | NULL | YES | BTREE | | | | wb_terms | 1 | term_search_full | 3 | term_type | A | 1585960379 | NULL | NULL | | BTREE | | | | wb_terms | 1 | term_search_full | 4 | term_search_key | A | 1585960379 | 16 | NULL | | BTREE | | | +----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 12 rows in set (0.04 sec)
1- The indexes are pretty different from what we have in the code
2- I dropped all of not-needed indexes:
test_user@db2083.codfw.wmnet(wikidatawiki)>show index from wb_terms; +----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | wb_terms | 0 | PRIMARY | 1 | term_row_id | A | 1585960379 | NULL | NULL | | BTREE | | | | wb_terms | 1 | wb_terms_text | 1 | term_text | A | 528653459 | NULL | NULL | | BTREE | | | | wb_terms | 1 | term_full_entity | 1 | term_full_entity_id | A | 144178216 | NULL | NULL | YES | BTREE | | | +----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ 3 rows in set (0.03 sec)
I'm running an optimize table on it to reclaim the space and see how much it opens up...
After running the optimize table we have this now:
test_user@db2083.codfw.wmnet(wikidatawiki)>SELECT * from information_schema.TABLES where TABLE_NAME = 'wb_terms'\G;
*************************** 1. row ***************************
TABLE_CATALOG: def
TABLE_SCHEMA: wikidatawiki
TABLE_NAME: wb_terms
TABLE_TYPE: BASE TABLE
ENGINE: InnoDB
VERSION: 10
ROW_FORMAT: Compressed
TABLE_ROWS: 1600682009
AVG_ROW_LENGTH: 45
DATA_LENGTH: 72589770752
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 52602863616
DATA_FREE: 0
AUTO_INCREMENT: 2130542996
CREATE_TIME: 2018-05-09 07:18:14
UPDATE_TIME: NULL
CHECK_TIME: NULL
TABLE_COLLATION: binary
CHECKSUM: NULL
CREATE_OPTIONS: row_format=COMPRESSED key_block_size=8
TABLE_COMMENT:
1 row in set (0.04 sec)That sounds okay to me
Don't repool it unless it's back to the old status.
So, to clarify, we can remove all data here and eliminate the extra accounts?