Page MenuHomePhabricator

Apply schema changes to an isolated database and examine the results
Closed, ResolvedPublic

Description

See T188279#4101129 and onwards

Event Timeline

Ladsgroup created this task.Apr 4 2018, 9:40 AM
Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptApr 4 2018, 9:40 AM

I found T86530, which may be outdated, but may help with giving more options.

As a note to @Ladsgroup, and the other people working on this- after dropping a column, replication stopped working on the test host (as expected). This is not a problem, just a remark so you know why no more updates are arriving to that host (and of course, that in production, the change should be done in a different compatible way first).

It would be nice to report size changes here, so we are all aware.

Yeah, Some of my changes are done now. what do you think of this data?

test_user@db2083.codfw.wmnet(wikidatawiki)>SELECT * from information_schema.TABLES where TABLE_NAME = 'wb_terms'\G;
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: wikidatawiki
     TABLE_NAME: wb_terms
     TABLE_TYPE: BASE TABLE
         ENGINE: InnoDB
        VERSION: 10
     ROW_FORMAT: Compressed
     TABLE_ROWS: 1585960379
 AVG_ROW_LENGTH: 57
    DATA_LENGTH: 91478294528
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 227068411904
      DATA_FREE: 47484239872
 AUTO_INCREMENT: 2130542996
    CREATE_TIME: 2018-04-10 16:07:05
    UPDATE_TIME: NULL
     CHECK_TIME: NULL
TABLE_COLLATION: binary
       CHECKSUM: NULL
 CREATE_OPTIONS: row_format=COMPRESSED key_block_size=8
  TABLE_COMMENT: 
1 row in set (0.03 sec)

The indexes seem disproportionally large compared to the data. Could the table be split somehow in a 1:1 relationship that could make sense (e.g. index_table with just columns to search and data_table with almost just the primary index).

That is very valid. I looked into indexes:

test_user@db2083.codfw.wmnet(wikidatawiki)>show index from wb_terms;
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name            | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wb_terms |          0 | PRIMARY             |            1 | term_row_id         | A         |  1585960379 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_text       |            1 | term_text           | A         |   528653459 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_search_key |            1 | term_search_key     | A         |    28835643 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            1 | term_language       | A         |     4217979 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            2 | term_type           | A         |     2969963 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            3 | term_entity_type    | A         |     6446993 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | tmp1                |            4 | term_search_key     | A         |      577132 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_full_entity    |            1 | term_full_entity_id | A         |   144178216 |     NULL | NULL   | YES  | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            1 | term_language       | A         |     7552192 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            2 | term_full_entity_id | A         |  1585960379 |     NULL | NULL   | YES  | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            3 | term_type           | A         |  1585960379 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_search_full    |            4 | term_search_key     | A         |  1585960379 |       16 | NULL   |      | BTREE      |         |               |
+----------+------------+---------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
12 rows in set (0.04 sec)

1- The indexes are pretty different from what we have in the code
2- I dropped all of not-needed indexes:

test_user@db2083.codfw.wmnet(wikidatawiki)>show index from wb_terms;
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name         | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| wb_terms |          0 | PRIMARY          |            1 | term_row_id         | A         |  1585960379 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | wb_terms_text    |            1 | term_text           | A         |   528653459 |     NULL | NULL   |      | BTREE      |         |               |
| wb_terms |          1 | term_full_entity |            1 | term_full_entity_id | A         |   144178216 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
3 rows in set (0.03 sec)

I'm running an optimize table on it to reclaim the space and see how much it opens up...

After running the optimize table we have this now:

test_user@db2083.codfw.wmnet(wikidatawiki)>SELECT * from information_schema.TABLES where TABLE_NAME = 'wb_terms'\G;
*************************** 1. row ***************************
  TABLE_CATALOG: def
   TABLE_SCHEMA: wikidatawiki
     TABLE_NAME: wb_terms
     TABLE_TYPE: BASE TABLE
         ENGINE: InnoDB
        VERSION: 10
     ROW_FORMAT: Compressed
     TABLE_ROWS: 1600682009
 AVG_ROW_LENGTH: 45
    DATA_LENGTH: 72589770752
MAX_DATA_LENGTH: 0
   INDEX_LENGTH: 52602863616
      DATA_FREE: 0
 AUTO_INCREMENT: 2130542996
    CREATE_TIME: 2018-05-09 07:18:14
    UPDATE_TIME: NULL
     CHECK_TIME: NULL
TABLE_COLLATION: binary
       CHECKSUM: NULL
 CREATE_OPTIONS: row_format=COMPRESSED key_block_size=8
  TABLE_COMMENT:
1 row in set (0.04 sec)

That sounds okay to me

Yes, that is much better.

I think this part is done, I make some actionables.

Ladsgroup closed this task as Resolved.May 31 2018, 8:32 AM

This is done. Don't repool it unless it's back to the old status.

Don't repool it unless it's back to the old status.

So, to clarify, we can remove all data here and eliminate the extra accounts?

Yes, it is marked as not repool on the config

So, to clarify, we can remove all data here and eliminate the extra accounts?

Yup.