Page MenuHomePhabricator

Drop the "wb_terms.wb_terms_language" index
Closed, ResolvedPublic

Description

The wb_terms_language index on wb_terms is almost certainly not very useful (just ~400 distinct values over 600M+ rows), thus it can probably be dropped for good.

From db1087:

KEY `wb_terms_language` (`term_language`),
  • dbstore2001.codfw.wmnet 3315
  • db2089.codfw.wmnet 3315
  • db2086.codfw.wmnet 3315
  • db2085.codfw.wmnet 3315
  • db2084.codfw.wmnet 3315
  • db2083.codfw.wmnet 3306
  • db2082.codfw.wmnet 3306
  • db2081.codfw.wmnet 3306
  • db2080.codfw.wmnet 3306
  • db2079.codfw.wmnet 3306
  • db2075.codfw.wmnet 3306
  • db2038.codfw.wmnet 3306
  • db2045.codfw.wmnet 3306
  • db2052.codfw.wmnet 3306
  • db2059.codfw.wmnet 3306
  • db2066.codfw.wmnet 3306
  • db2023.codfw.wmnet 3306
  • labsdb1001.eqiad.wmnet 3306 (not possible, read-only host: T179464)
  • labsdb1003.eqiad.wmnet 3306
  • labsdb1009.eqiad.wmnet 3306
  • labsdb1010.eqiad.wmnet 3306
  • labsdb1011.eqiad.wmnet 3306
  • db1095.eqiad.wmnet 3306
  • dbstore1001.eqiad.wmnet 3306
  • dbstore1002.eqiad.wmnet 3306
  • db1070.eqiad.wmnet 3306
  • db1071.eqiad.wmnet 3306
  • db1082.eqiad.wmnet 3306
  • db1109.eqiad.wmnet
  • db1110.eqiad.wmnet
  • db1087.eqiad.wmnet 3306
  • db1092.eqiad.wmnet 3306
  • db1096.eqiad.wmnet 3306
  • db1099.eqiad.wmnet 3306
  • db1100.eqiad.wmnet 3306
  • db1104.eqiad.wmnet 3306
  • db1106.eqiad.wmnet 3306
  • db1063.eqiad.wmnet 3306

Event Timeline

hoo renamed this task from Drop the "wb_terms.term_language" index to Drop the "wb_terms.wb_terms_language" index.Oct 26 2017, 6:28 PM
hoo updated the task description. (Show Details)
hoo added subscribers: Marostegui, daniel, Ladsgroup.
Marostegui triaged this task as Medium priority.Oct 27 2017, 5:58 AM

If this requires to depool servers (I don't know yet), I was talking to @Ladsgroup that it might be even worth to run an optimize over wb_terms table to reclaim some space.

Marostegui moved this task from Pending comment to In progress on the DBA board.

I will definitely try to optimize the table, as it is getting out of hand: -rw-rw---- 1 mysql mysql 716G Nov 10 09:08 /srv/sqldata/wikidatawiki/wb_terms.ibd

Mentioned in SAL (#wikimedia-operations) [2017-11-10T10:24:24Z] <marostegui> Deploy schema change on db2089 - T179106

Going to test+optimize on db2089 which has wb_terms compressed.

-rw-rw---- 1 mysql mysql 529G Nov 10 10:25 /srv/sqldata.s5/wikidatawiki/wb_terms.ibd

I will definitely try to optimize the table, as it is getting out of hand

Now that we use Elastic for prefix search, we could drop all rows with term_type = "alias" from this table. Needs a little bit of code to make this optional, but should save a lot of space. If you think that would help, ask @hoo and @Lydia_Pintscher about it.

I will definitely try to optimize the table, as it is getting out of hand

Now that we use Elastic for prefix search, we could drop all rows with term_type = "alias" from this table. Needs a little bit of code to make this optional, but should save a lot of space. If you think that would help, ask @hoo and @Lydia_Pintscher about it.

The % of rows with that type isn't huge

root@db2086.codfw.wmnet[wikidatawiki]> select count(*) from wb_terms where term_type="alias";
+----------+
| count(*) |
+----------+
| 16570605 |
+----------+
1 row in set (11 min 24.28 sec)


root@db2086.codfw.wmnet[wikidatawiki]> select count(*) from wb_terms;
+------------+
| count(*)   |
+------------+
| 1268234185 |
+------------+
1 row in set (6 min 4.49 sec)

So that makes it around 1.3%, not huge, but it would still be good to get them cleaned if they are not necessary.

@hoo @Lydia_Pintscher you want me to create a ticket about it or you'd do it?

Interesting. I would have expected it to be a lot more than that.

Mentioned in SAL (#wikimedia-operations) [2017-11-13T06:18:21Z] <marostegui> Deploy alter table db2086 - T179106

Going to test+optimize on db2089 which has wb_terms compressed.

-rw-rw---- 1 mysql mysql 529G Nov 10 10:25 /srv/sqldata.s5/wikidatawiki/wb_terms.ibd

Not a big win after the optimization on a compressed table:

-rw-rw---- 1 mysql mysql 496G Nov 13 06:37 /srv/sqldata.s5/wikidatawiki/wb_terms.ibd

The index has been dropped from all codfw.
I am optimizing a non-compressed slave to see if the gain is worth the time. As we have seen that probably for a compressed one is not.

root@neodymium:/home/marostegui/git/software/dbtools# cat s5.hosts | grep codfw | while read host port; do echo $host:$port; mysql --skip-ssl wikidatawiki -h$host -P$port -e "show create table wb_terms\G" | grep wb_terms_language; done
dbstore2001.codfw.wmnet:3315
db2089.codfw.wmnet:3315
db2086.codfw.wmnet:3315
db2085.codfw.wmnet:3315
db2084.codfw.wmnet:3315
db2083.codfw.wmnet:3306
db2082.codfw.wmnet:3306
db2081.codfw.wmnet:3306
db2080.codfw.wmnet:3306
db2079.codfw.wmnet:3306
db2075.codfw.wmnet:3306
db2038.codfw.wmnet:3306
db2045.codfw.wmnet:3306
db2052.codfw.wmnet:3306
db2059.codfw.wmnet:3306
db2066.codfw.wmnet:3306
db2023.codfw.wmnet:3306

This is the non-compressed status:

root@db2052:~# ls -lh /srv/sqldata/wikidatawiki/wb_terms.ibd
-rw-rw---- 1 mysql mysql 718G Nov 13 07:32 /srv/sqldata/wikidatawiki/wb_terms.ibd

Mentioned in SAL (#wikimedia-operations) [2017-11-13T07:33:35Z] <marostegui> Optimize wb_terms table on db2052 - T179106

Mentioned in SAL (#wikimedia-operations) [2017-11-13T07:38:53Z] <marostegui> Deploy alter table to db1104 - T179106

This is the non-compressed status:

root@db2052:~# ls -lh /srv/sqldata/wikidatawiki/wb_terms.ibd
-rw-rw---- 1 mysql mysql 718G Nov 13 07:32 /srv/sqldata/wikidatawiki/wb_terms.ibd

This one was worth the optimize

-rw-rw---- 1 mysql mysql 489G Nov 14 06:34 wb_terms.ibd

Change 392582 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/mediawiki-config@master] db-eqiad.php: Depool db1087

https://gerrit.wikimedia.org/r/392582

Change 392582 merged by jenkins-bot:
[operations/mediawiki-config@master] db-eqiad.php: Depool db1087

https://gerrit.wikimedia.org/r/392582

Mentioned in SAL (#wikimedia-operations) [2017-11-21T06:47:15Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Depool db1087 - T179106 (duration: 00m 48s)

Mentioned in SAL (#wikimedia-operations) [2017-11-21T06:50:07Z] <marostegui@tin> Synchronized wmf-config/db-eqiad.php: Repool db1087 - T179106 (duration: 00m 48s)

This is all done now