
Compress new Wikibase tables
Open, Medium, Public

Description

These and maybe other newly created tables are stored in the uncompressed InnoDB (Compact) row format, and will probably benefit a lot from compression (a sketch of the statement involved follows the table):

| Name                     | Engine | Version | Row_format |       Rows | Avg_row_length |  Data_length |
| wbt_item_terms           | InnoDB |      10 | Compact    |   79850814 |             38 |   3068133376 |
| wbt_property_terms       | InnoDB |      10 | Compact    |     261980 |             34 |      8929280 |
| wbt_term_in_lang         | InnoDB |      10 | Compact    |   45801912 |             35 |   1633681408 |
| wbt_text                 | InnoDB |      10 | Compact    |   17176409 |             60 |   1044381696 |
| wbt_text_in_lang         | InnoDB |      10 | Compact    |   45872021 |             36 |   1680867328 |
| wbt_type                 | InnoDB |      10 | Compact    |          2 |           8192 |        16384 |
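
Compression here means rebuilding each table with InnoDB's compressed row format. A minimal sketch of the statement involved (the KEY_BLOCK_SIZE value and the session setting are assumptions, not taken from this task):

-- Assumed procedure: run on each depooled replica directly, keeping the
-- rebuild out of the binary log so it does not replicate further.
SET SESSION sql_log_bin = 0;
-- Rebuild the table compressed; KEY_BLOCK_SIZE=8 is an assumed value
-- and requires innodb_file_per_table=ON.
ALTER TABLE wikidatawiki.wbt_item_terms
    ROW_FORMAT = COMPRESSED, KEY_BLOCK_SIZE = 8;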

Progress:

  • db1109
  • db1087
  • db1092
  • db1099:3318
  • db1101:3318
  • db1104
  • db1116:3318
  • db1124:3318
  • db1126
  • db2079
  • db2080
  • db2081
  • db2082
  • db2083
  • db2085:3318
  • db2086:3318
  • db2094:3318
  • db2100:3318
  • dbstore1005:3318
  • labsdb1009
  • labsdb1010
  • labsdb1011
  • labsdb1012

The new Wikibase tables created in s4 also need compression:

  • labsdb1012
  • labsdb1011
  • labsdb1010
  • labsdb1009
  • dbstore1004:3314
  • db2119
  • db2110
  • db2106
  • db2099
  • db2095
  • db2091
  • db2090
  • db2084
  • db2073
  • db1138
  • db1125
  • db1121
  • db1103
  • db1102
  • db1097
  • db1091
  • db1084
  • db1081

Event Timeline

jcrespo created this task. Sep 10 2019, 9:18 AM
Restricted Application added a subscriber: Aklapper. Sep 10 2019, 9:18 AM
jcrespo claimed this task. Sep 10 2019, 9:34 AM
jcrespo moved this task from Triage to In progress on the DBA board.

I will do a test run on db2100:3318 to see how long it currently takes.

jcrespo triaged this task as Medium priority. Sep 10 2019, 9:34 AM
jcrespo updated the task description. Sep 10 2019, 10:39 AM

It took one hour, but the compression was worth it. Before compression:

-rw-rw---- 1 mysql mysql 8.5G Sep 10 09:38 wbt_item_terms.ibd
-rw-rw---- 1 mysql mysql  36M Sep 10 08:13 wbt_property_terms.ibd
-rw-rw---- 1 mysql mysql 4.2G Sep 10 09:37 wbt_term_in_lang.ibd
-rw-rw---- 1 mysql mysql 2.1G Sep 10 09:38 wbt_text.ibd
-rw-rw---- 1 mysql mysql 4.2G Sep 10 09:37 wbt_text_in_lang.ibd
-rw-rw---- 1 mysql mysql 112K Jun 26 13:25 wbt_type.ibd

After compression:

-rw-rw---- 1 mysql mysql 3.4G Sep 10 10:34 wbt_item_terms.ibd
-rw-rw---- 1 mysql mysql  15M Sep 10 10:20 wbt_property_terms.ibd
-rw-rw---- 1 mysql mysql 1.7G Sep 10 10:32 wbt_term_in_lang.ibd
-rw-rw---- 1 mysql mysql 952M Sep 10 10:34 wbt_text.ibd
-rw-rw---- 1 mysql mysql 1.8G Sep 10 10:39 wbt_text_in_lang.ibd
-rw-rw---- 1 mysql mysql  64K Sep 10 10:30 wbt_type.ibd
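
Every table shrank to well under half its previous size (wbt_item_terms alone went from 8.5G to 3.4G, roughly a 60% reduction). One way to confirm that the new row format took effect, as a sketch against information_schema:

-- Sketch: after the ALTERs, these tables should report
-- row_format = 'Compressed' (and the key_block_size in create_options).
SELECT table_name, row_format, create_options
FROM information_schema.tables
WHERE table_schema = 'wikidatawiki'
  AND table_name LIKE 'wbt\_%';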

Now compressing db1116:3318 (s8); however, that may take a while, as it has other uncompressed tables that are also being converted.

jcrespo updated the task description. Sep 12 2019, 8:20 AM

Mentioned in SAL (#wikimedia-operations) [2019-09-12T09:42:15Z] <jynus> compressing tables on labsdb1012 T232446

s8 backups on eqiad have shrunk considerably (from roughly 1.65 TB to 1.20 TB, about a 27% reduction):

mysql> select name, total_size FROM backups WHERE type = 'snapshot' and section = 's8' and status = 'finished' and source like '%eqiad%' ORDER BY id desc;
+----------------------------------+---------------+
| name                             | total_size    |
+----------------------------------+---------------+
| snapshot.s8.2019-09-12--08-29-04 | 1199474663181 |
| snapshot.s8.2019-09-08--19-00-01 | 1650038824688 |
| snapshot.s8.2019-09-05--19-00-01 | 1639380508400 |
| snapshot.s8.2019-09-03--19-00-01 | 1627220696816 |
| snapshot.s8.2019-09-01--19-00-02 | 1625729689716 |
+----------------------------------+---------------+

jcrespo updated the task description. Sep 12 2019, 2:12 PM

Mentioned in SAL (#wikimedia-operations) [2019-10-11T06:08:15Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2085:3318 for compression - T232446', diff saved to https://phabricator.wikimedia.org/P9311 and previous config saved to /var/cache/conftool/dbconfig/20191011-060814-marostegui.json

I am going to start compressing the rc slaves, as there are many tables there that need compression due to all the recent schema changes. I have also seen that more special slaves across different sections need re-compression. Given that we already have this ticket, I will start with the hosts I am currently working on.
For example, db2085:3318 (the tables still pending compression are listed below; see the query sketch after the listing):

root@db2085:~# df -hT /srv
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   3.6T  2.9T  777G  79% /srv
+--------------------------------------+
| table_name                           |
+--------------------------------------+
| wikidatawiki.actor                   |
| wikidatawiki.archive                 |
| wikidatawiki.change_tag              |
| wikidatawiki.change_tag_def          |
| wikidatawiki.comment                 |
| wikidatawiki.content                 |
| wikidatawiki.content_models          |
| wikidatawiki.entityschema_id_counter |
| wikidatawiki.ipblocks_restrictions   |
| wikidatawiki.revision_actor_temp     |
| wikidatawiki.revision_comment_temp   |
| wikidatawiki.slot_roles              |
| wikidatawiki.slots                   |
| wikidatawiki.translate_groupstats    |
| wikidatawiki.wb_items_per_site       |
| wikidatawiki.wbt_item_terms          |
| wikidatawiki.wbt_property_terms      |
| wikidatawiki.wbt_term_in_lang        |
| wikidatawiki.wbt_text                |
| wikidatawiki.wbt_text_in_lang        |
| wikidatawiki.wbt_type                |
+--------------------------------------+
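
A list like the one above can presumably be generated with an information_schema query along these lines (a sketch; the exact filter used is not shown in this task):

-- Sketch: list InnoDB tables still using the uncompressed Compact row
-- format in the wikidatawiki schema.
SELECT CONCAT(table_schema, '.', table_name) AS table_name
FROM information_schema.tables
WHERE table_schema = 'wikidatawiki'
  AND engine = 'InnoDB'
  AND row_format = 'Compact'
ORDER BY table_name;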

Mentioned in SAL (#wikimedia-operations) [2019-10-11T06:13:20Z] <marostegui> Compress tables on db2085:3318 - T232446

Marostegui updated the task description. Oct 11 2019, 6:13 AM
Marostegui updated the task description. Oct 14 2019, 8:00 AM

Mentioned in SAL (#wikimedia-operations) [2019-10-15T06:44:20Z] <marostegui@cumin1001> dbctl commit (dc=all): 'Depool db2086:3318 T232446', diff saved to https://phabricator.wikimedia.org/P9344 and previous config saved to /var/cache/conftool/dbconfig/20191015-064419-marostegui.json

Marostegui updated the task description. Oct 17 2019, 12:48 PM
Reedy renamed this task from Compress new Wikibase tables to Compress new Wikibase tables. Oct 17 2019, 12:56 PM
jcrespo removed jcrespo as the assignee of this task. Oct 21 2019, 10:16 AM
Marostegui updated the task description. Tue, Nov 12, 11:08 AM

Why was this created in s4? Does anyone know why?

> Why was this created in s4? Does anyone know why?

That was part of the Commons Structured Data project.
We advised against doing this and recommended instead creating a new section (with new servers) just for those tables, to be able to accommodate growth and capacity planning as needed, as there were concerns about their growth and usage, especially in a section as busy and loaded as commonswiki. Unfortunately, the creation of those tables was not communicated to us, and we actually found out about them by chance.
We later found out that SDC cannot work with a separate Wikibase installation (meaning that specific set of tables) stored on different servers.
Right now, I don't know whether this is being worked on or not, so that we can eventually migrate all the data to a different set of servers, because at some point we'll need to split it out due to disk space issues.

>> Why was this created in s4? Does anyone know why?
>
> That was part of the Commons Structured Data project.
> We advised against doing this and recommended instead creating a new section (with new servers) just for those tables, to be able to accommodate growth and capacity planning as needed, as there were concerns about their growth and usage, especially in a section as busy and loaded as commonswiki. Unfortunately, the creation of those tables was not communicated to us, and we actually found out about them by chance.
> We later found out that SDC cannot work with a separate Wikibase installation (meaning that specific set of tables) stored on different servers.

I don't know their codebase very well, but on our side (which stores and handles most of the stuff SDC uses), we can safely move to another server; we don't do any joins with other tables in the code.

> on our side (which stores and handles most of the stuff SDC uses), we can safely move to another server; we don't do any joins with other tables in the code.

It was actually the Wikibase team that told us it was not yet possible; please talk to them to clarify.

>> on our side (which stores and handles most of the stuff SDC uses), we can safely move to another server; we don't do any joins with other tables in the code.
>
> It was actually the Wikibase team that told us it was not yet possible; please talk to them to clarify.

I'm on the Wikibase team. Can you tell me who said it and where? Maybe I'm missing something. Technically it's not possible right now, but it's just a matter of passing the proper connection to the class, and that's all.

> I'm on the Wikibase team. Can you tell me who said it and where? Maybe I'm missing something. Technically it's not possible right now, but it's just a matter of passing the proper connection to the class, and that's all.

I didn't talk directly with them (you), and I don't remember exactly who it was. I just remember asking the SDC devs, and they relayed it to someone at Wikibase. I think it was because commonswiki has both Wikidata and SDC access separately, and it needs to be multiplexed (but I am not sure). Please ask around, or ask Mark Holmquist, whom we contacted from SDC. Sorry I don't have more info; I am forwarding you the emails I got, in case that helps.

To bring some clarity on who said what: I believe the statement that SDC cannot work with the Wikibase-related tables on a different server, to which the discussion above refers, was made by myself in T68108#5268031.
As far as I can tell, the statement made there still holds with the normalized wb_terms; what also holds is that making the necessary adjustments is not something that would take six months, if it were meant to be done (as @Ladsgroup states in T232446#5660095).

Forgive my inability to read DBA speak, but if the size and space usage data provided above suggest that the wbt_* tables of commonswiki contain a significant amount of data, that is certainly not intended to happen. To the best of my knowledge, Commons (WikibaseMediaInfo) has not been storing any "terms" data in its Wikibase tables.

> Right now, I don't know whether this is being worked on or not, so that we can eventually migrate all the data to a different set of servers, because at some point we'll need to split it out due to disk space issues.

As far as I know, it is not being worked on. If the expectation now is to have the DB tables of Commons split so that revision (and possibly other) "core" MW tables are on one server and the Wikibase-related ones on another, that should be discussed with SDC, and WMDE would look into ways to help make this happen. Including @MarkTraceur to make sure this is on his radar, should any changes need to happen.

Let me clarify one thing here: by "not possible" I mean it will be possible after one day of work and two days of testing.

I think it should be said that right now we are not in a super urgent position to get things cleaned up/split/moved or anything similar. But we need to make sure we are aware that at some point commonswiki will no longer fit on our servers, especially the old ones (which are scheduled for refresh in 2 years).
It would be good to know how much we can do from a code point of view (splits etc.) in order to know whether there is margin there, or whether the only possibility is to scale by buying bigger disks.

Here we can see the situation with some of the hosts that only host s4 (commonswiki):

db1138.eqiad.wmnet
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   4.4T  2.4T  2.0T  55% /srv
db1121.eqiad.wmnet
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   3.6T  2.6T  1.1T  70% /srv
db1091.eqiad.wmnet
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   3.6T  2.3T  1.4T  64% /srv
db1084.eqiad.wmnet
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   3.6T  2.4T  1.3T  65% /srv
db1081.eqiad.wmnet
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   3.6T  2.8T  912G  76% /srv

Those which are 3.6T are the older ones I mentioned.
There is probably still enough room: we can compress data and optimize tables, and once we have dropped columns from revision we'll free up more space, etc.

The point I want to make is basically that we should think ahead: right now we are "okish", but time flies, and the predictions are already higher than expected: T226093#5649354

Marostegui updated the task description. Wed, Nov 20, 9:45 AM
Marostegui updated the task description. Wed, Nov 20, 9:50 AM
Marostegui updated the task description. Wed, Nov 20, 9:59 AM
Marostegui updated the task description. Wed, Nov 20, 10:15 AM
Marostegui updated the task description. Thu, Nov 21, 5:48 AM
Marostegui updated the task description. Thu, Nov 21, 5:51 AM
Marostegui updated the task description. Mon, Nov 25, 6:20 AM
Marostegui updated the task description. Tue, Nov 26, 2:01 PM
Marostegui updated the task description. Wed, Nov 27, 5:43 AM
Marostegui updated the task description. Wed, Nov 27, 7:09 AM
Marostegui updated the task description. Thu, Nov 28, 8:56 AM
Marostegui updated the task description. Thu, Nov 28, 10:38 AM
Marostegui updated the task description. Thu, Nov 28, 1:41 PM
Marostegui updated the task description. Thu, Nov 28, 2:35 PM
Marostegui updated the task description. Thu, Nov 28, 3:40 PM
Marostegui updated the task description. Fri, Nov 29, 6:49 AM
Marostegui updated the task description. Thu, Dec 5, 5:45 AM