Defragment db1015
Closed, Resolved · Public

Description

db1015 is reaching around 93% usage of /srv/.
This server is going to be decommissioned at some point (T148078), but to avoid it paging during the holiday period, let's defragment some of the big tables across all the wikis to get some space back.

Restricted Application added a subscriber: Aklapper. · Dec 20 2016, 7:21 AM

The first iteration over the pagelinks tables (smaller than 1G) is done. It gave us back around 25G.
I am now going to proceed with the pagelinks tables bigger than 1G, of which there are 50. I believe that will give us back around 30-40G in total.

Once that is done I will proceed with another big table across all the wikis.
Even though optimize is supposed to be an online DDL operation, I am being careful and leaving a few seconds between iterations to make sure the server doesn't lag.
The server has not lagged during the whole night that the ALTERs have been running (I believe those two spikes are drawing errors: no ALTERs were running at 14:25 or at 6:37): https://grafana-admin.wikimedia.org/dashboard/db/mysql?from=1482132327144&to=1482218727144&var-dc=eqiad%20prometheus%2Fops&var-server=db1015&panelId=6&fullscreen
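
For reference, the approach described above (tables under 1G first, then the bigger ones, with a pause between iterations) could be sketched roughly as follows. This is only an illustration, not the loop that was actually run: the function names, the table list, the sizes and the 5 second pause are all assumptions, and the execute callback stands in for whatever client sends the statement to the server.

```python
import time
from typing import Callable, Iterable, List, Tuple

GIB = 1024 ** 3

# (database, table, size in bytes); hypothetical shape for this sketch
TableInfo = Tuple[str, str, int]


def plan_optimize(tables: Iterable[TableInfo],
                  threshold_bytes: int = GIB,
                  bigger: bool = False) -> List[str]:
    """Build OPTIMIZE TABLE statements for one side of the size threshold.

    bigger=False selects tables smaller than the threshold, bigger=True the
    rest. Statements are ordered smallest table first.
    """
    selected = [t for t in tables if (t[2] >= threshold_bytes) == bigger]
    selected.sort(key=lambda t: t[2])
    return [f"OPTIMIZE TABLE `{db}`.`{tbl}`;" for db, tbl, _ in selected]


def run_with_pause(statements: List[str],
                   execute: Callable[[str], None],
                   pause_seconds: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Run statements one by one, pausing between them (not after the last)."""
    for i, stmt in enumerate(statements):
        execute(stmt)
        if i < len(statements) - 1:
            sleep(pause_seconds)


if __name__ == "__main__":
    # Illustrative data only; "examplewiki" and both sizes are made up.
    tables = [
        ("shwiki", "pagelinks", 17 * GIB),
        ("examplewiki", "pagelinks", 10 * 1024 ** 2),
    ]
    # Dry run: print the statements instead of sending them to a server.
    run_with_pause(plan_optimize(tables), print, pause_seconds=0)
    run_with_pause(plan_optimize(tables, bigger=True), print, pause_seconds=0)
```

Passing print as the execute callback gives a dry run that only lists the statements, which is a cheap way to review the order before touching the server.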

Marostegui edited the task description. · Dec 20 2016, 7:31 AM

We got around 50G back from optimizing pagelinks.

root@db1015:/srv/sqldata# df -hT /srv/
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   1.6T  1.4T  194G  88% /srv

I have started another loop to go over the revision tables now.

Mentioned in SAL (#wikimedia-operations) [2016-12-20T16:26:44Z] <marostegui> Running optimize table on db1045 for the revision tables as we urgently need some space back on that host - T153739

Mentioned in SAL (#wikimedia-operations) [2016-12-21T07:20:42Z] <marostegui> Running optimize table on db1045 for the revision tables as we urgently need some space back on that host - https://phabricator.wikimedia.org/T153739

It finished optimizing the tables under 1G. I will now run the loop to optimize the biggest revision tables (around 5G each on average, but there are 53 of them).
There have been no delays during the whole process: https://grafana-admin.wikimedia.org/dashboard/db/mysql?from=now-24h&to=now&var-dc=eqiad%20prometheus%2Fops&var-server=db1015&panelId=6&fullscreen

s/db1045/db1015

Marostegui added a comment. · Edited · Dec 21 2016, 8:52 AM

Thanks! I was only thinking about defragmenting a couple of tables across the board to give us enough space for the holidays, given that we cannot depool slaves this week.
But this is indeed the medium/long-term solution until they get decommissioned.

root@db1015:/srv/sqldata# df -hT /srv/
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   1.6T  1.4T  236G  86% /srv

I have started on the templatelinks tables bigger than 1G (leaving aside the 121G table from cebwiki) so that they get done today.

It finished, and the loop is now running across all the smaller tables overnight.

root@db1015:~# df -hT /srv/
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   1.6T  1.4T  254G  85% /srv

db1015 should be good to go for the holidays

root@db1015:/srv/sqldata# df -hT /srv/
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   1.6T  1.4T  266G  84% /srv
root@db1015:/srv/sqldata# df -hT /srv/
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   1.6T  1.5T  147G  91% /srv
root@db1015:/srv/sqldata# find -type f -exec du -Sh {} + | sort -rh | head -n 5
208G    ./cebwiki/templatelinks.ibd
63G     ./enwikivoyage/text.ibd
21G     ./shwiki/logging.ibd
18G     ./cebwiki/externallinks.ibd
17G     ./shwiki/pagelinks.ibd

I am going to start with those (not with cebwiki/templatelinks yet, as there is currently not enough disk space to alter that table) so we do not get paged during the summit.
Anyway, maybe we can decommission this host (s3 - rc service) once we are back from the summit?
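
The disk-space constraint mentioned above can be checked with a small helper like the one below. It is a rough sketch under a stated assumption: that rebuilding a table needs about as much free space as the table currently occupies, since a new copy is written before the old file is dropped. The function names and the headroom parameter are hypothetical; the sizes in the example are the ones from the df and find output above.

```python
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text: str) -> int:
    """Convert a human-readable size such as '208G' or '1.6T' to bytes."""
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in UNITS:
        return int(float(text[:-1]) * UNITS[suffix])
    return int(float(text))


def can_rebuild(table_size: str, available: str, headroom: float = 1.0) -> bool:
    """Return True if free space covers the table size times headroom.

    Assumption for this sketch: the rebuild temporarily needs roughly the
    table's current size in free space. headroom > 1.0 adds a safety margin.
    """
    return parse_size(available) >= parse_size(table_size) * headroom


if __name__ == "__main__":
    available = "147G"  # Avail column from the df output above
    for name, size in [("cebwiki/templatelinks", "208G"),
                       ("enwikivoyage/text", "63G"),
                       ("shwiki/logging", "21G")]:
        verdict = "ok" if can_rebuild(size, available) else "not enough space"
        print(f"{name}: {size} -> {verdict}")
```

With 147G available, the 208G cebwiki/templatelinks table fails the check while the other tables pass, which matches the order of work described above.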

Mentioned in SAL (#wikimedia-operations) [2017-01-03T08:00:31Z] <marostegui> Run optimize table on a few large tables - db1015 - T153739

Mentioned in SAL (#wikimedia-operations) [2017-01-04T07:43:56Z] <marostegui> Compressing tables on db1015 - T153739

Compressing the biggest tables on this host to get some extra space.

Mentioned in SAL (#wikimedia-operations) [2017-01-05T07:08:54Z] <marostegui> Compressing revision tables across all the wikis - db1015 - T153739

Mentioned in SAL (#wikimedia-operations) [2017-01-19T10:14:34Z] <marostegui> Compressing templatelinks tables on db1015 - T153739

Marostegui triaged this task as "Normal" priority. · Jan 19 2017, 10:14 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2017-01-20T07:09:27Z] <marostegui> Compress pagelinks tables on db1015 - T153739

Mentioned in SAL (#wikimedia-operations) [2017-01-23T07:28:33Z] <marostegui> Compressing cebwiki.templatelinks on db1015 (224G table) - T153739

Marostegui closed this task as "Resolved". · Jan 26 2017, 11:04 AM

Should be good for now

root@db1015:/srv/sqldata# df -hT /srv
Filesystem            Type  Size  Used Avail Use% Mounted on
/dev/mapper/tank-data xfs   1.6T  1.1T  536G  67% /srv