deployment-logstash2 out of disk space
Closed, Resolved (Public)

Description

I wanted to look at some beta logs, but it looks like deployment-logstash2 ran out of disk space:

$ ssh deployment-logstash2.eqiad.wmflabs df -h
Filesystem                          Size  Used Avail Use% Mounted on
udev                                 10M     0   10M   0% /dev
tmpfs                               3.2G  322M  2.9G  11% /run
/dev/vda3                            19G  3.4G   15G  19% /
tmpfs                               7.9G     0  7.9G   0% /dev/shm
tmpfs                               5.0M     0  5.0M   0% /run/lock
tmpfs                               7.9G     0  7.9G   0% /sys/fs/cgroup
/dev/mapper/vd-second--local--disk  139G  131G 1010M 100% /mnt

Event Timeline

Based on https://tools.wmflabs.org/sal/production?p=0&q=logstash&d= and https://tools.wmflabs.org/sal/releng?p=0&q=logstash&d=, I'm CCing the people who de facto maintain logstash (I know it's in the weird world of odd maintainership).

Help please, @Gehel @EBernhardson

greg triaged this task as High priority. Jul 13 2017, 4:07 PM

Index sizes:

yellow open logstash-2017.06.13 1w0X6t80SASEnV6Y1kDTUw 1 2  1770573 0 924.8mb 924.8mb
yellow open logstash-2017.06.14 cAUWl9U2QYefGMsi07sW4w 1 2  1373470 0   802mb   802mb
yellow open logstash-2017.06.15 Pzi-_l4OTim8gVkwmrgE8Q 1 2  1320566 0 775.4mb 775.4mb
yellow open logstash-2017.06.16 oOx1g6lAReuJtu1S05D1-g 1 2  1847407 0     1gb     1gb
yellow open logstash-2017.06.17 g1PAlyahTjOQERO7HzWRxQ 1 2  7295888 0   4.2gb   4.2gb
yellow open logstash-2017.06.18 h8kIhI2kT_2GsJXeSlrfnw 1 2  7869146 0   4.5gb   4.5gb
yellow open logstash-2017.06.19 agYJsf6bSmaA3kdFg-k3tg 1 2  1933361 0   1.1gb   1.1gb
yellow open logstash-2017.06.20 rTRerGWPSoieG7rPAZwtMA 1 2  2076136 0   1.2gb   1.2gb
yellow open logstash-2017.06.21 mrmOdcwSTYWuUxPnJheCYQ 1 2  2026378 0   1.2gb   1.2gb
yellow open logstash-2017.06.22 sV0qiNgyToqL8bK2nLSmzw 1 2  1630069 0     1gb     1gb
yellow open logstash-2017.06.23 k6OSsrahTcS8Ez9CAKZffw 1 2  1382242 0 862.1mb 862.1mb
yellow open logstash-2017.06.24 d-J4rMHCQwaB36m3Tssavw 1 2  1290315 0 810.9mb 810.9mb
yellow open logstash-2017.06.25 ABSVH7WBS-qRFr-t1E8PjA 1 2  1350035 0 839.2mb 839.2mb
yellow open logstash-2017.06.26 HQDtbzOqQWeB2D4Su14_og 1 2  1421294 0 882.8mb 882.8mb
yellow open logstash-2017.06.27 Qh09PubZSZS_4_CNcOut8Q 1 2  1442996 0 889.3mb 889.3mb
yellow open logstash-2017.06.28 -7GPwTKPTRKN75ppOhciyw 1 2  1385760 0 846.1mb 846.1mb
yellow open logstash-2017.06.29 Me5clpD8Sna_EZ-SA38IfQ 1 2  1489077 0 918.6mb 918.6mb
yellow open logstash-2017.06.30 QLv1WYMwQc6PqJdXyyY6QA 1 2  1727110 0     1gb     1gb
yellow open logstash-2017.07.01 ZpeCbqBUSoisgTk2YkAj3g 1 2  1704719 0     1gb     1gb
yellow open logstash-2017.07.02 7b23IJ0BQr23bt4q8u3M1A 1 2  1728725 0     1gb     1gb
yellow open logstash-2017.07.03 DMyED6p8Su6YaHceCBadew 1 2  1797034 0     1gb     1gb
yellow open logstash-2017.07.04 1dJEoV7uRPamFiupUDTPww 1 2  1761816 0     1gb     1gb
yellow open logstash-2017.07.05 Wyyv1AVCTpCcl9gIsxWOqg 1 2  2033278 0   1.2gb   1.2gb
yellow open logstash-2017.07.06 4GjgpE4hQC25IFRZQcK4gQ 1 2  3069230 0   1.6gb   1.6gb
yellow open logstash-2017.07.07 d3E5KZLwQ3iVfmfaXnxeKA 1 2  2874551 0   1.5gb   1.5gb
yellow open logstash-2017.07.08 kapK5_HTSje9KVvamd7OiQ 1 2  7784277 0   3.7gb   3.7gb
yellow open logstash-2017.07.09 O9ygLV2wQcCSATde4hhksg 1 2 32410293 0  14.3gb  14.3gb

A typical day looks to be 750M to 1.5G. Something on 07.08 - 07.09 started spamming and filled up the disk. For now I'm dropping the oldest week's worth of data, but if we have another 15G day that's not going to help much.
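The spike can also be picked out mechanically. Below is a minimal Python sketch, assuming the listing above is `_cat/indices`-style output with the index name in the third column and the primary store size in the last one. The helper names (`parse_size`, `parse_indices`, `flag_spikes`) and the "3x the median" threshold are illustrative assumptions, not something that was run for this task.

```python
import re
from statistics import median

# Byte multipliers for the size suffixes seen in the listing (assumed binary units).
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4}

# A few lines copied from the listing above.
SAMPLE = """
yellow open logstash-2017.07.05 Wyyv1AVCTpCcl9gIsxWOqg 1 2  2033278 0   1.2gb   1.2gb
yellow open logstash-2017.07.06 4GjgpE4hQC25IFRZQcK4gQ 1 2  3069230 0   1.6gb   1.6gb
yellow open logstash-2017.07.07 d3E5KZLwQ3iVfmfaXnxeKA 1 2  2874551 0   1.5gb   1.5gb
yellow open logstash-2017.07.08 kapK5_HTSje9KVvamd7OiQ 1 2  7784277 0   3.7gb   3.7gb
yellow open logstash-2017.07.09 O9ygLV2wQcCSATde4hhksg 1 2 32410293 0  14.3gb  14.3gb
"""


def parse_size(text):
    """Convert a size such as '924.8mb' or '1gb' to a number of bytes."""
    match = re.fullmatch(r"([\d.]+)(kb|mb|gb|tb|b)", text.lower())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNITS[unit]


def parse_indices(listing):
    """Return {index_name: primary_store_bytes} for each non-empty line."""
    sizes = {}
    for line in listing.strip().splitlines():
        fields = line.split()
        # Assumed layout: name is the third field, primary store size the last.
        sizes[fields[2]] = parse_size(fields[-1])
    return sizes


def flag_spikes(sizes, factor=3.0):
    """Return the names of indices larger than `factor` times the median size."""
    typical = median(sizes.values())
    return sorted(name for name, size in sizes.items() if size > factor * typical)


if __name__ == "__main__":
    for name in flag_spikes(parse_indices(SAMPLE)):
        print(name)
```

The median is used rather than the mean so that one very large day does not raise the baseline it is compared against.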

It looks like there was also 20G of old indexes from a previous version of elasticsearch in /mnt. Clearing that out along with the 7 oldest days of logs has brought us 43G of free space.

ebernhardson@deployment-logstash2:~$ df -h /mnt
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/vd-second--local--disk  139G   90G   43G  68% /mnt
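The cleanup described above removed the 7 oldest days of logs. As a rough illustration of how those indices could be selected by date rather than by hand, here is a small Python sketch. It assumes daily indices named `logstash-YYYY.MM.DD` as in the listing; `oldest_indices` is a hypothetical helper, and the actual deletion step (however it was performed on this host) is deliberately left out.

```python
from datetime import datetime


def oldest_indices(names, count, prefix="logstash-"):
    """Return the `count` oldest date-stamped index names, oldest first.

    Names that do not match `<prefix>YYYY.MM.DD` are ignored.
    """
    dated = []
    for name in names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d")
        except ValueError:
            continue  # not a daily index, skip it
        dated.append((day, name))
    return [name for _, name in sorted(dated)[:count]]


if __name__ == "__main__":
    names = [
        "logstash-2017.06.15",
        "logstash-2017.06.13",
        "logstash-2017.07.09",
        "logstash-2017.06.14",
    ]
    print(oldest_indices(names, 2))
```

Parsing the date instead of sorting the raw strings keeps unrelated indices out of the candidate list.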

alex@alex-laptop:~$ ssh deployment-logstash2 df -h /mnt
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/vd-second--local--disk  139G   41G   92G  31% /mnt

So is this done?

EBernhardson claimed this task.

Varying definitions of done. This particular instance is no longer a problem, but there is nothing preventing the same problem from occurring again in the future.
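One possible guard against a repeat is a size-based retention rule rather than a purely age-based one, so that a single 15G day evicts more old data instead of filling the disk. The Python sketch below only computes which indices such a rule would drop. The function name `indices_to_drop` and the budget value are assumptions for illustration; nothing like this was deployed as part of this task.

```python
GIB = 1024**3


def indices_to_drop(sizes, budget_bytes):
    """Pick the oldest indices to delete until the remaining total fits the budget.

    `sizes` maps date-stamped index names (logstash-YYYY.MM.DD) to bytes.
    Such names sort lexicographically in date order, so sorted() yields
    the oldest index first.
    """
    total = sum(sizes.values())
    drop = []
    for name in sorted(sizes):
        if total <= budget_bytes:
            break
        drop.append(name)
        total -= sizes[name]
    return drop


if __name__ == "__main__":
    # Rounded, made-up sizes for illustration only.
    sizes = {
        "logstash-2017.07.07": 2 * GIB,
        "logstash-2017.07.08": 4 * GIB,
        "logstash-2017.07.09": 14 * GIB,
    }
    print(indices_to_drop(sizes, budget_bytes=18 * GIB))
```

A rule like this bounds disk usage but not log loss: if the spamming source keeps going, recent history shrinks, so it complements finding and fixing the noisy producer rather than replacing it.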