Reduction of stat1005's disk space usage
Icinga has been alerting on disk space usage for stat1005's /srv partition. This is the current status:

/dev/mapper/stat1005--vg-data  7.2T  6.4T  385G  95% /srv

elukey@stat1005:/srv$ sudo du -hs *
31G	analytics-wmde
70G	deployment
1.3G	discovery
1.6M	event-schemas
4.1T	home
1.3T	log
16K	lost+found
513M	mediawiki
328M	published-datasets
184M	reportupdater
1.1T	stat1002-a

Top consumers in the /srv/home dir:

31G	ebernhardson
33G	jsamra
34G	nithum
38G	ashwinpp
69G	nuria
73G	dsaez
115G	ellery
147G	mirrys
240G	mkroetzsch
641G	flemmerich
2.5T	ezachte
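
A listing like the one above can be produced with du and sort. This is only a rough sketch: the helper name top_consumers and its defaults are made up for illustration and are not the exact command used on stat1005.

```shell
# Hypothetical helper (not from this thread): show the N largest entries
# directly under a directory, smallest first.
top_consumers() {
    dir="${1:-/srv/home}"    # directory to inspect; the default is an assumption
    count="${2:-10}"         # number of entries to print
    # du -s prints one total per entry and -h makes sizes human-readable;
    # sort -h orders those human-readable sizes, tail keeps the largest ones.
    du -hs -- "$dir"/* 2>/dev/null | sort -h | tail -n "$count"
}

# Prefix du with sudo when other users' directories are not readable, e.g.:
#   sudo du -hs /srv/home/* | sort -h | tail -n 11
```

Sorting with sort -h keeps the output in the same ascending order as the list above, so the biggest consumer ends up on the last line.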

@Ottomata do you think that we can delete the stat1002-a dir?

Yeah I think so! Maybe we can just shove it in HDFS for posterity?


We are trying to back up everything to HDFS before deleting the /srv/stat1002-a dir, but it would be great to delete unneeded data first, if possible, to avoid wasting space:

elukey@stat1005:/srv/stat1002-a/user_dirs_from_stat1002$ sudo du -hs * | sort -h
4.0K	akrausetud
4.0K	arnad
4.0K	ebernhardson
4.0K	jdcc
4.0K	smalyshev
16K	milimetric
20G	nuria
40G	leila
73G	psinger
245G	ellery
655G	halfak

@Halfak, @leila: do you have any idea if we can safely delete these user directories on stat1005? They were the old user home directories on stat1002.

To free up some space, ellery's and psinger's stat1002 user dirs are now available in /mnt/hdfs/wmf/data/archive/backup/misc/stat1002-a/user_dirs_from_stat1002/
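
For anybody doing a similar move, here is a minimal sketch of "copy, verify, then delete by hand". It assumes the /mnt/hdfs mount can be written to like a regular filesystem, which may not hold everywhere; the helper name archive_and_verify is hypothetical and this is not necessarily how the copy above was done.

```shell
# Hypothetical helper: copy a directory to an archive location and confirm
# that the copy matches before anything is deleted. It never deletes anything.
archive_and_verify() {
    src="$1"     # e.g. a user dir under /srv/stat1002-a/user_dirs_from_stat1002
    dest="$2"    # e.g. the matching dir under the /mnt/hdfs archive path
    mkdir -p "$dest" || return 1
    # Copy the contents of src (dotfiles included) into dest.
    cp -r "$src"/. "$dest"/ || return 1
    # diff -r compares both trees file by file; -q only reports differences.
    if diff -rq "$src" "$dest" >/dev/null; then
        echo "verified: $src can be removed"
    else
        echo "mismatch: keep $src" >&2
        return 1
    fi
}
```

The local directory is removed manually only after the helper prints "verified", so a failed or partial copy never costs any data.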

Now we have ~600GB back, 91% partition usage, much better :) If these dirs are not needed anymore, it would be great to remove them from HDFS too.

I've dropped my usage to 44GB

Woops. Caught something else I don't need. Down to 12GB

elukey@stat1005:~$ df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/stat1005--vg-data  7.2T  5.6T  1.3T  82% /srv

Way better now, thanks!

@ezachte Hi again :) any news about dropping some backup data?

@elukey I reviewed my share. All of it is related to one project, and I need to keep that data because we still refer to it and need to take subsets out of it from time to time. Is this causing an issue for you, or do you just want to make sure that we don't lock up space we don't need?

@leila, if you just have some large data that you don't want deleted, you could put it in your HDFS home directory and keep it there :)

just a quick heads-up: I (finally) managed to regain access to stat1005, and will start cleanup tomorrow

Current status of home dirs:

115G    ellery
241G    mkroetzsch
648G    ezachte
754G    flemmerich
782G    dsaez
950G    mirrys

@leila, @Miriam, @diego: do you guys need all that space or can we do some clean ups? Don't want to force you to delete data that you need, only garbage that can be thrown away :)

@ezachte thanks a lot for the massive space reduction! \o/

@elukey: Done ;)

now 66G /home/dsaez/

@flemmerich Please check this thread.

elukey@stat1005:~$ df -h
Filesystem                     Size  Used Avail Use% Mounted on
/dev/mapper/stat1005--vg-data  7.2T  4.2T  2.6T  62% /srv

Much better now, thanks to all! Please keep in mind that regular cleanups are really appreciated by Analytics folks, even without being asked :)
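
For the regular cleanups, something along these lines can help spot the biggest files in a home directory. It is just a sketch: the helper name large_files and the 1024 MiB default threshold are arbitrary choices, not an Analytics policy.

```shell
# Hypothetical helper: list files above a size threshold, smallest first.
large_files() {
    dir="${1:-$HOME}"        # directory to scan; defaults to your own home
    min_mb="${2:-1024}"      # threshold in MiB; the default is arbitrary
    # find -size +NM matches files larger than N MiB (sizes are rounded up
    # to whole MiB); du -h prints each match with a human-readable size.
    find "$dir" -type f -size +"${min_mb}"M -exec du -h {} + 2>/dev/null | sort -h
}

# Example: files over 10 GiB in your own home directory
#   large_files "$HOME" 10240
```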

