
glamtools using increasing amounts of space on NFS
Closed, Resolved, Public

Description

It seems that glamtools is using increasing amounts of space in the viewdata directory on Toolforge.
For example:

2.3G	201906
2.4G	201907
2.6G	201908
3.2G	201909
3.3G	201910
3.4G	201911
3.5G	201912
3.7G	202001
3.8G	202002
4.2G	202003
4.3G	202004
4.4G	202005
6.2G	202006
6.3G	202007
6.8G	202008
4.7G	202009
7.6G	202010
7.8G	202011
8.5G	202012
9.0G	202101
9.5G	202102
9.9G	202103
7.1G	202104
12G	202105
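To see how fast each month grows relative to the one before, the listing above can be turned into month-over-month deltas. This is a minimal Python sketch, assuming the input is GNU `du -h` output (sizes in powers of 1024) with one `YYYYMM` directory per line. `parse_du` and `month_over_month` are hypothetical helper names, not part of glamtools:

```python
import re

# GNU `du -h` prints sizes in powers of 1024; convert each suffix to GiB.
_TO_GIB = {"K": 1 / 1024**2, "M": 1 / 1024, "G": 1.0, "T": 1024.0}

# Matches lines such as "2.3G\t201906" or "2.3G\t201906/" (size, then a YYYYMM name).
_LINE = re.compile(r"^\s*(\d+(?:\.\d+)?)([KMGT])\s+(\d{6})/?\s*$")


def parse_du(text):
    """Return [(month, size_gib), ...] sorted by month; non-matching lines are ignored."""
    rows = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            size, unit, month = m.groups()
            rows.append((month, float(size) * _TO_GIB[unit]))
    return sorted(rows)


def month_over_month(rows):
    """Return [(month, size_gib, delta_gib)], where delta is None for the first month."""
    out = []
    prev = None
    for month, size in rows:
        out.append((month, size, None if prev is None else size - prev))
        prev = size
    return out
```

The input would typically be the output of something like `du -sh */` run inside the viewdata directory; the exact path on Toolforge is not given here.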

We are trying to find places to clean up NFS because space is getting tight. Overall, the tool is using 182G of space in viewdata. Can you please clean up data that is no longer required? If there is any way to slow down the growth of newer data, that would be good as well. I'm not sure why each month's directory keeps getting bigger than the last.
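A first step in cleaning up is finding which individual files take the most space. This is a minimal Python sketch that walks a directory tree and returns the largest regular files (symlinks are not followed). The function name `largest_files` and the example path are hypothetical:

```python
import heapq
import os
import stat


def largest_files(root, n=10):
    """Return up to n (size_bytes, path) pairs for the largest regular files under root."""

    def sizes():
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.lstat(path)  # lstat: do not follow symlinks
                except OSError:
                    continue  # removed or unreadable since os.walk listed it
                if stat.S_ISREG(st.st_mode):
                    yield st.st_size, path

    # nlargest keeps only n items in memory, even for very large trees.
    return heapq.nlargest(n, sizes())
```

For example, `for size, path in largest_files("viewdata", 20): print(f"{size / 1024**3:.1f} GiB  {path}")` would print the 20 biggest files, assuming it is run from the tool's home directory.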

Event Timeline

The collected data (from the BaGLAMa2 tool) is quite valuable for GLAM, including the historical data, which is why I try to keep all of it.

If there is a better (and replicated/backed-up) storage location on Toolforge, I'll be happy to move to that. Please let me know where.
(Yes, I have tried MySQL. I'll go back to it if you insist, even though it does not work well with parallel data generation due to the DB connection limit.)
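One common way to keep parallel data generation under a per-user connection cap is to bound the worker pool, so that each worker holds at most one connection at a time. This is a minimal Python sketch of that pattern, not how BaGLAMa2 actually works. `MAX_DB_CONNECTIONS` and `run_bounded` are hypothetical, and the real limit for a tool's database user would need to be checked:

```python
from concurrent.futures import ThreadPoolExecutor

MAX_DB_CONNECTIONS = 10  # hypothetical cap; check the actual per-user quota


def run_bounded(jobs, work, limit=MAX_DB_CONNECTIONS):
    """Run work(job) for every job, with at most `limit` calls running at once.

    If each call opens and closes its own DB connection inside `work`,
    the number of simultaneously open connections never exceeds `limit`.
    Results are returned in the same order as `jobs`.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(work, jobs))
```

The trade-off is throughput: with a low cap, generation for many catalogs is serialized, which is why per-file SQLite output (one file per catalog and month) avoids the limit entirely.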

There is one catalog that sticks out in size: Armenia
https://glamtools.toolforge.org/baglama2/#gid=494&month=202105
This catalog takes up 3.1GB for May 2021. The next largest one is 0.6GB. I can remove the Armenia catalog (for all months) if system resources require it.
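To check how much a single catalog takes across all months, per-catalog totals can be summed over the month directories. This is a minimal Python sketch, assuming the `<YYYYMM>/<gid>.sqlite3` layout seen in the file listing in this task; `catalog_totals` is a hypothetical helper, and it only reads sizes (it deletes nothing):

```python
import os
import re
from collections import defaultdict

# Assumed layout: <root>/<YYYYMM>/<gid>.sqlite3, e.g. 202105/494.sqlite3
_MONTH = re.compile(r"^\d{6}$")
_CATALOG = re.compile(r"^(\d+)\.sqlite3$")


def catalog_totals(root):
    """Return {gid: total_bytes} summed over every YYYYMM directory under root."""
    totals = defaultdict(int)
    for month in os.listdir(root):
        month_dir = os.path.join(root, month)
        if not (_MONTH.match(month) and os.path.isdir(month_dir)):
            continue  # skip anything that is not a month directory
        for name in os.listdir(month_dir):
            m = _CATALOG.match(name)
            if m:
                path = os.path.join(month_dir, name)
                totals[int(m.group(1))] += os.path.getsize(path)
    return dict(totals)
```

Sorting the result, e.g. `sorted(catalog_totals("viewdata").items(), key=lambda kv: kv[1], reverse=True)[:5]`, would list the five largest catalogs by gid.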

It turns out that this single catalog takes ~21GB:

-rw-r--r-- 1 tools.glamtools tools.glamtools 1.7G Jul  7  2020 202006/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 1.7G Aug  5  2020 202007/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 1.9G Oct 29  2020 202008/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 1.9G Nov  9  2020 202010/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 1.9G Dec 10  2020 202011/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 2.5G Jan 12 23:13 202012/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 2.8G Feb 14 15:20 202101/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 2.8G Mar 15 08:26 202102/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 2.9G Apr 14 19:42 202103/494.sqlite3
-rw-r--r-- 1 tools.glamtools tools.glamtools 3.1G Jun 15 02:28 202105/494.sqlite3

The shared MySQL instance would definitely not be preferable to the shared NFS. We've made good headway with cleaning up other things. Thanks for looking.