This is in response to: T235425: webperf*002 running out of disk space (arc lamp, xhgui).
In T199853 we found that, with the current retention rules of 90 days (daily) and 14 days (hourly), we required a fairly stable amount of disk space: about 25G, and this hadn't changed over several months.
While overall backend traffic might increase over time, we had a buffer of 150G in addition to that 25G (reserved for XHGui profiles for T180761, but we haven't gotten to that yet). Beyond that buffer, there is also the plan to move this storage off the local disk and into Swift (T200108), and to increase retention much further (preferably to at least a year).
But, against all expectations, we are now in a situation where the same retention rules are taking up about 4X as much disk space (~105G instead of ~25G).
I assume this is due to the php7-excimer sampling interval being much lower than it was with hhvm-xenon, which means more samples are written per unit of time. I now realise this was mentioned by Tim beforehand at T205059, and I also observed this anecdotally during the HHVM-PHP7 migration at T187154#5471414.
Some ideas of what we could do (we probably need one or more of the following):
- Increase the php7-excimer sampling interval (that is, sample less often)?
- Shorten the Arc Lamp retention span? – This would go against our plan to increase the retention span. Its short length is already limiting its usefulness for investigating problems (T200108).
- Increase disk space on the webperf*002 Ganeti VMs? – Was previously denied, at T199853.
- Implement support in Arc Lamp for compressed trace files ("logs"). – Even with compression, we'd still store 2X as much as before, but we'd be well within the available disk space, so that would not be a problem. That is, until we migrate XHGui to this server (T180761).
- Expedite the migration to let Arc Lamp store (older) logs in Swift, and/or migrate them by other means transparently to Arc Lamp (e.g. for files unchanged for more than 1 day, upload to Swift and somehow overlay or rewrite the static file server).
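
To make the compression idea and the "unchanged for more than 1 day" rule above more concrete, here is a rough Python sketch. It is not how Arc Lamp works today; the function name `compress_stale_logs`, the `*.log` file pattern, and the flat use of gzip are all assumptions for illustration. It only compresses files in place and does not touch Swift.

```python
import gzip
import os
import shutil
import time
from pathlib import Path

ONE_DAY = 24 * 60 * 60  # seconds


def compress_stale_logs(log_dir, max_age=ONE_DAY, now=None):
    """Gzip every *.log file under log_dir that was not modified within max_age seconds.

    Hypothetical helper: the "*.log" pattern and directory layout are assumptions.
    Returns the list of newly written .gz paths.
    """
    now = time.time() if now is None else now
    compressed = []
    for path in sorted(Path(log_dir).rglob("*.log")):
        stat = path.stat()
        if now - stat.st_mtime <= max_age:
            continue  # recently modified, may still be appended to
        target = path.with_name(path.name + ".gz")
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Keep the original timestamps so age-based retention still works.
        os.utime(target, (stat.st_atime, stat.st_mtime))
        path.unlink()
        compressed.append(target)
    return compressed
```

Something like this could run from a daily cron job, but anything that reads the logs (flame graph generation, the static file server) would then need to understand `.gz` files, which is the actual work implied by the compression idea.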