We're at the point where disk space on the titan hosts is no longer sufficient for certain kinds of thanos-compact operations, i.e. the compactor runs out of space partway through a compaction:
Mar 02 04:50:45 titan2001 thanos-compact[641856]: level=error ts=2024-03-02T04:50:45.868960443Z caller=compact.go:487 msg="critical error detected; halting" err="compaction: group 0@10531109435386935375: compact blocks [/srv/thanos-compact/compact/0@10531109435386935375/01HQ15VXWX3H8ZN0CDZKHJ6QMA /srv/thanos-compact/compact/0@10531109435386935375/01HQ25TA9WT5N78X476Y9SS4KG /srv/thanos-compact/compact/0@10531109435386935375/01HQ78BNP20PGX5SGDJEGJ914A /srv/thanos-compact/compact/0@10531109435386935375/01HQCHHWRCDXYQ5VQFN9GN7X45 /srv/thanos-compact/compact/0@10531109435386935375/01HQF980AEFA46FRSWA286ZH7Z /srv/thanos-compact/compact/0@10531109435386935375/01HQMR4F0Q73R9T9N6V7TCS4QC /srv/thanos-compact/compact/0@10531109435386935375/01HQSZFWAN1A8Q01CERPR0ACK5]: 2 errors: populate block: add series: write series data: write /srv/thanos-compact/compact/0@10531109435386935375/01HQYNSETD2HYVH41WE8EPE1NB.tmp-for-creation/index: no space left on device; write /srv/thanos-compact/compact/0@10531109435386935375/01HQYNSETD2HYVH41WE8EPE1NB.tmp-for-creation/index: no space left on device"
We have requested additional SSDs for all titan hosts as part of next year's capex, though it looks like we need to move faster than that. I'll ask dcops in codfw if they have a couple of big SSDs we can temporarily install.
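
To get a rough idea of how much headroom a compaction group needs, something like the sketch below could be run on a titan host. It sums the on-disk size of the source block directories and compares that with the free space on the filesystem. Treating the sum of the source blocks as an upper bound for the output block is my assumption, not something thanos guarantees, and the helper names (dir_size_bytes, compaction_headroom) are made up for illustration.

```python
import os
import shutil


def dir_size_bytes(path):
    """Total size in bytes of the regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def compaction_headroom(block_dirs, data_dir):
    """Return (needed, free) in bytes.

    needed: sum of the source block sizes, used here as an ASSUMED rough
            upper bound for the size of the compacted output block.
    free:   bytes currently free on the filesystem that holds data_dir.
    """
    needed = sum(dir_size_bytes(d) for d in block_dirs)
    free = shutil.disk_usage(data_dir).free
    return needed, free
```

For example, calling compaction_headroom() with the block directories listed in the error above and /srv/thanos-compact as data_dir would show whether the output block could plausibly fit alongside the already downloaded source blocks.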