Disk space, especially in eqiad, is getting very tight, with basically no margin for operations like re-shaping the cluster (e.g. T121535). To avoid running out of disk space, we are currently holding back features like pre-generation of mobile HTML used by the Android app. We are also spending extra time juggling the tight space by manually cancelling space-consuming compactions and fine-tuning the load distribution across nodes.
T119659 is adding a third SSD to each of restbase1007-9, thus bringing them up to the capacity of restbase1001-6. This will help slightly in the short term, but only brings us up to par with codfw. We will need more disk space headroom in the longer term.
The upgrade to Cassandra 2.1.12 (T120803) and the multi-instance setup (T95253) have significantly increased the amount of data a single hardware node can support. This means that we can increase our storage capacity by adding SSDs to existing nodes, using the eight available 2.5" SSD bays per chassis.
To minimize upgrade overhead, we propose adding two 1TB SSDs to each of the nine eqiad nodes, bringing them to five SSDs each. In codfw, we can reach almost identical capacity by adding three extra SSDs to each of the six hardware nodes.
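As a rough sanity check on these numbers, the added raw capacity can be sketched as below. The node counts, SSDs added per node, and the eight-bay chassis limit come from this proposal. The 1TB size for the codfw drives is an assumption (only the eqiad drive size is stated), and the helper name is purely illustrative.

```python
# Back-of-the-envelope check of the proposed SSD additions.
# Node counts and per-node additions come from the proposal;
# the 1 TB size for the codfw drives is an ASSUMPTION.

BAYS_PER_CHASSIS = 8  # 2.5" SSD bays per chassis


def added_capacity_tb(nodes, ssds_added_per_node, ssd_size_tb):
    """Raw capacity added across a cluster, in TB (no RAID or filesystem overhead)."""
    return nodes * ssds_added_per_node * ssd_size_tb


eqiad_added = added_capacity_tb(nodes=9, ssds_added_per_node=2, ssd_size_tb=1)
codfw_added = added_capacity_tb(nodes=6, ssds_added_per_node=3, ssd_size_tb=1)  # size assumed

# eqiad nodes hold three SSDs each once T119659 is done, plus the two proposed here.
eqiad_ssds_per_node = 3 + 2
free_bays_eqiad = BAYS_PER_CHASSIS - eqiad_ssds_per_node

print(f"eqiad: +{eqiad_added} TB raw, {free_bays_eqiad} bays free per node")
print(f"codfw: +{codfw_added} TB raw")
```

Under these assumptions both data centers gain 18 TB of raw capacity, and each eqiad chassis keeps three bays free for later expansion. Usable capacity will be lower once filesystem overhead and Cassandra compaction headroom are taken into account.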
Combined with efficiency improvements planned in T120171 for next quarter, this capacity should be sufficient to support all projects planned for this fiscal year.