Disk space, especially in eqiad, is getting very tight, with basically no margin for operations like re-shaping the cluster. To avoid running out of disk space, we are currently holding back features like pre-generation of mobile HTML used by the Android app. We are also spending extra time juggling the tight space by manually cancelling space-consuming compactions and fine-tuning the load distribution across nodes.
T119659 is adding a third SSD to each of restbase1007-9, thus bringing them up to the capacity of restbase1001-6. This will help slightly in the short term, but only brings us up to par with codfw. We will need more disk space headroom in the longer term.
The upgrade to Cassandra 2.1.12 (T120803) has significantly increased the amount of storage a single node can support with reasonable memory and CPU resources. This means we can increase our storage capacity by adding SSDs to existing nodes. All of the nodes were purchased with eight SSD bays, leaving room for expansion.
To minimize upgrade overhead, we are proposing to add 2x 1TB SSDs to each of the nine eqiad nodes, bringing them to five SSDs each. In codfw, we can reach almost identical capacity by adding three extra SSDs per hardware node.
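As a rough check of the eqiad numbers, the sketch below works through the drive arithmetic. It assumes every eqiad node (restbase1001-9) has three SSDs once T119659 is done, and it computes only the raw capacity of the new 1TB drives, because the size of the existing drives and the Cassandra replication factor are not stated here. `plan_upgrade` is a hypothetical helper written for illustration, not part of any existing tooling.

```python
# Sketch of the eqiad SSD arithmetic, under stated assumptions:
# - every eqiad node has 3 SSDs after T119659 (restbase1007-9 get their third)
# - only the new drives' size (1TB) is known, so only added raw capacity is computed
# - replication and filesystem overhead are ignored

BAYS_PER_NODE = 8       # each node was purchased with eight SSD bays
NEW_SSD_SIZE_TB = 1     # proposed drives are 1TB each

# Per-node SSD count once T119659 is complete (restbase1001 .. restbase1009).
ssds_after_t119659 = {f"restbase{n}": 3 for n in range(1001, 1010)}


def plan_upgrade(current, extra_per_node, bays=BAYS_PER_NODE):
    """Hypothetical helper: per-node SSD count after adding extra_per_node drives.

    Raises ValueError if any node would need more bays than it has.
    """
    planned = {}
    for node, count in current.items():
        new_count = count + extra_per_node
        if new_count > bays:
            raise ValueError(f"{node} would need {new_count} bays, only {bays} exist")
        planned[node] = new_count
    return planned


planned = plan_upgrade(ssds_after_t119659, extra_per_node=2)
drives_to_buy = sum(planned[n] - ssds_after_t119659[n] for n in planned)
added_raw_tb = drives_to_buy * NEW_SSD_SIZE_TB
free_bays = {n: BAYS_PER_NODE - c for n, c in planned.items()}

print(f"nodes: {len(planned)}, SSDs per node: {sorted(set(planned.values()))}")
print(f"drives to buy: {drives_to_buy}, raw capacity added: {added_raw_tb} TB")
print(f"free bays left per node: {sorted(set(free_bays.values()))}")
```

Under these assumptions, the proposal means buying 18 drives (18 TB raw before replication) and leaves three free bays per eqiad node for later expansion. The codfw side is not modelled, since its node count and current drive layout are not given here.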
Combined with efficiency improvements planned in T120171 for next quarter, this capacity should be sufficient to support all projects planned for this fiscal year.