
Investigate alternative RAID strategies for labstore1001/2
Closed, Invalid · Public

Description

labstore1001/2 each have 24 disks (1.819 TB each) attached via an external storage shelf - about 43.7 TB raw per server. RAID 10 (the choice so far) halves that to ~21.8 TB, and with one server acting as the DRBD-replicated secondary, the pair effectively yields ~20 TB of usable storage.
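For background on why the pair yields only one server's worth of capacity: DRBD keeps a full block-level replica on the secondary. A minimal sketch of what a DRBD resource definition looks like - the resource name, backing devices, and addresses below are hypothetical placeholders, not the actual labstore config:

```
resource scratch {
    protocol C;                       # synchronous replication
    on labstore1001 {
        device    /dev/drbd0;
        disk      /dev/mapper/data;   # hypothetical backing device
        address   10.64.0.1:7788;     # hypothetical replication IP:port
        meta-disk internal;
    }
    on labstore1002 {
        device    /dev/drbd0;
        disk      /dev/mapper/data;   # hypothetical backing device
        address   10.64.0.2:7788;     # hypothetical replication IP:port
        meta-disk internal;
    }
}
```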

The current plan is to move scratch and maps to this cluster - these would take 3 TB and 6 TB respectively, leaving about 11 TB available.

We've discussed trying to squeeze some more storage out of this setup so that we can:

  1. Also move misc over to this cluster, leaving more space available in the secondary cluster for tools or tool-specific shares like paws.
  2. Offer a non-NFS experimental backup solution to users, for storing things like database dumps - NFS isn't a good fit for these, but there are no other existing alternatives.

Some possibilities for maximizing space (RAID levels are nicely explained in Dell's PERC technical guidebook: https://i.dell.com/sites/doccontent/shared-content/data-sheets/Documents/perc-technical-guidebook.pdf):

(All numbers calculated using https://www.servethehome.com/raid-calculator/ with 24 disks of 1.82 TB each - 43.7 TB raw storage overall. A quick sanity-check sketch of the math follows the list.)

  1. Alternative RAID levels
    • RAID 5 (Striping with distributed parity): 38 TB usable storage, tolerates 1 disk failure, slower write performance than RAID 10 due to parity calculation.
    • RAID 6 (Striping with dual distributed parity): 36.4 TB usable storage, tolerates 2 disk failures; write performance takes more of a hit because two parity blocks are generated for every write.
    • RAID 50 (Striping across RAID 5 sets): 36.4 TB usable storage, tolerates 1 disk failure per RAID 5 set; better write performance than plain RAID 5 because each parity calculation involves fewer disk reads.
    • RAID 60 (Striping across RAID 6 sets): 33.1 TB usable storage, tolerates 2 disk failures per RAID 6 set; better performance than plain RAID 6 for the same reason, but slower than RAID 50.
  2. Using internal drives - each of the servers also has internal storage: 12 disks of 1.819 TB each. With RAID 10 that's about 10 TB usable, and even leaving 50% for OS, logs, etc., we could maybe get an added ~5 TB per server from the internal storage on these boxes.
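For a rough cross-check of the figures above, a minimal Python sketch of the capacity math. The span layouts passed to the RAID 50/60 formulas are assumptions; the linked calculator (and the actual controller) may group disks or round differently, which is why a few of its figures come out lower than these theoretical maxima:

```python
# Rough usable-capacity math for the external shelf: 24 disks of 1.82 TB.
DISKS = 24
SIZE_TB = 1.82

def raid10(n, size):
    # Mirrored pairs: half the raw capacity is usable.
    return n * size / 2

def raid5(n, size):
    # One disk's worth of distributed parity per set.
    return (n - 1) * size

def raid6(n, size):
    # Two disks' worth of distributed parity per set.
    return (n - 2) * size

def raid50(n, size, sets):
    # Striped across `sets` RAID 5 groups: one parity disk per group.
    return (n - sets) * size

def raid60(n, size, sets):
    # Striped across `sets` RAID 6 groups: two parity disks per group.
    return (n - 2 * sets) * size

print(f"RAID 10: {raid10(DISKS, SIZE_TB):.1f} TB usable")
print(f"RAID 5:  {raid5(DISKS, SIZE_TB):.1f} TB usable")
print(f"RAID 6:  {raid6(DISKS, SIZE_TB):.1f} TB usable")
for sets in (2, 4):
    print(f"RAID 50, {sets} sets: {raid50(DISKS, SIZE_TB, sets):.1f} TB usable")
    print(f"RAID 60, {sets} sets: {raid60(DISKS, SIZE_TB, sets):.1f} TB usable")
```

With 4 sets of 6 disks, RAID 50 lands on the 36.4 TB figure quoted above; the RAID 60 figure likewise depends on how many spans the controller builds.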

Event Timeline

If performance allows, it would be great to get RAID 50, especially since this is a 2-node HA cluster. We could finally do the beginnings of real (but limited) user backups.

Update: I have reimaged labstore1001 and labstore1002 with RAID 50 for the external shelf storage.

These are going to be decommissioned just as soon as we get labstore1008/1009 online.