
Fix partition scheme on Cassandra test hosts to allow testing on SSDs
Closed, Resolved (Public)

Description

Following up to https://rt.wikimedia.org/Ticket/Display.html?id=8529:

I was just starting to test on these hosts, and noticed that the RAID-0 has a mix of the SSDs and spinning disks. The root FS also seems to be on the RAID-0, instead of a more conventional RAID-1 on spinning disks.

It would be great to change this so that only the SSDs are in the RAID-0, so that we can test with SSD storage.
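A minimal sketch of how the mix could be confirmed from the host itself, assuming a Linux box where sysfs exposes the array members under `/sys/block/<md>/slaves` and each disk's `queue/rotational` flag (1 for spinning, 0 for SSD). The array name `md0` in the usage note and the helper names `read_rotational` and `classify` are hypothetical, not taken from this task.

```python
import os


def read_rotational(md_name, sysfs_root="/sys"):
    """Map each member of an md array to True (spinning) or False (SSD).

    Assumes the usual sysfs layout; md_name (e.g. "md0") is a placeholder.
    """
    slaves_dir = os.path.join(sysfs_root, "block", md_name, "slaves")
    result = {}
    for name in sorted(os.listdir(slaves_dir)):
        dev = os.path.realpath(os.path.join(slaves_dir, name))
        flag = os.path.join(dev, "queue", "rotational")
        if not os.path.exists(flag):
            # Member is a partition: the flag lives on the parent disk.
            flag = os.path.join(os.path.dirname(dev), "queue", "rotational")
        with open(flag) as fh:
            result[name] = fh.read().strip() == "1"
    return result


def classify(rotational_by_member):
    """Summarise an array as 'ssd-only', 'spinning-only', 'mixed' or 'empty'."""
    kinds = set(rotational_by_member.values())
    if not kinds:
        return "empty"
    if kinds == {False}:
        return "ssd-only"
    if kinds == {True}:
        return "spinning-only"
    return "mixed"
```

On a host, `classify(read_rotational("md0"))` returning `"mixed"` would match the situation described above; the goal of this task is for the data array to report `"ssd-only"`.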

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke added a project: acl*sre-team.
GWicke changed Security from none to None.
GWicke removed a project: RESTBase-architecture.
GWicke added subscribers: GWicke, akosiaris.
GWicke triaged this task as High priority. Dec 3 2014, 6:56 PM
GWicke added a project: Scrum-of-Scrums.
GWicke moved this task from Scheduled to Blocked on the Scrum-of-Scrums board.
GWicke moved this task from Backlog to Blocked / others on the RESTBase board.

So, cerium has been reinstalled with the spinning disks holding /boot, /, swap, etc. The SSDs are under /dev/md2, formatted as ext4. I thought about mounting it under /var/lib/cassandra, but decided to hold off until @GWicke gives a thumbs up. After that I will do the other 2 boxes in rotation, as we discussed on IRC.

GWicke added a comment. Edited Dec 4 2014, 6:26 PM

@akosiaris: Thanks for converting cerium! The cassandra node joined the cluster without issues & RESTBase is running happily, so that's awesome. Will make it easy to add more nodes to the cluster later ;) Also, all this happened while dumping enwiki through restbase, without the cluster having any downtime or data loss.

Re mounting: Previously I converted /var/lib/cassandra into a symlink to /mnt/data/cassandra (the mount point being /mnt/data). An advantage of that scheme is that it lets us use the partition for other data too. It also prevents cassandra from starting if the partition is not mounted. Making /var/lib/cassandra root-owned, and then making cassandra own the root of the data partition mounted on top of it, should have the same effect.

I'm happy with whichever option is easier to generalize in puppet & fails loudly if the mount is missing (instead of filling up a small root fs).
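The "fails loudly" requirement could be sketched as a pre-start check like the one below. This is only an illustration under stated assumptions: the function name `require_mounted` is hypothetical, and the paths in the usage note are the ones mentioned in this comment. It relies on the standard `os.path.ismount` and `os.path.realpath` calls, and is not how the puppet setup actually enforces this.

```python
import os


def data_dir_is_safe(mount_point, data_dir):
    """True only if mount_point is a real mount and data_dir resolves inside it."""
    if not os.path.ismount(mount_point):
        return False
    real_mount = os.path.realpath(mount_point)
    # realpath follows the /var/lib/cassandra symlink, if there is one.
    real_data = os.path.realpath(data_dir)
    return os.path.commonpath([real_data, real_mount]) == real_mount


def require_mounted(mount_point, data_dir):
    """Raise instead of letting the service write to the small root fs."""
    if not data_dir_is_safe(mount_point, data_dir):
        raise RuntimeError(
            "%s is not on a mounted %s; refusing to start" % (data_dir, mount_point)
        )
```

With the symlink scheme, a wrapper would call `require_mounted("/mnt/data", "/var/lib/cassandra")` before starting the service. With the mount-on-top scheme, both arguments would be `/var/lib/cassandra`.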

@akosiaris: I went ahead and did the symlink thing manually on cerium. Could you do the other nodes next? There is no real need to do them one by one (apart from keeping the cluster running for testing), so if it's faster to do both remaining ones in parallel then please do so. Thanks!

akosiaris closed this task as Resolved. Dec 5 2014, 6:56 PM
akosiaris claimed this task.

@GWicke, done. All three boxes are now reinstalled with the exact same configuration as cerium. I have done the manual symlink and cassandra seems to be working fine (my criterion is that it has started, has not stopped, and has not logged anything ominous to /var/log/cassandra.log).

Marking my first phabricator ticket as resolved!!!

@akosiaris: I checked back an hour ago after continuing to test with cerium over the day, and discovered that the other two nodes were already back at work. Perfect, thanks a lot!

First results with all three nodes look very promising (see T76370).

bd808 moved this task from Blocked to Done on the Scrum-of-Scrums board. Dec 9 2014, 10:42 PM