
Separate /var on restbase
Closed, Resolved, Public

Description

In the current configuration, restbase nodes have a separate /var partition, and that same partition holds both Cassandra's data and all of the logs collected locally. This puts Cassandra at risk: if the partition fills up, Cassandra can malfunction. We need to separate the two so that a full partition cannot cause Cassandra to malfunction.
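
To see the shared-partition problem on a given host, one can compare the device numbers of Cassandra's data directory and the log directory. Below is a minimal sketch, assuming GNU coreutils `stat` and the paths /var/lib/cassandra and /var/log; the helper name `same_filesystem` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed (0) if both paths are on the same filesystem,
# fail (1) if they are not, and return 2 if either path cannot be stat'ed.
# It compares the device number reported by GNU stat (%d).
same_filesystem() {
    dev_a=$(stat -c %d -- "$1") || return 2
    dev_b=$(stat -c %d -- "$2") || return 2
    [ "$dev_a" = "$dev_b" ]
}

# Example: Cassandra data vs. local logs (paths assumed from this task).
same_filesystem /var/lib/cassandra /var/log
case $? in
    0) echo "Cassandra data and logs share a filesystem" ;;
    1) echo "Cassandra data and logs are on separate filesystems" ;;
    *) echo "could not stat one of the paths" >&2 ;;
esac
```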

Event Timeline

mobrovac raised the priority of this task to Needs Triage.
mobrovac updated the task description.
mobrovac added a subscriber: mobrovac.

Change 242098 had a related patch set uploaded (by Filippo Giunchedi):
install_server: cassandra to /srv for 2 ssd hosts

https://gerrit.wikimedia.org/r/242098

We're going to piggyback on the multi-instance work for this too. The plan is to start with the restbase-test2* machines and begin converting them to multi-instance (two instances per machine, since they only have 32 GB of RAM).

Change 242098 merged by Filippo Giunchedi:
install_server: cassandra to /srv for 2 ssd hosts

https://gerrit.wikimedia.org/r/242098

Dzahn triaged this task as Medium priority. Oct 19 2015, 11:43 PM
Dzahn added a subscriber: Dzahn.
fgiunchedi renamed this task from "Separate /var on restbase100x" to "Separate /var on restbase". Apr 28 2016, 9:28 AM
fgiunchedi updated the task description.
fgiunchedi set Security to None.

eqiad is done; in codfw, restbase200[356] still need to be converted to multi-instance, which will resolve this too.

Supposedly, just moving Cassandra's data directory to a different path and using the cassandra.replace_address option should be enough to move from /var/lib/cassandra to /srv/cassandra-a.

Proposed steps:

  1. systemctl mask cassandra
  2. puppet agent --disable
  3. nodetool drain
  4. reboot in single user mode (root password required)
  5. systemctl stop nfs-common
  6. umount /var
  7. mount /dev/mapper/<lv> /mnt
  8. rsync -vaz /mnt/ --exclude lib/cassandra /var
  9. mv /mnt/lib/cassandra /mnt/cassandra-a
  10. rm -r /mnt/{backups,cache,lib,local,lock,log,lost+found,mail,opt,run,spool,tmp,userarchive}
  11. rsync -vaz /srv/ /mnt/
  12. rm -rf /srv
  13. install -d -o root -g root /srv
  14. lvrename <HOSTNAME>-var <HOSTNAME>-srv
  15. change fstab to reflect /var vs /srv change
  16. reboot
  17. ls -d /srv/deployment/ /srv/cassandra-a
  18. add instance -a to puppet
  19. puppet agent --enable
  20. puppet agent --test
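
Steps 15 and 17 above are described only in prose. As an illustration, here is a sketch of both, assuming the /var line in fstab names its device either by /dev/mapper path (which changes after `lvrename`) or by UUID= or LABEL= (which `lvrename` does not touch). The helpers `fstab_var_to_srv` and `check_srv_layout` are hypothetical and not part of the actual procedure.

```shell
#!/bin/sh
# Hypothetical helper for step 15: print a copy of an fstab-format file in
# which the /var entry is mounted on /srv instead. If the entry's device field
# equals OLD_DEV it is replaced with NEW_DEV; UUID= or LABEL= devices are kept.
# Usage: fstab_var_to_srv FILE OLD_DEV NEW_DEV > new-fstab
fstab_var_to_srv() {
    awk -v old="$2" -v new="$3" '
        /^[ \t]*#/ { print; next }      # keep comment lines verbatim
        $2 == "/var" {
            if ($1 == old) $1 = new     # device path follows the lvrename
            $2 = "/srv"                 # mount point moves from /var to /srv
        }
        { print }                       # a rewritten line is re-joined with single spaces
    ' "$1"
}

# Hypothetical helper for step 17: check that the expected directories exist
# under ROOT/srv. With no argument it checks the live /srv.
check_srv_layout() {
    root=${1:-}
    status=0
    for dir in "$root/srv/deployment" "$root/srv/cassandra-a"; do
        if [ -d "$dir" ]; then
            echo "ok: $dir"
        else
            echo "missing: $dir"
            status=1
        fi
    done
    return "$status"
}
```

For example, `fstab_var_to_srv /etc/fstab OLD_DEV NEW_DEV > /tmp/fstab.new` followed by `diff -u /etc/fstab /tmp/fstab.new` lets you review the change before installing it; after the reboot, `mountpoint /srv` (util-linux) confirms that /srv is its own mount.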

> Supposedly, just moving Cassandra's data directory to a different path and using the cassandra.replace_address option should be enough to move from /var/lib/cassandra to /srv/cassandra-a.

I'm not sure about this part. Did you read something that suggests using cassandra.replace_address is necessary?

> Proposed steps:
>
>   1. systemctl mask cassandra
>   2. puppet agent --disable
>   3. nodetool drain
>   4. reboot in single user mode
>   5. mount /dev/mapper/<lv> /mnt

I guess /dev/mapper/<lv> isn't mounted at /var in single-user mode (i.e. no unmount is required)?

>   6. rsync -vaz /mnt/ --exclude /mnt/lib/cassandra /var
>   7. mv /mnt/lib/cassandra /mnt/cassandra-a
>   8. rm -r /mnt/{backups,cache,lib,local,lock,log,lost+found,mail,opt,run,spool,tmp,userarchive}
>   9. rsync -vaz /srv/ /mnt/
>   10. rm -rf /srv
>   11. install -d -o root -g root /srv
>   12. lvrename <HOSTNAME>-var <HOSTNAME>-srv
>   13. change fstab to reflect /var vs /srv change
>   14. reboot
>   15. ls -d /srv/deployment/ /srv/cassandra-a
>   16. add instance -a to puppet
>   17. puppet agent --enable
>   18. systemctl mask cassandra-a
>   19. launch cassandra with replace-address
>   20. puppet agent --test

Eevans moved this task from In-Progress to Next on the Cassandra board.

> I'm not sure about this part. Did you read something that suggests using cassandra.replace_address is necessary?

I assumed it would be necessary, since we're moving from the machine's main IP address to its -a instance address without going through a decommission.

> I guess /dev/mapper/<lv> isn't mounted at /var in single-user mode (i.e. no unmount is required)?

I'm not sure, but good point; I'll amend the list.

> I assumed it would be necessary, since we're moving from the machine's main IP address to its -a instance address without going through a decommission.

AFAIK, cassandra.replace_address is only for the case where you are bootstrapping a new node into the ring using the IP address of a previous, dead node. In this case, it should be enough to start the node up with a different IP from the one it previously had (and that did work for me in testing).
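
For reference, in the dead-node case described above the option is normally passed to the JVM at startup, for example from cassandra-env.sh. The fragment below is a sketch: 192.0.2.10 is a placeholder for the dead node's address, and the line should be removed again once the replacement node has finished bootstrapping. Per the testing mentioned here, it is not needed when an existing node simply comes back up with its data under a new IP.

```shell
# cassandra-env.sh (sketch): only when bootstrapping a new node that replaces
# a dead one. 192.0.2.10 is a placeholder for the dead node's address.
JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=192.0.2.10"
```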

Sounds good, thanks for the clarification. I've updated the list of steps at https://phabricator.wikimedia.org/T113714#2296627; if that looks good, I think we can try it today.

LGTM. SGTM.

Mentioned in SAL [2016-05-23T14:44:40Z] <godog> reboot restbase2005 in single user mode for T113714

@fgiunchedi completed the conversion of restbase2005 to restbase2005-a (in what looks like ~15 minutes); everything looks perfect.

Good work @fgiunchedi!

Mentioned in SAL [2016-05-24T08:45:57Z] <godog> reboot restbase2006 for multi-instance conversion T113714

Mentioned in SAL [2016-05-24T09:18:19Z] <godog> reboot restbase2003 for multi-instance conversion T113714

This is complete: all restbase hosts now have a standalone /srv.

\o/ Thank you @fgiunchedi !

\,,/(^_^)\,,/