
RAID-0 volume not mounted on restbase-dev1001.eqiad.wmnet
Closed, Resolved · Public

Description

restbase-dev1001.eqiad.wmnet was recently reimaged, but the RAID-0 storage volume is not mounted as /srv.

$ sudo mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Fri Mar 17 16:19:52 2017
     Raid Level : raid0
     Array Size : 3001561088 (2862.51 GiB 3073.60 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Fri Mar 17 16:19:52 2017
          State : clean 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : restbase-dev1001:2  (local to host restbase-dev1001)
           UUID : 6cb3c9b4:3e414383:fa0ad4c8:f31c11c1
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
$ sudo lvdisplay 
  --- Logical volume ---
  LV Path                /dev/restbase-dev1001-vg/srv
  LV Name                srv
  VG Name                restbase-dev1001-vg
  LV UUID                c1VGGl-ecYY-FxKJ-CKuH-V6xI-1wJU-fdnhW4
  LV Write Access        read/write
  LV Creation host, time restbase-dev1001, 2017-01-04 22:52:55 +0000
  LV Status              available
  # open                 0
  LV Size                2.80 TiB
  Current LE             732802
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:0
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             10M     0   10M   0% /dev
tmpfs           9.5G  145M  9.3G   2% /run
/dev/md0         28G   21G  5.9G  78% /
tmpfs            24G     0   24G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            24G     0   24G   0% /sys/fs/cgroup
$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/md0 during installation
UUID=92bdaabe-2038-43a9-bcc3-2d22e6d5b9cf /               ext4    errors=remount-ro 0       1
# swap was on /dev/md1 during installation
UUID=aaa7fbae-da3b-43da-8c99-622e62463b7c none            swap    sw              0       0

If it does not prove too difficult to do so, I'd like the opportunity to move the contents of /srv/cassandra-{a,b} to the volume (and/or decommission the two instances).

Event Timeline

Eevans triaged this task as Medium priority.
Eevans edited projects, added Services (done); removed Services (doing).
Eevans added a subscriber: elukey.

This should now be done; I did the following (see the command sketch after this list):

  • Brought down Cassandra and masked the systemd units
  • Reformatted /dev/restbase-dev1001-vg/srv as ext4 and mounted it at /mnt
  • Rsync'd /srv/deployment and /srv/cassandra-{a,b} to /mnt
  • Removed /srv/deployment and /srv/cassandra-{a,b}
  • Mounted /dev/restbase-dev1001-vg/srv as /srv
  • Added an entry to /etc/fstab
  • Unmasked the Cassandra units and restarted them
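
A rough sketch of the equivalent commands follows; the systemd unit names cassandra-a and cassandra-b are assumptions (the task only names the /srv/cassandra-{a,b} directories), and the fstab line is illustrative, as a UUID= entry from blkid would be more robust:

# Stop and mask the Cassandra instances (unit names are assumptions)
$ sudo systemctl stop cassandra-a cassandra-b
$ sudo systemctl mask cassandra-a cassandra-b

# Reformat the logical volume and mount it temporarily at /mnt
$ sudo mkfs.ext4 /dev/restbase-dev1001-vg/srv
$ sudo mount /dev/restbase-dev1001-vg/srv /mnt

# Copy the data across, then remove the originals
$ sudo rsync -a /srv/deployment /srv/cassandra-a /srv/cassandra-b /mnt/
$ sudo rm -rf /srv/deployment /srv/cassandra-a /srv/cassandra-b

# Remount the volume at its final location
$ sudo umount /mnt
$ sudo mount /dev/restbase-dev1001-vg/srv /srv

# Persist the mount (illustrative; prefer UUID= from blkid, per the fstab header)
$ echo '/dev/restbase-dev1001-vg/srv /srv ext4 defaults 0 2' | sudo tee -a /etc/fstab

# Unmask and restart Cassandra
$ sudo systemctl unmask cassandra-a cassandra-b
$ sudo systemctl start cassandra-a cassandra-b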

@Eevans thanks a lot for the details, I had no idea these manual steps needed to be done (I thought partman would have created everything).

Maybe it's worth checking partman's recipe and/or updating the documentation?

> @Eevans thanks a lot for the details, I had no idea these manual steps needed to be done (I thought partman would have created everything).

@elukey The imaging process is pretty opaque to me, but I assumed this would all be done automatically as well. I only proceeded with this manual process after determining that fstab wasn't under Puppet management.

> Maybe it's worth checking partman's recipe and/or updating the documentation?

That sounds like a good idea, reopening...

[ ... ]

>> Maybe it's worth checking partman's recipe and/or updating the documentation?

> That sounds like a good idea, reopening...

And maybe the easiest would be to see if @fgiunchedi can shed any light on this (I think he is the one that set this all up originally).

[ ... ]

>>> Maybe it's worth checking partman's recipe and/or updating the documentation?

>> That sounds like a good idea, reopening...

> And maybe the easiest would be to see if @fgiunchedi can shed any light on this (I think he is the one that set this all up originally).

Yes, I can confirm the RAID setup (both for / and /srv) is handled by partman at reimage time. I'm not sure why it didn't work in this case, though, as other restbase-dev machines were installed without manual intervention.

> Yes, I can confirm the RAID setup (both for / and /srv) is handled by partman at reimage time. I'm not sure why it didn't work in this case, though, as other restbase-dev machines were installed without manual intervention.

We can keep an eye on it next time; thanks @fgiunchedi!
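
For the next reimage, a quick post-install check along these lines (using the device and volume names from the output above) would confirm whether partman set up the array, the logical volume, and the /srv mount:

# Confirm the RAID-0 array and the LVM volume exist
$ sudo mdadm --detail /dev/md2
$ sudo lvs restbase-dev1001-vg

# Confirm /srv is mounted from the volume and present in fstab
$ findmnt /srv
$ grep srv /etc/fstab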