
RAID-0 volume not mounted on restbase-dev1001.eqiad.wmnet
Closed, Resolved · Public

Description

restbase-dev1001.eqiad.wmnet was recently reimaged, but the RAID-0 storage volume is not mounted as /srv.

$ sudo mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Fri Mar 17 16:19:52 2017
     Raid Level : raid0
     Array Size : 3001561088 (2862.51 GiB 3073.60 GB)
   Raid Devices : 4
  Total Devices : 4
    Persistence : Superblock is persistent

    Update Time : Fri Mar 17 16:19:52 2017
          State : clean 
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

     Chunk Size : 512K

           Name : restbase-dev1001:2  (local to host restbase-dev1001)
           UUID : 6cb3c9b4:3e414383:fa0ad4c8:f31c11c1
         Events : 0

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
$ sudo lvdisplay 
  --- Logical volume ---
  LV Path                /dev/restbase-dev1001-vg/srv
  LV Name                srv
  VG Name                restbase-dev1001-vg
  LV UUID                c1VGGl-ecYY-FxKJ-CKuH-V6xI-1wJU-fdnhW4
  LV Write Access        read/write
  LV Creation host, time restbase-dev1001, 2017-01-04 22:52:55 +0000
  LV Status              available
  # open                 0
  LV Size                2.80 TiB
  Current LE             732802
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:0
$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev             10M     0   10M   0% /dev
tmpfs           9.5G  145M  9.3G   2% /run
/dev/md0         28G   21G  5.9G  78% /
tmpfs            24G     0   24G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs            24G     0   24G   0% /sys/fs/cgroup
$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
# / was on /dev/md0 during installation
UUID=92bdaabe-2038-43a9-bcc3-2d22e6d5b9cf /               ext4    errors=remount-ro 0       1
# swap was on /dev/md1 during installation
UUID=aaa7fbae-da3b-43da-8c99-622e62463b7c none            swap    sw              0       0

If it does not prove too difficult to do so, I'd like the opportunity to move the contents of /srv/cassandra-{a,b} to the volume (and/or decommission the two instances).

Event Timeline

Eevans triaged this task as Medium priority.
Eevans edited projects, added Services (done); removed Services (doing).
Eevans added a subscriber: elukey.

This should now be done; I did the following (see the command sketch after this list):

  • Brought down Cassandra and masked the systemd units
  • Reformatted /dev/restbase-dev1001-vg/srv as ext4 and mounted it at /mnt
  • Rsync'd /srv/deployment and /srv/cassandra-{a,b} to /mnt
  • Removed /srv/deployment and /srv/cassandra-{a,b}
  • Mounted /dev/restbase-dev1001-vg/srv as /srv
  • Added an entry to /etc/fstab
  • Unmasked the Cassandra units and restarted them
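
A rough sketch of the equivalent commands follows; the systemd unit names cassandra-a and cassandra-b are assumptions (the task only names the /srv/cassandra-{a,b} directories), and the fstab line is illustrative, as a UUID= entry from blkid would be more robust:

# Stop and mask the Cassandra instances (unit names are assumptions)
$ sudo systemctl stop cassandra-a cassandra-b
$ sudo systemctl mask cassandra-a cassandra-b

# Reformat the logical volume and mount it temporarily at /mnt
$ sudo mkfs.ext4 /dev/restbase-dev1001-vg/srv
$ sudo mount /dev/restbase-dev1001-vg/srv /mnt

# Copy the data across, then remove the originals
$ sudo rsync -a /srv/deployment /srv/cassandra-a /srv/cassandra-b /mnt/
$ sudo rm -rf /srv/deployment /srv/cassandra-a /srv/cassandra-b

# Remount the volume at its final location
$ sudo umount /mnt
$ sudo mount /dev/restbase-dev1001-vg/srv /srv

# Persist the mount (illustrative; prefer UUID= from blkid, per the fstab header)
$ echo '/dev/restbase-dev1001-vg/srv /srv ext4 defaults 0 2' | sudo tee -a /etc/fstab

# Unmask and restart Cassandra
$ sudo systemctl unmask cassandra-a cassandra-b
$ sudo systemctl start cassandra-a cassandra-b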

@Eevans thanks a lot for the details, I had no idea these manual steps needed to be done (I thought partman would have created everything).

Maybe it's worth checking partman's recipe and/or updating the documentation?

> @Eevans thanks a lot for the details, I had no idea these manual steps needed to be done (I thought partman would have created everything).

@elukey The imaging process is pretty opaque to me, but I assumed this would all be done automatically as well. I only proceeded with this manual process after determining that fstab wasn't under Puppet management.

> Maybe it's worth checking partman's recipe and/or updating the documentation?

That sounds like a good idea, reopening...

[ ... ]

>> Maybe it's worth checking partman's recipe and/or updating the documentation?

> That sounds like a good idea, reopening...

And maybe the easiest would be to see if @fgiunchedi can shed any light on this (I think he is the one that set this all up originally).

[ ... ]

>>> Maybe it's worth checking partman's recipe and/or updating the documentation?

>> That sounds like a good idea, reopening...

> And maybe the easiest would be to see if @fgiunchedi can shed any light on this (I think he is the one that set this all up originally).

Yes, I can confirm the RAID setup (both for / and /srv) is handled by partman at reimage time. I'm not sure why it didn't work in this case, though, as other restbase-dev machines were installed without manual intervention.

> Yes, I can confirm the RAID setup (both for / and /srv) is handled by partman at reimage time. I'm not sure why it didn't work in this case, though, as other restbase-dev machines were installed without manual intervention.

We can keep an eye on it next time; thanks @fgiunchedi!
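
For the next reimage, a quick post-install check along these lines (using the device and volume names from the output above) would confirm whether partman set up the array, the logical volume, and the /srv mount:

# Confirm the RAID-0 array and the LVM volume exist
$ sudo mdadm --detail /dev/md2
$ sudo lvs restbase-dev1001-vg

# Confirm /srv is mounted from the volume and present in fstab
$ findmnt /srv
$ grep srv /etc/fstab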