
maps-warper /mnt vdb partition errored, turned read-only and went missing after reboot
Closed, Resolved · Public

Description

This is on the maps-warper instance on Labs

This is related to T102414: I was attempting to move the PostgreSQL data files from the full root partition to /mnt to free up some space, as the root partition keeps filling up.

(Note that I added the labs::lvm::srv role, but all the extra space I saw was in /mnt rather than in /srv as expected.)

So.

Prior to creating new directories and moving files on /mnt, the filesystem was already reporting errors in syslog:

Sep 14 12:47:17 maps-warper kernel: [2669113.413301] EXT3-fs error (device vdb): ext3_valid_block_bitmap: Invalid block bitmap - block_group = 273, block = 8945664
Sep 14 12:47:18 maps-warper kernel: [2669114.618342] EXT3-fs error (device vdb): ext3_valid_block_bitmap: Invalid block bitmap - block_group = 274, block = 8978432
Sep 14 13:08:41 maps-warper kernel: [2670397.830058] journal_bmap: journal block not found at offset 1036 on vdb
Sep 14 13:08:41 maps-warper kernel: [2670397.830515] Aborting journal on device vdb.
Sep 14 13:08:53 maps-warper kernel: [2670409.957811] EXT3-fs (vdb): error: ext3_journal_start_sb: Detected aborted journal
Sep 14 13:08:53 maps-warper kernel: [2670409.961066] EXT3-fs (vdb): error: remounting filesystem read-only
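
For reference, the usual recovery path for an aborted ext3 journal like this (just a sketch, assuming the device is /dev/vdb and nothing on it is in use) would have been to unmount the filesystem and run a forced check:

sudo umount /mnt
sudo fsck.ext3 -f /dev/vdb    # equivalently e2fsck -f; repairs the invalid block bitmaps and the journal
sudo mount /dev/vdb /mnt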

After I moved the files, the filesystem was remounted read-only. However, it still showed as (rw) in the output of "mount". Sorry, I don't have a log of this, as my scrollback was wiped when I rebooted.
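
The (rw) from "mount" can be misleading here: mount reports what /etc/mtab says, and on Ubuntu of this vintage that file is not updated when the kernel forcibly remounts a filesystem read-only. A check that reflects the kernel's actual view (assuming the device is /dev/vdb):

grep vdb /proc/mounts    # the kernel's own mount table; shows "ro" after such a remount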

I was able to copy the PostgreSQL data directory back to its original location, and everything worked OK. Nothing else was on /mnt apart from this and the empty "lost+found" and "keys" directories.

Today I rebooted the instance, but the partition and filesystem are missing from the output of "df -h" and "mount":

chippy@maps-warper:~$ df -h
Filesystem                                      Size  Used Avail Use% Mounted on
/dev/vda1                                       3.8G  3.5G   97M  98% /
udev                                            2.0G  8.0K  2.0G   1% /dev
tmpfs                                           396M  268K  396M   1% /run
none                                            5.0M     0  5.0M   0% /run/lock
none                                            2.0G     0  2.0G   0% /run/shm
labstore.svc.eqiad.wmnet:/project/maps/project  6.0T  2.4T  3.6T  41% /data/project
labstore.svc.eqiad.wmnet:/scratch               984G  437G  497G  47% /data/scratch
labstore1003.eqiad.wmnet:/dumps                  44T   13T   31T  30% /public/dumps
labstore.svc.eqiad.wmnet:/project/maps/home     6.0T  2.4T  3.6T  41% /home

I can't be sure, but it seems as if the space in use on /dev/vda1 increased by around 20 MB compared to before the reboot.
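
To track down what actually grew on the root filesystem, a standard approach (a sketch only; I haven't kept the output) is:

sudo du -xh --max-depth=1 / | sort -h    # -x stays on /dev/vda1; the largest directories sort last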

chippy@maps-warper:~$ mount
/dev/vda1 on / type ext4 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
labstore.svc.eqiad.wmnet:/project/maps/project on /data/project type nfs (rw,noatime,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,nofsc,addr=10.64.37.10,clientaddr=10.68.17.33)
labstore.svc.eqiad.wmnet:/scratch on /data/scratch type nfs (rw,noatime,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,nofsc,addr=10.64.37.10,clientaddr=10.68.17.33)
labstore1003.eqiad.wmnet:/dumps on /public/dumps type nfs (ro,noatime,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,nofsc,addr=10.64.4.10,clientaddr=10.68.17.33)
labstore.svc.eqiad.wmnet:/project/maps/home on /home type nfs (rw,noatime,vers=4,bg,hard,intr,sec=sys,proto=tcp,port=0,nofsc,addr=10.64.37.10,clientaddr=10.68.17.33)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)

The instance appears to be working normally and serving the application ok.
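
To check whether the block device itself is still attached, as opposed to merely unmounted, something like this should tell (output hypothetical):

cat /proc/partitions      # lists vda and, if the kernel still sees it, vdb
sudo fdisk -l /dev/vdb    # errors out with "cannot open /dev/vdb" if the device is gone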

Your help is appreciated in advance. Ideally, this instance could also do with more disk space somehow.

Event Timeline

Chippyy raised the priority of this task from to Needs Triage.
Chippyy updated the task description.
Chippyy added projects: Cloud-Services, Maps.
Chippyy subscribed.
Restricted Application added a subscriber: Aklapper.
Chippyy set Security to None.
Chippyy updated the task description.

It is crucial to have enough capacity for warping maps before the tool can be used more widely. We expect to be able to do mass uploads to the Warper, but we will need to solve this first.

I removed old kernels, uninstalled emacs, and installed localepurge to remove unused locales. That freed up a whopping 1.5 GB, so the urgency for free space has diminished. However, the filesystem still does not mount.
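
For the record, roughly the commands involved (reconstructed from memory, not a verbatim log; exact kernel package names vary):

sudo apt-get purge emacs
sudo apt-get purge linux-image-<old-version>   # repeat for each old kernel
sudo apt-get autoremove --purge                # clears out orphaned dependencies
sudo apt-get install localepurge               # prompts for which locales to keep, then purges the rest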

Are there Puppet errors? What does the output of sudo puppet agent -tv say?

Yes, there do appear to be errors.

chippy@maps-warper:~$ sudo puppet agent -tv
NOTE: Gem.latest_load_paths is deprecated with no replacement. It will be removed on or after 2011-10-01.
Gem.latest_load_paths called from /usr/lib/ruby/vendor_ruby/puppet/util/rubygems.rb:54
.
Info: Retrieving plugin
Info: Loading facts in /var/lib/puppet/lib/facter/physicalcorecount.rb
Info: Loading facts in /var/lib/puppet/lib/facter/root_home.rb
Info: Loading facts in /var/lib/puppet/lib/facter/lldp.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_vardir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/labsproject.rb
Info: Loading facts in /var/lib/puppet/lib/facter/ganeti.rb
Info: Loading facts in /var/lib/puppet/lib/facter/initsystem.rb
Info: Loading facts in /var/lib/puppet/lib/facter/puppet_config_dir.rb
Info: Loading facts in /var/lib/puppet/lib/facter/pe_version.rb
Info: Loading facts in /var/lib/puppet/lib/facter/apt.rb
Info: Caching catalog for maps-warper.maps.eqiad.wmflabs
Info: Applying configuration version '1442940753'
Notice: /Stage[main]/Labs_lvm/Exec[create-volume-group]/returns: /usr/local/sbin/make-instance-vg: line 5: /sbin/parted: No such file or directory
Notice: /Stage[main]/Labs_lvm/Exec[create-volume-group]/returns: /usr/local/sbin/make-instance-vg: line 4: /sbin/parted: No such file or directory
Notice: /Stage[main]/Labs_lvm/Exec[create-volume-group]/returns: /usr/local/sbin/make-instance-vg: failed to create new partition
Error: /usr/local/sbin/make-instance-vg '/dev/vda' returned 1 instead of one of [0]
Error: /Stage[main]/Labs_lvm/Exec[create-volume-group]/returns: change from notrun to 0 failed: /usr/local/sbin/make-instance-vg '/dev/vda' returned 1 instead of one of [0]
Notice: /Stage[main]/Role::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Exec[create-vd-second-local-disk]: Dependency Exec[create-volume-group] has failures: true
Warning: /Stage[main]/Role::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Exec[create-vd-second-local-disk]: Skipping because of failed dependencies
Notice: /Stage[main]/Role::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Mount[/srv]: Dependency Exec[create-volume-group] has failures: true
Warning: /Stage[main]/Role::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Mount[/srv]: Skipping because of failed dependencies
Notice: /Stage[main]/Role::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Labs_lvm::Extend[/srv]/Exec[extend-vd-/srv]: Dependency Exec[create-volume-group] has failures: true
Warning: /Stage[main]/Role::Labs::Lvm::Srv/Labs_lvm::Volume[second-local-disk]/Labs_lvm::Extend[/srv]/Exec[extend-vd-/srv]: Skipping because of failed dependencies
Notice: Finished catalog run in 13.69 seconds

It looks like the package parted is missing. This package was added to the Precise image (as an implicit dependency of ubuntu-standard) before the instance maps-warper was launched (cf. cec6e460743b66d070e2ce015e7c45f64d4d34d9), so probably the package was (manually) uninstalled. I'll submit a patch that states the dependency of labs_lvm on parted explicitly; in the meantime, you could install the package manually with sudo apt-get install parted, and then Puppet should create the second partition.
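
In other words, a minimal sketch of the fix:

sudo apt-get install parted
sudo puppet agent -tv    # re-run Puppet; make-instance-vg should now succeed and create /srv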

Change 240271 had a related patch set uploaded (by Tim Landscheidt):
labs_lvm: Require parted explicitly

https://gerrit.wikimedia.org/r/240271

Change 240271 merged by Dzahn:
labs_lvm: Require parted explicitly

https://gerrit.wikimedia.org/r/240271

Very Many Thanks!

Parted got installed, and the /srv partition now shows up as /dev/mapper/vd-second--local--disk.

Should we worry about the errored and missing /mnt partition (/dev/vdb)?

(Incidentally, I don't remember uninstalling parted. I do note that the Image ID on https://wikitech.wikimedia.org/wiki/Special:NovaInstance says "missing", though.)
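
One more thing that may be worth checking before writing /mnt off (assuming it was mounted via fstab): whether a stale entry for /dev/vdb is still present, since a stale fstab entry for a missing device can stall a future boot:

grep vdb /etc/fstab    # remove or comment out the line if the device really is gone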

Chippyy claimed this task.

Okay, I'm assuming the /mnt filesystem is gone, and that it's not actually needed or important.

The instance has more space now to make up for it. Marking as resolved.