
Problems with NFS shares when launching a mediawiki-vagrant instance in Cloud VPS
Closed, Resolved, Public

Description

I've been trying to launch a vanilla role::labs::mediawiki_vagrant for T257322: Provision test instance for "Templates" projects, and am stuck at the "Mounting NFS shared folders" step. Behavior is the same with both stretch and buster images.

The error generally looks like this:

==> default: Mounting NFS shared folders...
The following SSH command responded with a non-zero exit status.
Vagrant assumes that this means the command failed!

mount -o vers=3,udp,noatime,rsize=32767,wsize=32767,async 10.0.3.1:/srv/mediawiki-vagrant /vagrant
result=$?
if test $result -eq 0; then
if test -x /sbin/initctl && command -v /sbin/init && /sbin/init 2>/dev/null --version | grep upstart; then
/sbin/initctl emit --no-wait vagrant-mounted MOUNTPOINT=/vagrant
fi
else
exit $result
fi

Stdout from the command:

Stderr from the command:

mesg: ttyname failed: Inappropriate ioctl for device
mount.nfs: mount to NFS server '10.0.3.1:/srv/mediawiki-vagrant' failed: RPC Error: Unable to receive

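One point worth investigating is that the client is trying to use NFSv3 (-o vers=3), but I believe we're only set up to support NFSv4. NFSv3 relies on rpcbind on the server to locate the mount and NFS services, whereas NFSv4 only needs the single NFS port (2049). In T241710, for example, we disabled rpcbind on the assumption that we only need to serve NFSv4.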

Experimenting with commands inside the VM, I discovered:

$ mount.nfs4 -o async 10.0.3.1:/srv/mediawiki-vagrant /vagrant
  -> Succeeds!

However,

$ mount.nfs -o vers=3,async 10.0.3.1:/srv/mediawiki-vagrant /vagrant
mount.nfs: mount to NFS server '10.0.3.1:/srv/mediawiki-vagrant' failed: RPC Error: Unable to receive

$ mount.nfs4 -o udp,noatime,rsize=32767,wsize=32767,async 10.0.3.1:/srv/mediawiki-vagrant /vagrant
mount.nfs4: mount to NFS server '10.0.3.1:/srv/mediawiki-vagrant' failed: RPC Error: Unable to receive

It looks like we need to configure the client to use TCP and NFSv4. Allegedly this can be done via settings[:nfs_shares].
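The experiments above single out two options in Vagrant's mount command: the NFSv3 pin and the UDP transport. As a small illustration, the hypothetical helper `incompatible_nfs_options` (not part of MediaWiki-Vagrant) flags those options in a `-o` option string, assuming the server only speaks NFSv4 over TCP.

```ruby
# Hypothetical helper: list the mount options that the experiments above showed
# to break mounting against an NFSv4-only, TCP-only server.
def incompatible_nfs_options(option_string)
  option_string.split(',').select do |opt|
    key, value = opt.split('=', 2)
    # A pre-v4 protocol version needs rpcbind/mountd, which the host no longer runs.
    version_pin = %w[vers nfsvers].include?(key) && value.to_i < 4
    # NFSv4 expects a congestion-controlled transport such as TCP, not UDP.
    version_pin || key == 'udp'
  end
end

failing = 'vers=3,udp,noatime,rsize=32767,wsize=32767,async'
p incompatible_nfs_options(failing) # => ["vers=3", "udp"]
p incompatible_nfs_options('async') # => []
```

Both flagged options have to change: the test with mount.nfs4 still failed while udp was present, and the test with vers=3 failed even without udp.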

Event Timeline

It's unclear why the client is trying to use vers=3 in the first place. The corresponding setting is already false:

$ vagrant config nfs_force_v3 no
$ vagrant config --get nfs_force_v3
false

This setting controls the following line:

./Vagrantfile:    root_share_options[:mount_options] << 'vers=3' if settings[:nfs_force_v3]
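One plausible explanation for the stray vers=3 and udp is that they come from Vagrant itself rather than from this line: Vagrant's NFS synced folders are documented to default to `nfs_version` 3 and `nfs_udp` true unless the share options override them. The sketch below is a simplified, hypothetical model of that merge (the names `VAGRANT_NFS_DEFAULTS`, `root_share_options` and `effective_nfs_options` are illustrative, not Vagrant or MediaWiki-Vagrant code).

```ruby
# Assumed Vagrant defaults for NFS synced folders: NFSv3 over UDP.
VAGRANT_NFS_DEFAULTS = { nfs_version: 3, nfs_udp: true }.freeze

# Hypothetical reduction of the Vagrantfile logic quoted above.
def root_share_options(settings)
  opts = { id: 'vagrant-root', type: :nfs, mount_options: [] }
  opts[:mount_options] << 'vers=3' if settings[:nfs_force_v3]
  opts
end

# Explicit share options win over Vagrant's defaults.
def effective_nfs_options(share_options)
  VAGRANT_NFS_DEFAULTS.merge(share_options)
end

eff = effective_nfs_options(root_share_options({ nfs_force_v3: false }))
p eff[:mount_options] # => []
p eff[:nfs_version]   # => 3
p eff[:nfs_udp]       # => true
```

Under this model, turning nfs_force_v3 off only removes the explicit vers=3 mount option; the defaults still select NFSv3 over UDP, which is why overriding `nfs_version` and `nfs_udp` directly is what changes the mount.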

I can get the mount to run successfully by making the following change to the Vagrantfile. We probably need to keep NFSv3 support for non-Cloud users, so I don't consider this patch appropriate for upstreaming yet.

diff --git a/Vagrantfile b/Vagrantfile
index ab7c95fd..a68be4b3 100644
--- a/Vagrantfile
+++ b/Vagrantfile
@@ -136,7 +136,7 @@ Vagrant.configure('2') do |config|
     end
   end
 
-  root_share_options = { id: 'vagrant-root' }
+  root_share_options = { id: 'vagrant-root', nfs_version: 4, nfs_udp: false }
 
   if settings[:nfs_shares]
     root_share_options[:type] = :nfs

I think the fix is a combination of two changes: one in MediaWiki-Vagrant to allow forcing the desired NFS settings, and a related one in the ::vagrant::mediawiki Puppet code to enable that setting when provisioning a new deployment. On the MediaWiki-Vagrant side, this probably means a new setting in lib/mediawiki-vagrant/settings/definitions.rb and code in the Vagrantfile guarded by that setting. On the ops/puppet.git side, the change would add the desired setting to modules/vagrant/files/default-settings.yaml.
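A minimal sketch of the guarded approach, assuming a boolean setting named nfs_force_v4 (the name later used in rMWVA48725d50e989). The YAML string stands in for a default-settings file; its exact layout, and how definitions.rb declares the setting, are assumptions and are not modeled here.

```ruby
require 'yaml'

# Illustrative stand-in for modules/vagrant/files/default-settings.yaml.
DEFAULT_SETTINGS = <<~SETTINGS
  nfs_shares: true
  nfs_force_v4: true
SETTINGS

# Hypothetical version of the Vagrantfile code: NFSv4 over TCP only when the
# setting is on, so everyone else keeps Vagrant's default NFS behaviour.
def root_share_options(settings)
  opts = { id: 'vagrant-root' }
  if settings[:nfs_shares]
    opts[:type] = :nfs
    if settings[:nfs_force_v4]
      opts[:nfs_version] = 4
      opts[:nfs_udp] = false
    end
  end
  opts
end

settings = YAML.safe_load(DEFAULT_SETTINGS).transform_keys(&:to_sym)
p root_share_options(settings)
```

Compared with the hard-coded diff above, the guard keeps NFSv3 available to non-Cloud users while letting Puppet opt Cloud VPS instances in.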

bd808 renamed this task from "Problems launching a mediawiki-vagrant instance in Cloud VPS" to "Problems with NFS shares when launching a mediawiki-vagrant instance in Cloud VPS". (Jul 14 2020, 6:01 PM)

Change 612638 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[mediawiki/vagrant@master] [WIP] Allow forcing NFSv4 between host and vm

https://gerrit.wikimedia.org/r/612638

Mentioned in SAL (#wikimedia-cloud) [2020-07-14T18:15:06Z] <bd808> Building instance t257855.mediawiki-vagrant.eqiad1.wikimedia.cloud to test NFSv4 (T257855)

Change 612638 merged by jenkins-bot:
[mediawiki/vagrant@master] Allow forcing NFSv4 between host and vm

https://gerrit.wikimedia.org/r/612638

Change 612682 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] cloud: [mwv] Use NFSv4 by default LXC+Vagrant

https://gerrit.wikimedia.org/r/612682


@awight, give rMWVA48725d50e989 (Allow forcing NFSv4 between host and vm) a try! After you pull that commit into your /srv/mediawiki-vagrant checkout, run vagrant config nfs_force_v4 yes; vagrant reload. This seems to be working well for me on a test server.

I have a patch up for ops/puppet.git that will do the equivalent of vagrant config nfs_force_v4 yes when the role::labs::mediawiki_vagrant role is provisioned for the first time. That patch won't fix things for existing role::labs::mediawiki_vagrant users, however, because applying it to them could clobber other vagrant config changes that have been made locally.
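The first-time-only trade-off can be illustrated with a hypothetical model of two ways to apply defaults (this is not how the Puppet module is actually implemented; the `http_port` value merely stands in for any local customisation).

```ruby
# Hypothetical defaults that Puppet would ship.
PUPPET_DEFAULTS = { 'nfs_force_v4' => true }.freeze

# Overwriting: replace whatever settings exist (local changes are lost).
def overwrite_settings(_local, defaults)
  defaults.dup
end

# First provisioning only: write defaults when no settings exist yet.
def provision_settings(local, defaults)
  local.nil? ? defaults.dup : local
end

legacy = { 'http_port' => 8080 } # a pre-existing, locally customised instance
p overwrite_settings(legacy, PUPPET_DEFAULTS) # local http_port is clobbered
p provision_settings(legacy, PUPPET_DEFAULTS) # untouched, so no nfs_force_v4
p provision_settings(nil, PUPPET_DEFAULTS)    # fresh instance gets nfs_force_v4
```

Legacy instances therefore still need the manual vagrant config nfs_force_v4 yes step.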

Change 612682 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloud: [mwv] Use NFSv4 by default LXC+Vagrant

https://gerrit.wikimedia.org/r/612682

bd808 claimed this task.

Hopefully fixed. Please do reopen if:

  • a fresh role::labs::mediawiki_vagrant deployment created after 2020-07-15 fails to mount NFS from the host
  • a legacy role::labs::mediawiki_vagrant deployment created before 2020-07-15 fails to mount NFS from the host after having manually done vagrant config nfs_force_v4 yes; vagrant reload
  • a fresh role::labs::mediawiki_vagrant deployment created after 2020-07-15

Blindingly fast work! I've recreated the instance and NFS is functional out of the box.