
Investigate and implement alternative for showmount based check at instance boot time
Closed, ResolvedPublic

Description

Currently, during the first puppet run after instance creation, puppet runs the block-for-export script, which blocks until the NFS volumes that should be mounted are available. block-for-export uses showmount to determine this, which in turn contacts rpcbind on the NFS server machine to find the port for the mount daemon, which it then queries to figure out the state of the NFS server. rpcbind runs on port 111, and mountd is pinned to port 38466 through the RPCMOUNTDOPTS config. NFS itself runs on port 2049.
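For illustration, the check amounts to something like the following. This is a hypothetical sketch of a showmount-based wait loop in the spirit of block-for-export, not the actual script; the function name, server name, export path, and polling interval are all made up:

```shell
# Hypothetical sketch of a showmount-based wait loop like block-for-export.
# Note that showmount needs rpcbind (port 111) and mountd (here 38466)
# reachable on the NFS server, not just the NFS port 2049 itself.
block_for_export() {
    local server="$1" export_path="$2" timeout="${3:-180}"
    local deadline=$(( $(date +%s) + timeout ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # showmount -e lists the server's export table via mountd
        if showmount -e "$server" 2>/dev/null | grep -q "^${export_path}[[:space:]]"; then
            return 0
        fi
        sleep 5
    done
    return 1
}
```

The important point is that the loop's success depends on rpcbind and mountd being reachable, which is exactly the extra firewall surface discussed below.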

We have to block ssh logins until the mount is available because otherwise there could be a temporary state in which /home isn't mounted yet, a user logs in, /home gets created, and then something whacky happens and the directory is overridden with the NFS mount. The block-for-export script runs with a 180s timeout, so even though we don't notice this wackiness often, it can still happen today if the mount isn't available before the timeout expires.

The only reason we have to use rpcbind, or allow instances to talk to rpcbind or rpc.mountd, is this showmount check, which seems to be stuck in the NFSv2/v3 world. If we figured out an alternative way to block user logins until the NFS mounts are available, we could get rid of this complexity. Also, in a world where we are pushing labstore boxes into the public VLAN, it would be good to expose only port 2049 to the instances.

The current blocking mechanism also doesn't allow admins to debug. The alternative I'm considering right now is dropping a nologin file that blocks non-root logins until it is removed, via some notification mechanism, after the mount succeeds; this would also let roots easily log in to the instance to debug things.
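Sketched out, the idea looks roughly like this (paths and function names are illustrative only; the real implementation would live in the firstboot/puppet tooling):

```shell
# Sketch of the nologin-based gate. pam_nologin denies non-root logins while
# /etc/nologin exists; root can still ssh in to debug.
NOLOGIN_FILE="${NOLOGIN_FILE:-/etc/nologin}"

block_nonroot_logins() {
    echo "Instance is still being provisioned; try again shortly." > "$NOLOGIN_FILE"
}

unblock_if_mounted() {
    # Lift the block only once the NFS home is actually mounted
    local mnt="${1:-/home}"
    if mountpoint -q "$mnt"; then
        rm -f "$NOLOGIN_FILE"
    fi
}
```

The gate is just a file on disk, so an admin can inspect or remove it by hand if provisioning gets stuck.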

Event Timeline

Change 368223 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] firstboot: Prevent non-root logins while NFS mounts aren't available

https://gerrit.wikimedia.org/r/368223

Change 368223 merged by Madhuvishy:
[operations/puppet@production] firstboot: Prevent non-root logins while NFS mounts aren't available

https://gerrit.wikimedia.org/r/368223

Change 374845 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] nfsmount: Add temporary exception to the block-for-export check

https://gerrit.wikimedia.org/r/374845

Change 374845 merged by Madhuvishy:
[operations/puppet@production] nfsmount: Add temporary exception to the block-for-export check

https://gerrit.wikimedia.org/r/374845

Change 378639 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] firstboot: Force puppet run after ensure NFS mounts available

https://gerrit.wikimedia.org/r/378639

Change 378639 merged by Madhuvishy:
[operations/puppet@production] firstboot: Force puppet run after ensure NFS mounts available

https://gerrit.wikimedia.org/r/378639

Change 378653 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] firstboot: Fix force puppet run after ensure NFS mounts available

https://gerrit.wikimedia.org/r/378653

Change 378653 merged by Madhuvishy:
[operations/puppet@production] firstboot: Fix force puppet run after ensure NFS mounts available

https://gerrit.wikimedia.org/r/378653

I was able to change the firstboot script, which runs when a new instance is created and booted for the first time, to create an /etc/nologin file at the beginning of its run and delete the file at the end, after ensuring that NFS is mounted. I tested this by building test images for Trusty and Jessie.

The expected behavior for /etc/nologin is that if the file is present, non-root users cannot log in. The PAM config that relates to nologin in our setup is:

auth    required     pam_nologin.so

The Trusty image works perfectly as expected.

However, with Jessie, the behavior is erratic:

Here's an example for correct behaviour:

From auth.log on a new Jessie instance with the hack in place, where the right behavior happens:

Sep 20 15:49:43 manual-fullstack-1505922384 sshd[12937]: Failed publickey for madhuvishy from 10.68.18.66 port 53742 ssh2: RSA ea:fa:b8:9a:5f:7b:51:2b:64:70:7c:dc:8c:79:ba:70
Sep 20 15:49:43 manual-fullstack-1505922384 sshd[12937]: fatal: Access denied for user madhuvishy by PAM account configuration [preauth]
Sep 20 15:49:54 manual-fullstack-1505922384 sshd[13506]: Postponed publickey for madhuvishy from 10.68.18.66 port 53754 ssh2 [preauth]
Sep 20 15:49:54 manual-fullstack-1505922384 sshd[13506]: Failed publickey for madhuvishy from 10.68.18.66 port 53754 ssh2: RSA ea:fa:b8:9a:5f:7b:51:2b:64:70:7c:dc:8c:79:ba:70
Sep 20 15:49:54 manual-fullstack-1505922384 sshd[13506]: fatal: Access denied for user madhuvishy by PAM account configuration [preauth]
Sep 20 15:50:03 manual-fullstack-1505922384 sshd[13808]: Postponed publickey for madhuvishy from 10.68.18.66 port 53764 ssh2 [preauth]
Sep 20 15:50:03 manual-fullstack-1505922384 sshd[13808]: Accepted publickey for madhuvishy from 10.68.18.66 port 53764 ssh2: RSA ea:fa:b8:9a:5f:7b:51:2b:64:70:7c:dc:8c:79:ba:70
Sep 20 15:50:03 manual-fullstack-1505922384 sshd[13808]: pam_unix(sshd:session): session opened for user madhuvishy by (uid=0)
Sep 20 15:50:04 manual-fullstack-1505922384 sshd[13824]: Starting session: shell on pts/1 for madhuvishy from 10.68.18.66 port 53764

And from syslog, we see that the user auth succeeded after the nologin file was deleted:

root@manual-fullstack-1505922384:~# cat /var/log/syslog | grep nologin
Sep 20 15:49:59 manual-fullstack-1505922384 rc.local[410]: + rm /etc/nologin

Here's another:

Sep 20 16:23:00 manual-fullstack-1505924414 sshd[11773]: Failed publickey for madhuvishy from 10.68.18.66 port 41610 ssh2: RSA 13:9d:2d:94:cf:44:dd:23:b3:32:86:d5:b7:d2:85:f8
Sep 20 16:23:02 manual-fullstack-1505924414 sshd[12323]: Postponed publickey for madhuvishy from 10.68.18.66 port 41616 ssh2 [preauth]
Sep 20 16:23:03 manual-fullstack-1505924414 sshd[12323]: Accepted publickey for madhuvishy from 10.68.18.66 port 41616 ssh2: RSA ea:fa:b8:9a:5f:7b:51:2b:64:70:7c:dc:8c:79:ba:70
Sep 20 16:23:03 manual-fullstack-1505924414 sshd[12323]: pam_unix(sshd:session): session opened for user madhuvishy by (uid=0)
Sep 20 16:23:03 manual-fullstack-1505924414 sshd[12410]: Starting session: shell on pts/0 for madhuvishy from 10.68.18.66 port 41616

And from syslog:

Sep 20 16:23:45 manual-fullstack-1505924414 rc.local[408]: + rm /etc/nologin

Here the session is opened for the user even before the nologin file is deleted.

I'm not sure what the reason is for this unpredictable PAM nologin behavior. One piece of advice from the internet (e.g.
https://www.novell.com/support/kb/doc.php?id=7007116) seems to be to use account required pam_nologin.so instead of auth required pam_nologin.so, but I don't understand how the two differ, or why the current setup works for Trusty.

Also, after the instance is booted, if I touch an /etc/nologin file, it works perfectly on both Jessie and Trusty. That makes me think there may be some race condition during Jessie boot where the PAM-related services come up later than the user login attempt? Not sure.

The difference between "account" and "auth" is that the auth step ensures that you've successfully authenticated (by means of a password or a pubkey e.g.), while account is a separate administrative step to control the user account, e.g. a PAM module which denies login to a service while it's in maintenance. Another typical use case of "account" is to force a user to change the password.
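To make that concrete, the two stages sit in different places in the sshd PAM stack. This is an illustrative fragment only, not our exact config; pam_nologin ships both an auth and an account hook. It's also worth noting that for public-key logins sshd typically skips the PAM auth stack entirely and only runs the account/session stages, which may be relevant to why the stage matters here:

```
# /etc/pam.d/sshd (illustrative fragment)
# auth: consulted while credentials are being verified
auth     required   pam_nologin.so
# account: consulted after authentication as an account-validity gate; denials
# here show up in auth.log as "Access denied ... by PAM account configuration"
account  required   pam_nologin.so
```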

I have no good idea what's wrong here; PAM modules are loaded by the respective services, so I'm not sure where/how this triggers a race.

You can generate a chart of the startup with "systemd-analyze plot >foo.svg", maybe there's an obvious difference in ordering of started services in the case of a successful and failing nologin case?

Thanks @MoritzMuehlenhoff, I tried that and came up with these two plots; I don't really see much of a difference as far as auth-related services go.

From a VM with correct expected behavior:

From a VM with incorrect behaviour:

Change 384574 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] ssh-key-ldap-lookup: Modify script to run additional checks before ssh login

https://gerrit.wikimedia.org/r/384574

I've decided to look at alternatives because the /etc/nologin mechanism seems to be flaky.

I'm thinking something like this should work: have ssh-key-ldap-lookup fail if mounts aren't available while the firstboot work is in flight (https://gerrit.wikimedia.org/r/#/c/384574), and leave the ensure-all-NFS-is-mounted block in place in firstboot.sh (https://github.com/wikimedia/puppet/blob/production/modules/labs_vmbuilder/files/firstboot.sh#L119). I'll remove the nologin logic from there, and also kill the block-for-export logic in nfs_mount.pp. This ensures that if someone tries to log in while firstboot is in flight their login fails, but we don't run mount -a on every subsequent ssh attempt.
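The guard itself can be as small as this. This is a hypothetical sketch of the check added to the key-lookup path, not the actual patch; the sentinel filename matches the merged change referenced below, everything else is illustrative:

```shell
# Sketch of the guard in the key-lookup path: if the sentinel file dropped by
# firstboot still exists, emit no keys and fail, so sshd's AuthorizedKeysCommand
# denies the login until provisioning finishes.
should_block_login() {
    [ -e "${1:-/etc/block-ldap-key-lookup}" ]
}

if should_block_login; then
    echo "Instance still initializing; refusing key lookup" >&2
    exit 1
fi
```

Because sshd re-runs the AuthorizedKeysCommand on every login attempt, the check costs nothing once the sentinel file is removed.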

I've decided to look at alternatives because the /etc/nologin mechanism seems to be flaky.

Adding some more explanation for this:

See my comment above on the behavior of /etc/nologin on Ubuntu Trusty vs Debian Jessie instances for the full details and logs.

Following this, here are the things I investigated:

  • Plotted the systemd service startup and compared Jessie instances where the nologin mechanism worked as intended against ones where it didn't (plots uploaded in the comment above). This was inconclusive, since the plots look pretty much the same.
  • Tested the internet advice to use the auth module instead of account in the ssh PAM config (by changing it manually in a test instance and trying to log in as a non-root user while the nologin file is present). This didn't work at all: with auth instead of account, I was able to log in on every attempt as a non-root user.
  • Looked for changes to our ssh config across different puppet runs; didn't find anything.

From all this, it seems there are cases where the nologin check passes even when the file is present, and from the investigation so far I can't figure out why. The fact that this occurs a small percentage of the time, and only on Jessie, may mean that even if we identified the OS-specific issue here, it could recur on a different operating system. This all points to flakiness, so I'm considering putting similar logic in an alternate place: the ssh-key-ldap-lookup script, which we know works reliably, only applies to Cloud VPS instances, and requires a very simple change - https://gerrit.wikimedia.org/r/#/c/384574

The pam_nologin behavior you're reporting sounds very odd indeed. If it's actually the case it will be CVE-worthy! It's an old, popular and well-audited piece of code though, so it'd be surprising to me if the root cause lies with pam_nologin and not somewhere in our configuration. It's not impossible of course, bugs and CVEs do happen :)

Have you encountered this behavior only during early/first boot, or is this reproducible after the first boot when e.g. creating that file? Is this perhaps a race that occurs while puppet is running and changing the system's configuration? Maybe something as innocent as sshd's UsePAM setting, or another PAM configuration, given that we're messing with it in the first puppet run to add LDAP auth?

My comments on adding authz bits to the LDAP lookup script stand -- I feel strongly that it's the wrong place to be adding this kind of check.

Going one step further to the original assumptions:

there could be a temporary state in which /home isn't mounted yet, a user logs in, /home gets created, and then something whacky happens and the directory is overridden with the NFS mount

Why is the user's /home created? We seem to have pam_mkhomedir configured, but if that's not the intended behavior, why do we provision that? Perhaps the solution is to just to remove that, not create the /home directory automatically and let logins fail due to the lack of a home directory?

More generally, perhaps this policy is something that should be addressed in/with PAM in the first place, rather than creating and deleting files on the filesystem to serve as synchronization mutexes between the various steps of instance provisioning?

I think pam_mkhomedir may be there because the same logic paths are used whether an instance has /home on NFS or not.

I think pam_mkhomedir may be there because the same logic paths are used whether an instance has /home on NFS or not.

I'm pretty sure this is correct. Almost all Cloud VPS projects have moved off of NFS homedirs so it is common to need to bootstrap $HOME from /etc/skel on initial login to a new instance.
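For reference, that bootstrap-from-/etc/skel behavior is what a pam_mkhomedir session line provides (illustrative line only; our actual file placement and options may differ):

```
# /etc/pam.d/common-session (illustrative)
session  required  pam_mkhomedir.so skel=/etc/skel umask=0077
```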

The pam_nologin behavior you're reporting sounds very odd indeed. If it's actually the case it will be CVE-worthy! It's an old, popular and well-audited piece of code though, so it'd be surprising to me if the root cause lies with pam_nologin and not somewhere in our configuration. It's not impossible of course, bugs and CVEs do happen :)

Have you encountered this behavior only during early/first boot, or is this reproducible after the first boot when e.g. creating that file? Is this perhaps a race that occurs while puppet is running and changing the system's configuration? Maybe something as innocent as sshd's UsePAM setting, or another PAM configuration, given that we're messing with it in the first puppet run to add LDAP auth?

When the config is account required pam_nologin.so, I've only been able to reproduce this behavior during the firstboot stage. I've also tried applying auth required pam_nologin.so post boot to see how the behavior changes, and I've been able to log in every time, despite that config existing.

As far as race conditions go, that was my original thought; I poked at the systemd service startup order, but I haven't debugged deeply enough to identify something that puppet may be changing intermittently.

Going one step further to the original assumptions:

there could be a temporary state in which /home isn't mounted yet, a user logs in, /home gets created, and then something whacky happens and the directory is overridden with the NFS mount

Why is the user's /home created? We seem to have pam_mkhomedir configured, but if that's not the intended behavior, why do we provision that? Perhaps the solution is to just to remove that, not create the /home directory automatically and let logins fail due to the lack of a home directory?

More generally, perhaps this policy is something that should be addressed in/with PAM in the first place, rather than creating and deleting files on the filesystem to serve as synchronization mutexes between the various steps of instance provisioning?

This has been answered in the comments above by Chase and Bryan.

Change 391619 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] firstboot: Prevent non-root users from logging in during instance set up

https://gerrit.wikimedia.org/r/391619

Change 391619 merged by Madhuvishy:
[operations/puppet@production] firstboot: Prevent non-root users from logging in during instance set up

https://gerrit.wikimedia.org/r/391619

Change 384574 merged by Madhuvishy:
[operations/puppet@production] ssh-key-ldap-lookup: Deny user auth if /etc/block-ldap-key-lookup exists

https://gerrit.wikimedia.org/r/384574

Change 391970 had a related patch set uploaded (by Madhuvishy; owner: Madhuvishy):
[operations/puppet@production] nfs_mount: Remove showmount based blocking check for nfs shares

https://gerrit.wikimedia.org/r/391970

Change 391970 merged by Madhuvishy:
[operations/puppet@production] nfs_mount: Remove showmount based blocking check for nfs shares

https://gerrit.wikimedia.org/r/391970

Raw notes from Etherpad on rolling this all out:

  1. [DONE] Make new test images for ubuntu and debian after merging https://gerrit.wikimedia.org/r/#/c/391619/2
Images built:
root@labcontrol1001:~# glance image-list | grep testing
| af63165e-cf13-404e-bffe-ed7a86e97594 | debian-8.9-jessie (testing)                          |
| a2269f6d-a125-4e66-a7a7-ebb9dc701dd1 | ubuntu-14.04-trusty (testing)                        |
| e27bab24-4a17-47e5-bb89-b0c7526dea20 | debian-9.2-stretch (testing)                           |

[Tested all images multiple times; they work as expected and create the block-ldap-key-lookup file (though no user auth is based on it yet)]

  2. [DONE] Disable puppet across WMCS -
From labpuppetmaster1001
sudo cumin -p 95 -b 10 -d --backend openstack "A:all and not O{project:testlabs}" "sudo puppet agent --disable 'Testing Deny user auth if /etc/block-ldap-key-lookup exists patch'"
# Won't apply to admin-monitoring and contintcloud with A:all
  3. [DONE] Merge https://gerrit.wikimedia.org/r/#/c/384574/
  4. Run the tests for the new images by creating a bunch of new instances in testlabs

(I have scripts in local to do this)

Built 5 images each for Jessie and Trusty - They all passed my tests

Spot checking by applying puppet change
tools-exec-1412
tools-bastion-05
tools-worker-1007
(all good)

  5. If that + the puppet change testing on testlabs goes fine - roll out the puppet change everywhere else

Rolled out fully to tools
Rolling out currently to every where else

  6. Publish new images to prod
Current images

| 0cc9f2fa-464f-40cd-98b5-5501290b62bf | ubuntu-14.04-trusty                                  |
| 4a2aaa92-acf2-4837-ab41-74a21117b021 | debian-8.9-jessie                                    |
| e02770ae-b45f-4776-a852-d9a13217611e  | debian-9.1-stretch                                  |

New images

| a2269f6d-a125-4e66-a7a7-ebb9dc701dd1 | ubuntu-14.04-trusty (testing)                        |
| af63165e-cf13-404e-bffe-ed7a86e97594 | debian-8.9-jessie (testing)                          |
| e27bab24-4a17-47e5-bb89-b0c7526dea20 | debian-9.2-stretch (testing)                           |


# Ubuntu trusty
glance image-update --name "ubuntu-14.04-trusty (deprecated 2017-11-16)" 0cc9f2fa-464f-40cd-98b5-5501290b62bf 
glance image-update --name "ubuntu-14.04-trusty" a2269f6d-a125-4e66-a7a7-ebb9dc701dd1 --property show=true
openstack token issue
curl -X POST -H 'X-Auth-Token: <token id>' http://labcontrol1001.wikimedia.org:9292/v2/images/0cc9f2fa-464f-40cd-98b5-5501290b62bf/actions/deactivate


# Debian Jessie
glance image-update --name "debian-8.9-jessie (deprecated  2017-11-16)" 4a2aaa92-acf2-4837-ab41-74a21117b021
glance image-update --name "debian-8.9-jessie"  af63165e-cf13-404e-bffe-ed7a86e97594 --property show=true
openstack token issue
curl -X POST -H 'X-Auth-Token: <token id>' http://labcontrol1001.wikimedia.org:9292/v2/images/4a2aaa92-acf2-4837-ab41-74a21117b021/actions/deactivate

# Debian Stretch
glance image-update --name "debian-9.1-stretch (deprecated  2017-11-16)" e02770ae-b45f-4776-a852-d9a13217611e
glance image-update --name "debian-9.2-stretch"  e27bab24-4a17-47e5-bb89-b0c7526dea20 --property show=true
openstack token issue
curl -X POST -H 'X-Auth-Token: <token id>' http://labcontrol1001.wikimedia.org:9292/v2/images/e02770ae-b45f-4776-a852-d9a13217611e/actions/deactivate
  7. [DONE] Revert: https://gerrit.wikimedia.org/r/#/c/374845/
  8. [DONE] Remove this whole blocking check - https://github.com/wikimedia/puppet/blob/production/modules/labstore/manifests/nfs_mount.pp#L139

https://gerrit.wikimedia.org/r/#/c/391970