Project entries are created in ldap but not the posix group entry (disallowing ssh, etc)
Closed, Resolved (Public)

Description

The mediawiki-vagrant labs project was created on 2015-12-09 as the new location to host a build server for MediaWiki-Vagrant USB installer images.

Since creating the project I have built three instances and had the same ssh access failure on all three (ssh -vvv trace below). To double-check that the problem is specific to this project, I created a test instance in the wikimania-support project and verified that I was able to ssh to it once the initial puppet run had completed. The mwv-image-builder.mediawiki-vagrant.eqiad.wmflabs instance has been left up to aid in debugging.

$ ssh -vvv mwv-image-builder.mediawiki-vagrant.eqiad.wmflabs
OpenSSH_7.1p1, OpenSSL 1.0.2d 9 Jul 2015
debug1: Reading configuration data /Users/bd808/.ssh/config
debug1: /Users/bd808/.ssh/config line 38: Applying options for *.wmflabs
debug1: /Users/bd808/.ssh/config line 68: Applying options for *.wmflabs
debug3: kex names ok: [curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256]
debug1: /Users/bd808/.ssh/config line 73: Applying options for *
debug1: Reading configuration data /usr/local/etc/ssh/ssh_config
debug1: auto-mux: Trying existing master
debug1: Control socket "/Users/bd808/.ssh/sockets/e5ff9e9f3ccce5dbae3035602d262b2f38305903" does not exist
debug1: Executing proxy command: exec ssh -a -W mwv-image-builder.mediawiki-vagrant.eqiad.wmflabs:22 bastion.wmflabs.org
debug1: permanently_drop_suid: 501
debug1: identity file /Users/bd808/.ssh/labs_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /Users/bd808/.ssh/labs_rsa-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.1
debug1: Remote protocol version 2.0, remote software version OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
debug1: match: OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3 pat OpenSSH_6.6.1* compat 0x04000000
debug2: fd 5 setting O_NONBLOCK
debug2: fd 4 setting O_NONBLOCK
debug1: Authenticating to mwv-image-builder.mediawiki-vagrant.eqiad.wmflabs:22 as 'bd808'
debug3: hostkeys_foreach: reading file "/Users/bd808/.ssh/known_hosts"
debug3: record_hostkey: found key type ECDSA in file /Users/bd808/.ssh/known_hosts:223
debug3: load_hostkeys: loaded 1 keys from mwv-image-builder.mediawiki-vagrant.eqiad.wmflabs
debug3: order_hostkeyalgs: prefer hostkeyalgs: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
debug2: kex_parse_kexinit: ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,ssh-rsa
debug2: kex_parse_kexinit: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: kex_parse_kexinit: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: kex_parse_kexinit: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1,hmac-md5-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: umac-64-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-256-etm@openssh.com,hmac-sha2-512-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1,hmac-md5-etm@openssh.com,hmac-ripemd160-etm@openssh.com,hmac-sha1-96-etm@openssh.com,hmac-md5-96-etm@openssh.com,hmac-md5,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit: none,zlib@openssh.com,zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ssh-ed25519
debug2: kex_parse_kexinit: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: kex_parse_kexinit: chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
debug2: kex_parse_kexinit: hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
debug2: kex_parse_kexinit: hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit: none,zlib@openssh.com
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug1: kex: server->client chacha20-poly1305@openssh.com <implicit> none
debug1: kex: client->server chacha20-poly1305@openssh.com <implicit> none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: Server host key: ecdsa-sha2-nistp256 SHA256:qdAWkPcVD/O/FdUil4dmWQKfgVYKSVzzvsaTpj4rSKQ
debug3: hostkeys_foreach: reading file "/Users/bd808/.ssh/known_hosts"
debug3: record_hostkey: found key type ECDSA in file /Users/bd808/.ssh/known_hosts:223
debug3: load_hostkeys: loaded 1 keys from mwv-image-builder.mediawiki-vagrant.eqiad.wmflabs
debug1: Host 'mwv-image-builder.mediawiki-vagrant.eqiad.wmflabs' is known and matches the ECDSA host key.
debug1: Found key in /Users/bd808/.ssh/known_hosts:223
debug2: set_newkeys: mode 1
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug2: set_newkeys: mode 0
debug1: SSH2_MSG_NEWKEYS received
debug1: Roaming not allowed by server
debug1: SSH2_MSG_SERVICE_REQUEST sent
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug2: key: /Users/bd808/.ssh/labs_rsa (0x7fd0416005f0), explicit
debug1: Authentications that can continue: publickey
debug3: start over, passed a different list publickey
debug3: preferred publickey
debug3: authmethod_lookup publickey
debug3: remaining preferred:
debug3: authmethod_is_enabled publickey
debug1: Next authentication method: publickey
debug1: Offering RSA public key: /Users/bd808/.ssh/labs_rsa
debug3: send_pubkey_test
debug2: we sent a publickey packet, wait for reply
debug1: Server accepts key: pkalg ssh-rsa blen 279
debug2: input_userauth_pk_ok: fp SHA256:QDCTAHctsnhkMb2PzvojA9Re/YsNw4kJVS4sFXMvSOo
debug3: sign_and_send_pubkey: RSA SHA256:QDCTAHctsnhkMb2PzvojA9Re/YsNw4kJVS4sFXMvSOo
Connection closed by UNKNOWN

Event Timeline

bd808 raised the priority of this task to Needs Triage.
bd808 updated the task description.
bd808 added a project: Cloud-VPS.
bd808 subscribed.
Restricted Application added subscribers: StudiesWorld, Aklapper.

root@mwv-image-builder:~# id bd808 | grep medi
uid=3518(bd808) gid=500(wikidev) groups=1003(wmf),50062(project-bastion),50076(project-openstack),50120(project-deployment-prep),50174(project-integration),50214(project-mobile),50230(project-otrs),50380(project-tools),50660(project-multimedia),50798(project-wikimania-support),50806(project-logstash),51822(project-mediawiki-core-team),52117(project-services),52147(project-mediahandler-tests),52354(project-wikidata-query),52447(project-grantreview),52579(project-mobile-smoketests),52607(project-reading-smoketest),52608(project-reading-web-staging),52623(project-sentry),52634(project-stashbot),52649(project-mediawiki-docker),52678(project-phlogiston),52755(project-commtech),51364(tools.bd808-test),52033(tools.jouncebot),52233(tools.convert),52325(tools.nagf),52391(tools.bash),52633(tools.stashbot),52645(tools.sal),52740(tools.axel),52774(tools.replag),500(wikidev)

The group list does not include project-mediawiki-vagrant, so PAM rejects his ssh login.

It was a new project created yesterday.

chasemp renamed this task from ssh access to instances in mediawiki-vagrant project failing with "Connection closed by UNKNOWN" response to new projects do not allow ssh as users do not get joined to the project in ldap. Dec 10 2015, 9:21 PM
chasemp triaged this task as High priority.
chasemp set Security to None.

I created a new project (dashiki) and I didn't get added to a project-dashiki group.

The lizenzhinweisgenerator project will probably have the same problem. I created it yesterday too (T120925).

@yuvipanda created a dashiki project to verify the same behavior.

We see:

Dec 10 21:05:02 mwv-image-builder sshd[775]: fatal: Access denied for user bd808 by PAM account configuration [preauth]

This does get created for sudo, but the LDAP membership check fails:

-:ALL EXCEPT (project-mediawiki-vagrant) root:ALL
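
As a rough illustration of how a rule of this shape is read: it denies every user from every origin, except members of the group named in parentheses and the user root. The sketch below is not pam_access itself; `is_denied` is a hypothetical helper that handles only this one rule shape, and the group sets passed to it are made up for the example.

```python
RULE = "-:ALL EXCEPT (project-mediawiki-vagrant) root:ALL"


def is_denied(user, user_groups, rule=RULE):
    """Return True if `user` is denied by a '-:ALL EXCEPT ...:ALL' rule.

    Hypothetical helper: only this single rule shape is supported.
    """
    permission, users, origins = rule.split(":")
    if permission != "-" or origins != "ALL":
        raise ValueError("unsupported rule shape")
    head, sep, exceptions = users.partition(" EXCEPT ")
    if head != "ALL" or not sep:
        raise ValueError("unsupported rule shape")
    for token in exceptions.split():
        if token.startswith("(") and token.endswith(")"):
            # Parenthesised tokens name a group; membership exempts the user.
            if token[1:-1] in user_groups:
                return False
        elif token == user:
            # Bare tokens name a user.
            return False
    return True


# The project group is missing from the user's groups, so the login is denied.
print(is_denied("bd808", {"wikidev", "project-bastion"}))
```

Under this reading, a user whose group list lacks project-mediawiki-vagrant is denied even with a valid key, which matches the sshd log line above.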

@bd808 is a cloudadmin so I imagine that is how he was able to create a VM in the project in the first place?

chasemp renamed this task from new projects do not allow ssh as users do not get joined to the project in ldap to Groups for new projects are not created as groups in ldap (thus users cannot join them and ssh doesn't work, etc). Dec 10 2015, 9:41 PM

root@puppet-testing>getent group | grep -i Mediawiki-vagrant; echo $?
1

I looked at the source; it turns out that OSM first creates an empty group and only later tries to add people to it. If empty groups are disallowed on OpenLDAP, OSM helpfully prints a debug message (!) and goes on with life. I suspect this is the problem. Look at https://github.com/wikimedia/mediawiki-extensions-OpenStackManager/blob/master/nova/OpenStackNovaProjectGroup.php#L251

Helpfully, it also says # TODO: If project group creation fails we need to be able to fail gracefully

Right, that's a difference introduced by OpenLDAP. OpenDJ violated/ignored the LDAP schema used to store group memberships (we also had to amend or drop three groups for the import). I'm afraid this will need to be fixed on the OSM side.
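
To make the suspected failure mode concrete, here is a minimal in-memory sketch. It assumes a server that rejects a group entry with no `member` value. `StrictDirectory` and both functions are hypothetical stand-ins for illustration; this is not the OSM code (which is PHP) and not a real LDAP client.

```python
class SchemaViolation(Exception):
    pass


class StrictDirectory:
    """Hypothetical stand-in for a server that requires 'member' on groups."""

    def __init__(self):
        self.entries = {}

    def add(self, dn, attrs):
        if not attrs.get("member"):
            raise SchemaViolation(f"{dn}: 'member' is required")
        self.entries[dn] = {key: list(values) for key, values in attrs.items()}

    def add_member(self, dn, member):
        # Raises KeyError when the group entry was never created.
        self.entries[dn]["member"].append(member)


def create_empty_then_add(directory, dn, first_member):
    """Suspected flow: create an empty group, ignore the error, add later."""
    try:
        directory.add(dn, {"member": []})
    except SchemaViolation as err:
        print(f"debug: {err}")  # logged, then execution simply continues
    try:
        directory.add_member(dn, first_member)
    except KeyError:
        return False  # nothing to add the member to
    return True


def create_with_member(directory, dn, first_member):
    """Alternative: never write a group without at least one member."""
    directory.add(dn, {"member": [first_member]})
    return True
```

With a strict server, the first function leaves no group entry behind while reporting nothing louder than a debug line; the second either creates a valid group or fails visibly.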

chasemp renamed this task from Groups for new projects are not created as groups in ldap (thus users cannot join them and ssh doesn't work, etc) to Groups for project are created in ldap but getent cannot see them on user VMs (disallowing ssh, etc). Dec 10 2015, 11:08 PM

With the idea that OSM was doing bad things I put up https://gerrit.wikimedia.org/r/#/c/258355/ and tested with a hotfix. It seemed to work but we saw the same issues on hosts. That led me to question the original premise.

In looking at ldap directly:

ldaplist -l projects mediawiki-vagrant

dn: cn=mediawiki-vagrant,ou=projects,dc=wikimedia,dc=org
objectClass: extensibleObject
objectClass: groupOfNames
info: use_volume=home
info: use_volume=project
info: servicegrouphomedirpattern=/home/%p%u/
member: uid=novaadmin,ou=people,dc=wikimedia,dc=org
member: uid=bd808,ou=people,dc=wikimedia,dc=org
member: uid=dduvall,ou=people,dc=wikimedia,dc=org
cn: mediawiki-vagrant

bd808's project seems to be there, and both test projects I tried are there as well (with and without the hotfix, so I reverted it):

ldaplist -l projects gtest1-temp (with)
ldaplist -l projects gtest2-temp (without)

But we still see nothing with getent:

puppet-testing>getent group | grep -i gtest; echo $?
1

This seems likely to be an ACL issue with newly created projects in LDAP.

Andrew subscribed.

I feel like https://gerrit.wikimedia.org/r/#/c/258355/2 is an improvement regardless. But, yeah, if it's an ACL thing then this should go back to Moritz.

From what I can see, what happened in the case of mediawiki-vagrant is that:

  • the project ldap entry (cn=mediawiki-vagrant,ou=projects,dc=wikimedia,dc=org) was created; but
  • when it tried to create the group (under cn=project-mediawiki-vagrant,ou=groups,dc=wikimedia,dc=org) it failed (presumably because of no members)

Since there is no failure recovery, that left the project sorta half-created in LDAP where the group does not exist so nobody can be made a member of it, but you can't create the group because the project is already there.

If we wanted to be really robust, what adding or removing a member of a project should do is:

  • grab the list of current users from the project entry;
  • update the list;
  • create or update the group entry with the entire list (so there is never a 'no member' group); then
  • update the project entry with the list (iff the group creation/update worked).

This makes sure the group entry is consistent with the project entry, fails better if something prevents the group from being created or updated, and retroactively fixes broken projects (when updating it).
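
The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a patch for OSM: the two dicts are hypothetical in-memory stand-ins for the ou=projects and ou=groups subtrees, and `update_membership` is a made-up name.

```python
from typing import Callable, Dict, List

Members = List[str]


def update_membership(
    projects: Dict[str, Members],
    groups: Dict[str, Members],
    project: str,
    change: Callable[[Members], Members],
) -> Members:
    # 1. Grab the list of current users from the project entry.
    current = list(projects[project])
    # 2. Update the list.
    updated = change(current)
    if not updated:
        # Never write a 'no member' group; the project entry stays untouched.
        raise ValueError("refusing to write a group with no members")
    # 3. Create or overwrite the group entry with the entire list.
    groups[f"project-{project}"] = list(updated)
    # 4. Update the project entry only after the group write went through.
    projects[project] = list(updated)
    return updated


# A half-created project: the project entry exists, the group does not.
projects = {"mediawiki-vagrant": ["uid=novaadmin", "uid=bd808"]}
groups: Dict[str, Members] = {}
update_membership(projects, groups, "mediawiki-vagrant",
                  lambda members: members + ["uid=dduvall"])
print(groups["project-mediawiki-vagrant"])
```

Because step 3 always writes the whole list, the first membership change on a half-created project also creates its missing group, which is the retroactive repair described above.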

If nobody else wants it, I can work on a patch for that.

coren renamed this task from Groups for project are created in ldap but getent cannot see them on user VMs (disallowing ssh, etc) to Project entries are created in ldap but not the posix group entry (disallowing ssh, etc). Dec 11 2015, 4:17 AM

Well, that makes the getent behavior clearer, thanks.

So now I'm getting a bit confused. Adding the following record to ldap (manually, with novaadmin creds from silver) worked:

205 cn=project-gtest1-temp,ou=groups,dc=wikimedia,dc=org
objectClass: groupOfNames
objectClass: posixGroup
objectClass: top
member: uid=rush,ou=people,dc=wikimedia,dc=org
gidNumber: 52779
cn: project-gtest1-temp

This worked, is correct, and makes the nova project behave correctly.

This would be good news, as it matches our diagnosis, except that the patch above should already have done that and therefore should have worked.

Change 258488 had a related patch set uploaded (by coren):
OpenStackManager: remove obsolete isVirtual() handling

https://gerrit.wikimedia.org/r/258488

Change 258488 merged by coren:
OpenStackManager: remove obsolete isVirtual() handling

https://gerrit.wikimedia.org/r/258488

Change 258516 had a related patch set uploaded (by 20after4):
OpenStackManager: remove obsolete isVirtual() handling

https://gerrit.wikimedia.org/r/258516

Change 258516 merged by jenkins-bot:
OpenStackManager: remove obsolete isVirtual() handling

https://gerrit.wikimedia.org/r/258516

I deleted the mediawiki-vagrant project, recreated it and spun up an instance named mwv-image-builder.mediawiki-vagrant.eqiad.wmflabs. Ssh to the new instance works as expected. Thanks!

Confirmed fixed.