sssd integration needs to be updated to include sudo config from LDAP support
Closed, Resolved, Public

Description

After T218126 ("LDAP: try how sssd works with our servers"), we agreed to deploy sssd to more servers in Toolforge.

On 2019-04-17 I deployed sssd to the bastions and some users reported they were unable to become their tools.

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2019-04-17T12:08:53Z] <arturo> T221225 rebooting bastions to clean sssd. We are back to nscd/nslcd until we figure out what's wrong here

The working part of the become script is exec /usr/bin/sudo -niu "$prefix.$tool" "$@". $prefix is 'tools' in the Toolforge project. $tool is the name passed on the command line by the user (for example 'bd808-test').
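As a rough illustration of that line, here is a minimal sketch of a become-style wrapper. Only the final exec line comes from this task; the function names (tool_account, become_sketch) and the argument handling are assumptions made for the example, not the real script.

```shell
#!/bin/sh
# Sketch only: the real become script is not reproduced in this task.

# Build the account name for a tool: "<prefix>.<tool>".
tool_account() {
    printf '%s.%s\n' "$1" "$2"
}

# Hypothetical wrapper around the exec line quoted above.
become_sketch() {
    if [ "$#" -lt 1 ]; then
        echo "usage: become_sketch TOOL [COMMAND...]" >&2
        return 2
    fi
    prefix=tools    # 'tools' in the Toolforge project
    tool=$1
    shift
    # -n: never prompt for a password, fail instead
    # -i: start a login shell as the target user
    # -u: name of the target user
    exec /usr/bin/sudo -niu "$(tool_account "$prefix" "$tool")" "$@"
}
```

Because of -n, sudo cannot fall back to a password prompt when no matching sudoers rule is found, which is why the failure shows up as "sudo: a password is required".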

With the nscd/nslcd implementation of LDAP into our nss stack this works as expected:

$ hostname -f
tools-sgebastion-08.tools.eqiad.wmflabs
$ echo $USER
bd808
$ sudo -niu tools.bd808-test
$ echo $USER
tools.bd808-test

With the sssd implementation, it fails:

$ hostname -f
tools-sgeexec-0901.tools.eqiad.wmflabs
$ echo $USER
bd808
$ sudo -niu tools.bd808-test
sudo: a password is required

This is a missing feature in our current sssd-sudo config. The sudoer rules that allow a Toolforge maintainer to become a given tool live in the LDAP directory. It looks like we currently have this explicitly disabled in the config:

/etc/sssd/sssd.conf
[sssd]
#debug_level=10
domains = wikimedia.org
default_domain_suffix = wikimedia.org
full_name_format = %1$s
services = nss, pam, ssh
config_file_version = 2

# (... cut ...)
[domain/wikimedia.org]
#debug_level=10
id_provider = ldap
auth_provider = ldap
# (... cut ...)
# disable stuff not provided by LDAP (value of id_provider is used by default)
sudo_provider = none
# (... cut ...)

I think we need to add the sssd equivalent of the nscd/nslcd config from /etc/sudo-ldap.conf:

/etc/sudo-ldap.conf
BASE            dc=wikimedia,dc=org
URI             ldap://ldap-ro.eqiad.wikimedia.org:389 ldap://ldap-ro.eqiad.wikimedia.org:389
# The next settings are not honored by OpenLDAP but are honored by sudo-ldap and /etc/sudo-ldap.conf is a symlink to /etc/ldap/ldap.conf
BINDDN          cn=proxyagent,ou=profile,dc=wikimedia,dc=org
BINDPW          Eche0ieng8UaNoo
SUDOERS_BASE    ou=sudoers,cn=tools,ou=projects,dc=wikimedia,dc=org
SSL             start_tls
TLS_CHECKPEER   yes
TLS_REQCERT     demand
TLS_CACERTDIR   /etc/ssl/certs
TLS_CACERTFILE  /etc/ssl/certs/ca-certificates.crt
TLS_CACERT      /etc/ssl/certs/ca-certificates.crt

(The LDAP proxy user password here is public intentionally, don't freak out!)
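For reference, the sssd side of this could look roughly like the fragment below. It is a sketch based on the options documented in sssd-sudo(5) and sssd-ldap(5), not the content of any patch attached to this task. The search base is copied from SUDOERS_BASE above, and whether other settings are also needed is an open assumption.

```ini
# /etc/sssd/sssd.conf (sketch; only the lines relevant to sudo are shown)
[sssd]
# add the sudo responder to the existing list of services
services = nss, pam, ssh, sudo

[domain/wikimedia.org]
# replaces "sudo_provider = none"
sudo_provider = ldap
ldap_sudo_search_base = ou=sudoers,cn=tools,ou=projects,dc=wikimedia,dc=org

# /etc/nsswitch.conf (sketch): let sudo consult sssd after the local files
# sudoers: files sss
```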

bd808 renamed this task from "Toolforge: deploying sssd to bastions" to "sssd integration needs to be updated to include sudo config from LDAP support". Apr 17 2019, 3:58 PM

Change 504817 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] ldap: Add support for sudo rules in sssd client config

https://gerrit.wikimedia.org/r/504817

Change 504817 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] ldap: Add support for sudo rules in sssd client config

https://gerrit.wikimedia.org/r/504817

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:41:45Z] <arturo> T221225 disable puppet agent in the bastions

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:43:20Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: sssd in the puppet bastion prefix

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:49:22Z] <arturo> T221225 run puppet agent in the bastions and reboot them with sssd

It seems to work.

aborrero@tools-sgebastion-07:~$ sudo -niu tools.bd808-test
tools.bd808-test@tools-sgebastion-07:~$ logout

aborrero@tools-sgebastion-07:~$ become arturo-test-tool
tools.arturo-test-tool@tools-sgebastion-07:~$

Please reopen if there are any other issues.

Still problems:

aborrero@tools-sgebastion-07:~$ sudo su eugene233
eugene233@tools-sgebastion-07:~$ become isa 
sudo: a password is required

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:16:32Z] <arturo> T221225 disable puppet in tools-sgebastion-08 for sssd testing

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:27:50Z] <arturo> T221225 rebooting tools-sgebastion-07 to clean sssd configuration

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:28:02Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: classic in the puppet bastion prefix

I'm currently live-hacking tools-sgebastion-08. While debugging sssd, I see some weird messages:

(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'tools.isa' matched without domain, user is tools.isa
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): using default domain [wikimedia.org]
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [dp_get_account_info_handler] (0x0200): Got request for [0x2][BE_REQ_GROUP][idnumber=54010]
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'tools.isa' matched without domain, user is tools.isa
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): using default domain [wikimedia.org]
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [client_recv] (0x0200): Client disconnected!
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [get_client_cred] (0x0080): The following failure is expected to happen in case SELinux is disabled:
SELINUX_getpeercon failed [92][Protocol not available].
Please, consider enabling SELinux in your system.
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
(Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [dp_get_account_info_handler] (0x0200): Got request for [0x2][BE_REQ_GROUP][idnumber=500]
(Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [sysdb_set_entry_attr] (0x0200): Entry [name=tools.isa@wikimedia.org,cn=groups,cn=wikimedia.org,cn=sysdb] has set [ts_cache] attrs.
(Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [sysdb_set_entry_attr] (0x0200): Entry [name=wikidev@wikimedia.org,cn=groups,cn=wikimedia.org,cn=sysdb] has set [ts_cache] attrs.
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
[...]

The "Flat name requested but domain has noflat name set, falling back to domain name" entry is repeated many, many times.
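When a debug log is dominated by one repeated message like this, it can help to collapse the repeats before reading it. The following is a minimal sketch, assuming log lines start with a parenthesised timestamp as in the excerpt above; the function name summarise_sssd_log is made up for the example.

```shell
#!/bin/sh
# Summarise an sssd debug log read from stdin: drop the leading
# "(timestamp) " so identical messages compare equal, then count each
# distinct line and list the most frequent ones first.
summarise_sssd_log() {
    sed -E 's/^\([^)]*\) //' | sort | uniq -c | sort -rn
}
```

It could be used as `summarise_sssd_log < /var/log/sssd/sssd_nss.log`, where the log path is an assumption about where the nss responder writes on the host being debugged.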

What I'm doing for testing is:

root@tools-sgebastion-08:~# sudo su eugene233

eugene233@tools-sgebastion-08:~$ become isa
sudo: a password is required

eugene233@tools-sgebastion-08:~$ sudo -niu tools.isa
sudo: a password is required

Change 505761 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] ldap: sssd: add missing bits for sudo

https://gerrit.wikimedia.org/r/505761

With the patch https://gerrit.wikimedia.org/r/505761 I can do this:

aborrero@tools-sgebastion-08:~$ sudo su eugene233

eugene233@tools-sgebastion-08:~$ sudo -niu tools.isa

tools.isa@tools-sgebastion-08:~$ logout

eugene233@tools-sgebastion-08:~$ become isa

tools.isa@tools-sgebastion-08:~$ logout

Change 505761 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] ldap: sssd: add missing bits for sudo

https://gerrit.wikimedia.org/r/505761

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T12:57:37Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: sssd in the puppet bastion prefix, try again with sssd in the bastions, reboot them

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T13:06:00Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: classic in the puppet bastion prefix, again. Rollback again.

Well, I suspect the sudo vs. sudo-ldap package difference is causing some confusion here. While live-hacking tools-sgebastion-08, I installed sudo and removed sudo-ldap.

More testing is required before sssd can be deployed to the bastions.

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T15:19:17Z] <arturo> T221225 creating tools-sgebastion-09 for testing sssd stuff

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T15:26:50Z] <arturo> T221225 rebooting tools-sgebastion-08 to cleanup sssd

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T15:19:17Z] <arturo> T221225 creating tools-sgebastion-09 for testing sssd stuff

The scheduler apparently picked one of the same cloudvirts for this as one of the current bastions. It triggered https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudcontrol1003&service=tools+project+instance+distribution. I ack'ed the alert and downtimed it for 2 days so you can have a chance to move it to another cloudvirt or finish testing and rm the instance.

[00:32]  <icinga-wm>	ACKNOWLEDGEMENT - tools project instance distribution on cloudcontrol1003 is CRITICAL: CRITICAL: sgebastion class instances not spread out enough BryanDavis New instance added for testing, needs to be placed on a different cloudvirt manually. - The acknowledgement expires at: 2019-04-26 00:31:10. https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting

Mentioned in SAL (#wikimedia-cloud) [2019-04-24T09:18:54Z] <arturo> T221225 reallocating tools-sgebastion-09 to cloudvirt1008

Mentioned in SAL (#wikimedia-cloud) [2019-04-25T12:49:32Z] <arturo> T221225 using profile::ldap::client::labs::client_stack: sssd in horizon for tools-sgebastion-09 (testing)

Change 506435 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] sssd: sudo: don't install sudo-ldap if using sssd

https://gerrit.wikimedia.org/r/506435

Change 506614 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] standard: refactor into a profile

https://gerrit.wikimedia.org/r/506614

Change 506682 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] standard refactor: remove standard class from base classes

https://gerrit.wikimedia.org/r/506682

aborrero triaged this task as Medium priority. Apr 26 2019, 4:02 PM
aborrero moved this task from Triage to In Progress on the Toolforge board.

Change 506682 merged by Jbond:
[operations/puppet@production] standard refactor: remove standard class from base classes

https://gerrit.wikimedia.org/r/506682

Change 506614 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] standard: introduce a wrapper profile and use it in CloudVPS

https://gerrit.wikimedia.org/r/506614

Mentioned in SAL (#wikimedia-cloud) [2019-04-29T10:20:58Z] <arturo> disable puppet in all servers to livehack tools-puppetmaster-01 to test T221225

Mentioned in SAL (#wikimedia-cloud) [2019-04-29T10:27:52Z] <arturo> T221225 reboot tools-sgebastion-09 for testing sssd

Change 506979 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: introduce proper base role/profile for VM instances

https://gerrit.wikimedia.org/r/506979

Mentioned in SAL (#wikimedia-cloud) [2019-04-29T11:22:16Z] <arturo> T221225 re-enable puppet agent in all toolforge servers

Change 507005 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] vagrant: refactor roles into profiles

https://gerrit.wikimedia.org/r/507005

My plans are:

  • decouple sudo from sudo-ldap at module level, i.e., at the modules/sudo/manifests/ level, so we have ::sudo and ::sudo::sudo-ldap.
  • introduce support in sudo::user and sudo::group for this new decoupling. They are now aware of the different flavors, defaulting to ::sudo (prod preferred).
  • introduce a global hiera key, sudo_flavor, which can be used in profiles to select which sudo flavor we need. By default, ::sudo is preferred in prod and ::sudo::sudo-ldap is preferred in labs.
  • by default, all VMs use the sudo-ldap flavor (behaviour untouched) until we switch to sssd, in which case we start using the sudo flavor instead.
  • a VM using sssd will need 2 hiera keys:
      profile::ldap::client::labs::client_stack: sssd
      sudo_flavor: sudo
  • with time, when enough VMs are using this config, we can simply switch the defaults and probably simplify the puppet code a bit.

I needed to refactor some callers of sudo::user and sudo::group to be able to introduce hiera keys; hence the additional patches.
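The pairing between the two hiera keys in the plan can be summarised as a small lookup. This is only a sketch of the intended rule, not Puppet code from the patches: the function name is made up, 'sssd' and 'classic' are the client_stack values seen in this task, and 'sudo-ldap' as the name of the other sudo_flavor value is an assumption.

```shell
#!/bin/sh
# Map an LDAP client stack to the sudo flavor that should go with it.
sudo_flavor_for_stack() {
    case "$1" in
        sssd)    echo sudo ;;        # sssd supplies the LDAP sudo rules itself
        classic) echo sudo-ldap ;;   # nscd/nslcd hosts keep the sudo-ldap package
        *)       echo "unknown client stack: $1" >&2; return 1 ;;
    esac
}
```

For example, `sudo_flavor_for_stack sssd` prints `sudo`, matching the two keys listed in the plan.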

Relevant patches, in order, are (skipping some already merged ones):

Please @bd808 @Andrew @Bstorm @MoritzMuehlenhoff @jbond review if this plan makes sense.

Change 507005 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] vagrant: refactor roles into profiles

https://gerrit.wikimedia.org/r/507005

Change 506979 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: introduce proper base role/profile for VM instances

https://gerrit.wikimedia.org/r/506979

Mentioned in SAL (#wikimedia-cloud) [2019-04-30T10:56:32Z] <arturo> T221225 create tools-sgebastion-0test for more sssd tests

Mentioned in SAL (#wikimedia-cloud) [2019-04-30T11:07:19Z] <arturo> T221225 disable puppet in toolforge

Change 506435 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap

https://gerrit.wikimedia.org/r/506435

Mentioned in SAL (#wikimedia-cloud) [2019-04-30T12:45:11Z] <arturo> adding sudo_flavor: sudo hiera config to all puppet prefixes with sssd (T221225)

Mentioned in SAL (#wikimedia-cloud) [2019-04-30T12:50:43Z] <arturo> enable puppet in all servers T221225

Change 507317 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "sudo: decouple sudo from sudo-ldap"

https://gerrit.wikimedia.org/r/507317

Change 507317 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Revert "sudo: decouple sudo from sudo-ldap"

https://gerrit.wikimedia.org/r/507317

Change 507376 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "Revert "sudo: decouple sudo from sudo-ldap""

https://gerrit.wikimedia.org/r/507376

Mentioned in SAL (#wikimedia-cloud) [2019-05-06T10:53:54Z] <arturo> T221225 disable puppet in all toolforge servers for testing sssd patch (puppetmaster livehack)

Change 507376 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap

https://gerrit.wikimedia.org/r/507376

Change 508311 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] sudo: decouple sudo from sudo-ldap

https://gerrit.wikimedia.org/r/508311

Change 508311 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap

https://gerrit.wikimedia.org/r/508311

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:16:36Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-sgebastion prefix, and reboot tools-sgebastion-07/08

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:27:06Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-docker-registry prefix, and reboot tools-docker-registry-[03-04]

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:31:31Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-checker prefix, and reboot tools-checker-03

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:39:50Z] <arturo> T221225 disable puppet in all tools-worker nodes in preparation for sssd

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T13:04:48Z] <arturo> T221225 depooled and rebooted tools-worker-1001 in preparation for sssd migration

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:07:59Z] <arturo> T221225 switch to sssd/sudo in puppet prefix for tools-worker

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:09:51Z] <arturo> T221225 depool & switch to sssd/sudo & reboot & repool tools-worker-1001

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:12:55Z] <arturo> T221225 depool & switch to sssd/sudo & reboot & repool tools-worker-1002

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:26:55Z] <arturo> T221225 hard reboot tools-worker-1001

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:35:12Z] <arturo> T221225 hard reboot tools-worker-1001 again

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:44:21Z] <arturo> T221225 back to classic/ldap hiera config in the tools-worker puppet prefix

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:49:12Z] <arturo> T221225 repool tools-worker-1002 (using nscd/nslcd and sudoldap)

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T18:13:20Z] <arturo> T221225 created tools-worker-1029 to test sssd/sudo stuff

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T18:15:06Z] <arturo> T221225 for the record, tools-worker-1001 is not working after trying with sssd

Mentioned in SAL (#wikimedia-cloud) [2019-05-29T10:13:36Z] <arturo> enroll the tools-worker-1029 VM into toolforge k8s, but leave it cordoned for sssd testing purposes (T221225)

Well. Closing this task now, since we do have sudo support now. Will open other tasks for the other issues we have.