sssd integration needs to be updated to include sudo config from LDAP support
Closed, Resolved, Public

Description

After T218126 (LDAP: try how sssd works with our servers), we agreed to deploy sssd to more servers in Toolforge.

On 2019-04-17 I deployed sssd to the bastions and some users reported they were unable to become their tools.

Event Timeline

Mentioned in SAL (#wikimedia-cloud) [2019-04-17T12:08:53Z] <arturo> T221225 rebooting bastions to clean sssd. We are back to nscd/nslcd until we figure out what's wrong here

The working part of the become script is exec /usr/bin/sudo -niu "$prefix.$tool" "$@". $prefix is 'tools' in the Toolforge project. $tool is the name passed on the command line by the user (for example 'bd808-test').
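To make the failure mode concrete, here is a minimal Python sketch of the command line the become script builds from the line quoted above. The helper name become_argv is hypothetical; only the sudo invocation itself comes from the script. Because -n makes sudo non-interactive, a missing passwordless sudoers rule shows up as "sudo: a password is required" instead of a password prompt.

```python
import shlex
from typing import List, Sequence


def become_argv(tool: str, args: Sequence[str] = (), prefix: str = "tools") -> List[str]:
    """Build the argv that `become` execs (hypothetical helper).

    Mirrors: exec /usr/bin/sudo -niu "$prefix.$tool" "$@"
      -n  non-interactive: fail instead of prompting for a password
      -i  run the target user's login shell (or the given command through it)
      -u  target user, e.g. tools.bd808-test
    """
    target_user = f"{prefix}.{tool}"
    return ["/usr/bin/sudo", "-niu", target_user, *args]


if __name__ == "__main__":
    # Prints: /usr/bin/sudo -niu tools.bd808-test
    print(shlex.join(become_argv("bd808-test")))
```

Any extra arguments are passed through unchanged after the target user, just like "$@" in the script.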

With the nscd/nslcd integration of LDAP into our NSS stack, this works as expected:

$ hostname -f
tools-sgebastion-08.tools.eqiad.wmflabs
$ echo $USER
bd808
$ sudo -niu tools.bd808-test
$ echo $USER
tools.bd808-test

With the sssd implementation, it fails:

$ hostname -f
tools-sgeexec-0901.tools.eqiad.wmflabs
$ echo $USER
bd808
$ sudo -niu tools.bd808-test
sudo: a password is required

This is a missing feature in our current sssd config. The sudoers rules that allow a Toolforge maintainer to become a given tool live in the LDAP directory, and it looks like we currently have sudo support explicitly disabled in the config:

/etc/sssd/sssd.conf
[sssd]
#debug_level=10
domains = wikimedia.org
default_domain_suffix = wikimedia.org
full_name_format = %1$s
services = nss, pam, ssh
config_file_version = 2

# (... cut ...)
[domain/wikimedia.org]
#debug_level=10
id_provider = ldap
auth_provider = ldap
# (... cut ...)
# disable stuff not provided by LDAP (value of id_provider is used by default)
sudo_provider = none
# (... cut ...)

I think we need to add the sssd equivalent of the sudo-ldap settings that the nscd/nslcd stack uses, from /etc/sudo-ldap.conf:

/etc/sudo-ldap.conf
BASE            dc=wikimedia,dc=org
URI             ldap://ldap-ro.eqiad.wikimedia.org:389 ldap://ldap-ro.eqiad.wikimedia.org:389
# The next settings are not honored by OpenLDAP but are honored by sudo-ldap and /etc/sudo-ldap.conf is a symlink to /etc/ldap/ldap.conf
BINDDN          cn=proxyagent,ou=profile,dc=wikimedia,dc=org
BINDPW          Eche0ieng8UaNoo
SUDOERS_BASE    ou=sudoers,cn=tools,ou=projects,dc=wikimedia,dc=org
SSL             start_tls
TLS_CHECKPEER   yes
TLS_REQCERT     demand
TLS_CACERTDIR   /etc/ssl/certs
TLS_CACERTFILE  /etc/ssl/certs/ca-certificates.crt
TLS_CACERT      /etc/ssl/certs/ca-certificates.crt

(The LDAP proxy user password here is public intentionally, don't freak out!)
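As a sketch of what "the sssd equivalent" could look like, the Python below translates sudo-ldap.conf keywords into sssd-ldap option names for the [domain/wikimedia.org] section and turns on sudo_provider = ldap. The keyword-to-option table and the function name are assumptions for illustration, not the content of the patch that was eventually merged; keywords without an obvious counterpart (TLS_CHECKPEER, TLS_CACERTFILE) are skipped.

```python
from typing import Dict

# Assumed mapping from sudo-ldap.conf keywords to sssd-ldap options.
KEYWORD_TO_SSSD = {
    "BASE": "ldap_search_base",
    "BINDDN": "ldap_default_bind_dn",
    "BINDPW": "ldap_default_authtok",
    "SUDOERS_BASE": "ldap_sudo_search_base",
    "TLS_REQCERT": "ldap_tls_reqcert",
    "TLS_CACERTDIR": "ldap_tls_cacertdir",
    "TLS_CACERT": "ldap_tls_cacert",
}


def sudo_ldap_to_sssd(conf_text: str) -> Dict[str, str]:
    """Translate sudo-ldap.conf settings into sssd [domain/...] options (sketch)."""
    options = {"sudo_provider": "ldap"}
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        key, value = parts[0].upper(), parts[1].strip()
        if key == "URI":
            # sssd wants a comma-separated list; drop repeated URIs, keep order.
            uris = list(dict.fromkeys(value.split()))
            options["ldap_uri"] = ", ".join(uris)
        elif key == "SSL" and value.lower() == "start_tls":
            options["ldap_id_use_start_tls"] = "true"
        elif key in KEYWORD_TO_SSSD:
            options[KEYWORD_TO_SSSD[key]] = value
        # Anything else (e.g. TLS_CHECKPEER, TLS_CACERTFILE) is skipped.
    return options


if __name__ == "__main__":
    sample = """\
BASE            dc=wikimedia,dc=org
URI             ldap://ldap-ro.eqiad.wikimedia.org:389 ldap://ldap-ro.eqiad.wikimedia.org:389
SUDOERS_BASE    ou=sudoers,cn=tools,ou=projects,dc=wikimedia,dc=org
SSL             start_tls
"""
    for key, value in sudo_ldap_to_sssd(sample).items():
        print(f"{key} = {value}")
```

Each returned pair would go into the domain section as "key = value". The ldap_uri value is comma-separated because that is the list format sssd expects, and the repeated URI from sudo-ldap.conf collapses to one entry.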

bd808 renamed this task from "Toolforge: deploying sssd to bastions" to "sssd integration needs to be updated to include sudo config from LDAP support". Apr 17 2019, 3:58 PM

Change 504817 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] ldap: Add support for sudo rules in sssd client config

https://gerrit.wikimedia.org/r/504817

Change 504817 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] ldap: Add support for sudo rules in sssd client config

https://gerrit.wikimedia.org/r/504817

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:41:45Z] <arturo> T221225 disable puppet agent in the bastions

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:43:20Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: sssd in the puppet bastion prefix

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:49:22Z] <arturo> T221225 run puppet agent in the bastions and reboot them with sssd

aborrero closed this task as Resolved. Apr 23 2019, 9:53 AM

It seems to work.

aborrero@tools-sgebastion-07:~$ sudo -niu tools.bd808-test
tools.bd808-test@tools-sgebastion-07:~$ logout

aborrero@tools-sgebastion-07:~$ become arturo-test-tool
tools.arturo-test-tool@tools-sgebastion-07:~$

Please reopen if there are any other issues.

aborrero reopened this task as Open. Apr 23 2019, 10:15 AM

Still problems:

aborrero@tools-sgebastion-07:~$ sudo su eugene233
eugene233@tools-sgebastion-07:~$ become isa 
sudo: a password is required

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:16:32Z] <arturo> T221225 disable puppet in tools-sgebastion-08 for sssd testing

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:27:50Z] <arturo> T221225 rebooting tools-sgebastion-07 to clean sssd configuration

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:28:02Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: classic in the puppet bastion prefix

I'm currently live-hacking tools-sgebastion-08. While debugging sssd I see some weird messages:

(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'tools.isa' matched without domain, user is tools.isa
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): using default domain [wikimedia.org]
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [dp_get_account_info_handler] (0x0200): Got request for [0x2][BE_REQ_GROUP][idnumber=54010]
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'tools.isa' matched without domain, user is tools.isa
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): using default domain [wikimedia.org]
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [client_recv] (0x0200): Client disconnected!
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [get_client_cred] (0x0080): The following failure is expected to happen in case SELinux is disabled:
SELINUX_getpeercon failed [92][Protocol not available].
Please, consider enabling SELinux in your system.
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
(Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [dp_get_account_info_handler] (0x0200): Got request for [0x2][BE_REQ_GROUP][idnumber=500]
(Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [sysdb_set_entry_attr] (0x0200): Entry [name=tools.isa@wikimedia.org,cn=groups,cn=wikimedia.org,cn=sysdb] has set [ts_cache] attrs.
(Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [sysdb_set_entry_attr] (0x0200): Entry [name=wikidev@wikimedia.org,cn=groups,cn=wikimedia.org,cn=sysdb] has set [ts_cache] attrs.
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
(Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
[...]

The "Flat name requested but domain has noflat name set, falling back to domain name" entry is repeated many, many times.
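When the debug log is this noisy it can help to tally repeated entries before reading it. This Python sketch does that with a regular expression fitted to the line format shown above; the pattern and function name are assumptions, and continuation lines (like the SELinux hint) are simply ignored.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Fitted to lines like:
# (Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name ...
LOG_LINE = re.compile(
    r"^\((?P<time>[^)]*)\) "
    r"\[(?P<component>sssd\[[^ ]*\])\] "
    r"\[(?P<function>\w+)\] "
    r"\((?P<level>0x[0-9a-fA-F]+)\): "
    r"(?P<message>.*)$"
)


def noisiest_messages(lines: Iterable[str], top: int = 3) -> List[Tuple[Tuple[str, str], int]]:
    """Count identical (function, message) pairs; non-matching lines are ignored."""
    counts: Counter = Counter()
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if match:
            counts[(match.group("function"), match.group("message"))] += 1
    return counts.most_common(top)
```

For example, it could be fed the open file object of the nss responder log (typically /var/log/sssd/sssd_nss.log) and would list the most frequent (function, message) pairs first.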

What I'm doing for testing is:

root@tools-sgebastion-08-# sudo su eugene233

eugene233@tools-sgebastion-08:~$ become isa
sudo: a password is required

eugene233@tools-sgebastion-08:~$ sudo -niu tools.isa
sudo: a password is required

Change 505761 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] ldap: sssd: add missing bits for sudo

https://gerrit.wikimedia.org/r/505761

With the patch https://gerrit.wikimedia.org/r/505761 I can do this:

aborrero@tools-sgebastion-08:~$ sudo su eugene233

eugene233@tools-sgebastion-08:~$ sudo -niu tools.isa

tools.isa@tools-sgebastion-08:~$ logout

eugene233@tools-sgebastion-08:~$ become isa

tools.isa@tools-sgebastion-08:~$ logout

Change 505761 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] ldap: sssd: add missing bits for sudo

https://gerrit.wikimedia.org/r/505761

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T12:57:37Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: sssd in the puppet bastion prefix, try again with sssd in the bastions, reboot them

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T13:06:00Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: classic in the puppet bastion prefix, again. Rollback again.

Well, I suspect the sudo vs. sudo-ldap packages are causing some confusion here. In the live hacking I was doing on tools-sgebastion-08, I installed sudo and removed sudo-ldap.
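If the sudo vs. sudo-ldap mix-up is the problem, one thing worth checking is where the plain sudo package looks for rules. Per the sssd-sudo documentation, sssd is enabled as a sudo rule source by listing sss on the sudoers line of /etc/nsswitch.conf (for example "sudoers: files sss"), assuming the installed sudo was built with sssd support; sudo-ldap instead reads its own /etc/sudo-ldap.conf. A minimal Python helper to read that line (function names are hypothetical):

```python
from typing import List


def sudoers_sources(nsswitch_text: str) -> List[str]:
    """Return the sources on the 'sudoers:' line of nsswitch.conf, or [] if absent."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comments
        if line.startswith("sudoers:"):
            return line[len("sudoers:"):].split()
    return []


def sudo_consults_sssd(nsswitch_text: str) -> bool:
    """True if sss is listed as a sudoers source."""
    return "sss" in sudoers_sources(nsswitch_text)
```

Usage would be along the lines of reading /etc/nsswitch.conf into a string and passing it to sudo_consults_sssd.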

More testing is required before sssd can be deployed to the bastions.

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T15:19:17Z] <arturo> T221225 creating tools-sgebastion-09 for testing sssd stuff

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T15:26:50Z] <arturo> T221225 rebooting tools-sgebastion-08 to cleanup sssd

bd808 added a comment (edited). Apr 24 2019, 12:35 AM

Mentioned in SAL (#wikimedia-cloud) [2019-04-23T15:19:17Z] <arturo> T221225 creating tools-sgebastion-09 for testing sssd stuff

The scheduler apparently placed this instance on the same cloudvirt as one of the current bastions. It triggered https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudcontrol1003&service=tools+project+instance+distribution. I ack'ed the alert and downtimed it for 2 days so you have a chance to move it to another cloudvirt, or to finish testing and delete the instance.

[00:32]  <icinga-wm>	ACKNOWLEDGEMENT - tools project instance distribution on cloudcontrol1003 is CRITICAL: CRITICAL: sgebastion class instances not spread out enough BryanDavis New instance added for testing, needs to be placed on a different cloudvirt manually. - The acknowledgement expires at: 2019-04-26 00:31:10. https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
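The alert's actual logic is not shown here. As a hedged illustration of the idea behind a check like "instances not spread out enough", this Python sketch flags any instance class with two or more instances on the same cloudvirt. The function name, the (instance, class, host) input shape, and the host names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def crowded_hosts(placements: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Map each instance class to the hypervisors hosting 2+ of its instances.

    placements: (instance, instance_class, hypervisor) triples.
    """
    members: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for instance, instance_class, host in placements:
        members[(instance_class, host)].append(instance)

    crowded: Dict[str, List[str]] = defaultdict(list)
    for (instance_class, host), instances in sorted(members.items()):
        if len(instances) > 1:
            crowded[instance_class].append(host)
    return dict(crowded)
```

For example, with hypothetical hosts, placing tools-sgebastion-09 next to tools-sgebastion-07 on "cloudvirt-a" yields {"sgebastion": ["cloudvirt-a"]}; moving it to a third host yields an empty result.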

Mentioned in SAL (#wikimedia-cloud) [2019-04-24T09:18:54Z] <arturo> T221225 reallocating tools-sgebastion-09 to cloudvirt1008

Mentioned in SAL (#wikimedia-cloud) [2019-04-25T12:49:32Z] <arturo> T221225 using profile::ldap::client::labs::client_stack: sssd in horizon for tools-sgebastion-09 (testing)

Change 506435 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] sssd: sudo: don't install sudo-ldap if using sssd

https://gerrit.wikimedia.org/r/506435

Change 506614 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] standard: refactor into a profile

https://gerrit.wikimedia.org/r/506614

Change 506682 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] standard refactor: remove standard class from base classes

https://gerrit.wikimedia.org/r/506682

aborrero triaged this task as Normal priority. Apr 26 2019, 4:02 PM
aborrero moved this task from Triage to In Progress on the Toolforge board.

Change 506682 merged by Jbond:
[operations/puppet@production] standard refactor: remove standard class from base classes

https://gerrit.wikimedia.org/r/506682

Change 506614 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] standard: introduce a wrapper profile and use it in CloudVPS

https://gerrit.wikimedia.org/r/506614

Mentioned in SAL (#wikimedia-cloud) [2019-04-29T10:20:58Z] <arturo> disable puppet in all servers to livehack tools-puppetmaster-01 to test T221225

Mentioned in SAL (#wikimedia-cloud) [2019-04-29T10:27:52Z] <arturo> T221225 reboot tools-sgebastion-09 for testing sssd

Change 506979 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: introduce proper base role/profile for VM instances

https://gerrit.wikimedia.org/r/506979

Mentioned in SAL (#wikimedia-cloud) [2019-04-29T11:22:16Z] <arturo> T221225 re-enable puppet agent in all toolforge servers

Change 507005 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] vagrant: refactor roles into profiles

https://gerrit.wikimedia.org/r/507005

My plans are:

  • decouple sudo from sudo-ldap at module level, i.e., at the modules/sudo/manifests/ level, so we have ::sudo and ::sudo::sudo-ldap.
  • introduce support in sudo::user and sudo::group for this new decoupling. They become aware of the different flavors, defaulting to ::sudo (preferred in prod).
  • introduce a global hiera key sudo_flavor which profiles can use to select which sudo flavor we need. By default ::sudo is preferred in prod and ::sudo::sudo-ldap in labs.
  • by default, all VMs use the sudo-ldap flavor (behaviour unchanged) until they switch to sssd, at which point they use the sudo flavor instead.
  • a VM using sssd will need 2 hiera keys:
      profile::ldap::client::labs::client_stack: sssd
      sudo_flavor: sudo
  • over time, once enough VMs use this config, we can simply switch the defaults and probably simplify the puppet code a bit.
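The flavor selection described in the plan can be sketched in Python as plain data plus two small functions. This only models the decision, not the Puppet code: the realm labels, the "sudoldap" value for the sudo-ldap flavor, and the function names are assumptions.

```python
from typing import Dict, List

# Hiera value -> Puppet class, using the class names from the plan.
FLAVOR_CLASSES = {"sudo": "::sudo", "sudoldap": "::sudo::sudo-ldap"}
# Per-realm defaults from the plan (realm labels and "sudoldap" are assumed).
DEFAULT_FLAVOR = {"production": "sudo", "labs": "sudoldap"}
CLIENT_STACK_KEY = "profile::ldap::client::labs::client_stack"


def sudo_class(realm: str, hiera: Dict[str, str]) -> str:
    """Pick the sudo class: an explicit sudo_flavor wins, else the realm default."""
    flavor = hiera.get("sudo_flavor", DEFAULT_FLAVOR[realm])
    return FLAVOR_CLASSES[flavor]


def check_sssd_vm(hiera: Dict[str, str]) -> List[str]:
    """List inconsistencies between the two hiera keys an sssd VM needs."""
    problems = []
    if hiera.get(CLIENT_STACK_KEY) == "sssd" and hiera.get("sudo_flavor") != "sudo":
        problems.append("client_stack is sssd but sudo_flavor is not 'sudo'")
    return problems
```

Keeping the defaults in one table mirrors the last step of the plan: flipping the labs default later would be a one-line change.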

I needed to refactor some callers of sudo::user and sudo::group to be able to introduce hiera keys; that's the reason for the additional patches.

Relevant patches, in order, are (skipping some already merged ones):

Please @bd808 @Andrew @Bstorm @MoritzMuehlenhoff @jbond review whether this plan makes sense.

Change 507005 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] vagrant: refactor roles into profiles

https://gerrit.wikimedia.org/r/507005

Change 506979 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: introduce proper base role/profile for VM instances

https://gerrit.wikimedia.org/r/506979

Mentioned in SAL (#wikimedia-cloud) [2019-04-30T10:56:32Z] <arturo> T221225 create tools-sgebastion-0test for more sssd tests

Mentioned in SAL (#wikimedia-cloud) [2019-04-30T11:07:19Z] <arturo> T221225 disable puppet in toolforge

Change 506435 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap

https://gerrit.wikimedia.org/r/506435

Mentioned in SAL (#wikimedia-cloud) [2019-04-30T12:45:11Z] <arturo> adding sudo_flavor: sudo hiera config to all puppet prefixes with sssd (T221225)

Mentioned in SAL (#wikimedia-cloud) [2019-04-30T12:50:43Z] <arturo> enable puppet in all servers T221225

Change 507317 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "sudo: decouple sudo from sudo-ldap"

https://gerrit.wikimedia.org/r/507317

Change 507317 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Revert "sudo: decouple sudo from sudo-ldap"

https://gerrit.wikimedia.org/r/507317

Mentioned in SAL (#wikimedia-operations) [2019-04-30T16:52:26Z] <arturo> merging change to profile::base and ::raid https://gerrit.wikimedia.org/r/c/operations/puppet/+/507357 related to T221225

Change 507376 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "Revert "sudo: decouple sudo from sudo-ldap""

https://gerrit.wikimedia.org/r/507376

Mentioned in SAL (#wikimedia-cloud) [2019-05-06T10:53:54Z] <arturo> T221225 disable puppet in all toolforge servers for testing sssd patch (puppetmaster livehack)

Change 507376 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap

https://gerrit.wikimedia.org/r/507376

Mentioned in SAL (#wikimedia-cloud) [2019-05-06T11:34:23Z] <arturo> T221225 reenable puppet

Change 508311 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] sudo: decouple sudo from sudo-ldap

https://gerrit.wikimedia.org/r/508311

Change 508311 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap

https://gerrit.wikimedia.org/r/508311

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:16:36Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-sgebastion prefix, and reboot tools-sgebastion-07/08

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:27:06Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-docker-registry prefix, and reboot tools-docker-registry-[03-04]

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:31:31Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-checker prefix, and reboot tools-checker-03

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:39:50Z] <arturo> T221225 disable puppet in all tools-worker nodes in preparation for sssd

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T13:04:48Z] <arturo> T221225 depool and rebooted tools-worker-1001 in preparation for sssd migration

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:07:59Z] <arturo> T221225 switch to sssd/sudo in puppet prefix for tools-worker

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:09:51Z] <arturo> T221225 depool & switch to sssd/sudo & reboot & repool tools-worker-1001

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:12:55Z] <arturo> T221225 depool & switch to sssd/sudo & reboot & repool tools-worker-1002

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:26:55Z] <arturo> T221225 hard reboot tools-worker-1001

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:35:12Z] <arturo> T221225 hard reboot tools-worker-1001 again

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:44:21Z] <arturo> T221225 back to classic/ldap hiera config in the tools-worker puppet prefix

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:49:12Z] <arturo> T221225 repool tools-worker-1002 (using nscd/nslcd and sudoldap)

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T18:13:20Z] <arturo> T221225 created tools-worker-1029 to test sssd/sudo stuff

Mentioned in SAL (#wikimedia-cloud) [2019-05-28T18:15:06Z] <arturo> T221225 for the record, tools-worker-1001 is not working after trying with sssd

Mentioned in SAL (#wikimedia-cloud) [2019-05-29T10:13:36Z] <arturo> enroll the tools-worker-1029 VM into toolforge k8s, but leave it cordoned for sssd testing purposes (T221225)

aborrero closed this task as Resolved. May 29 2019, 11:06 AM

Well. Closing this task now, since we do have sudo support. I will open other tasks for the remaining issues.