After T218126 (LDAP: try how sssd works with our servers), we agreed to deploy sssd to more servers in Toolforge.
On 2019-04-17 I deployed sssd to the bastions, and some users reported that they were unable to become their tools.
Status | Assigned | Task
---|---|---
Resolved | yuvipanda | T130446 Unable to SSH onto tools-login.wmflabs.org
Resolved | akosiaris | T130593 investigate slapd memory leak
Resolved | aborrero | T217280 LDAP server running out of memory frequently and disrupting Cloud VPS clients
Resolved | aborrero | T221225 sssd integration needs to be updated to include sudo config from LDAP support
Resolved | aborrero | T223067 sudo is still broken on certain toolforge hosts
Mentioned in SAL (#wikimedia-cloud) [2019-04-17T12:08:53Z] <arturo> T221225 rebooting bastions to clean sssd. We are back to nscd/nslcd until we figure out what's wrong here
The working part of the become script is:

    exec /usr/bin/sudo -niu "$prefix.$tool" "$@"

$prefix is 'tools' in the Toolforge project, and $tool is the name the user passes on the command line (for example 'bd808-test').
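A minimal sketch of that logic as a standalone POSIX shell script, assuming only what is described above (the prefix defaults to 'tools'). It is not the actual Toolforge become script, and the function names are hypothetical. It also shows why a missing sudo rule surfaces as "sudo: a password is required": with -n, sudo refuses to prompt and exits with that error instead.

```shell
#!/bin/sh
# Hypothetical sketch of the core of a `become`-style wrapper.
# NOT the real Toolforge script; it only illustrates the logic above.
# Assumption: PREFIX defaults to "tools", the Toolforge project prefix.
PREFIX="${PREFIX:-tools}"

# Print the tool account name, e.g. "tools.bd808-test".
tool_account() {
    if [ -z "$1" ]; then
        echo "usage: become <toolname> [command ...]" >&2
        return 1
    fi
    printf '%s.%s\n' "$PREFIX" "$1"
}

# Replace the current shell with a login shell (or the given command)
# running as the tool account:
#   -n  non-interactive: never prompt; if a password would be needed,
#       fail with "sudo: a password is required"
#   -i  run the target user's shell as a login shell
#   -u  the target user
become() {
    account="$(tool_account "$1")" || return 1
    shift
    exec /usr/bin/sudo -niu "$account" "$@"
}
```

After sourcing the file, `become bd808-test` would run `sudo -niu tools.bd808-test`; whether that succeeds depends entirely on the sudoers rules the host can see.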
With the nscd/nslcd integration of LDAP into our NSS stack, this works as expected:

    $ hostname -f
    tools-sgebastion-08.tools.eqiad.wmflabs
    $ echo $USER
    bd808
    $ sudo -niu tools.bd808-test
    $ echo $USER
    tools.bd808-test
With the sssd implementation, it fails:

    $ hostname -f
    tools-sgeexec-0901.tools.eqiad.wmflabs
    $ echo $USER
    bd808
    $ sudo -niu tools.bd808-test
    sudo: a password is required
This is a missing feature in our current sssd-sudo config. The sudoers rules that allow a Toolforge maintainer to become a given tool live in the LDAP directory, but it looks like sudo lookups are explicitly disabled in the config (sudo_provider = none):

    [sssd]
    #debug_level=10
    domains = wikimedia.org
    default_domain_suffix = wikimedia.org
    full_name_format = %1$s
    services = nss, pam, ssh
    config_file_version = 2
    # (... cut ...)
    [domain/wikimedia.org]
    #debug_level=10
    id_provider = ldap
    auth_provider = ldap
    # (... cut ...)
    # disable stuff not provided by LDAP (value of id_provider is used by default)
    sudo_provider = none
    # (... cut ...)
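To spot this state on a host, a small awk-based check can look for the two settings that matter in the excerpt above: whether sudo is listed in the [sssd] services, and whether a domain section sets sudo_provider = none. This is a hypothetical helper, assuming simple "key = value" lines with spaces around the '=' as in the excerpt.

```shell
#!/bin/sh
# Hypothetical check: does an sssd.conf leave sudo lookups disabled?
# Reports two conditions seen in the excerpt above:
#   1. "sudo" missing from the services list in [sssd]
#   2. "sudo_provider = none" in a [domain/...] section
# Assumption: "key = value" lines with spaces around "=".
# Exit status: 0 if sudo looks enabled, 1 otherwise.
check_sssd_sudo() {
    awk '
        /^[ \t]*#/ { next }                 # skip comment lines
        /^\[/      { section = $0; next }   # remember the current section
        section == "[sssd]" && $1 == "services" {
            has_sudo = ($0 ~ /(=|,)[ \t]*sudo[ \t]*(,|$)/)
        }
        section ~ /^\[domain\// && $1 == "sudo_provider" && $3 == "none" {
            disabled = 1
        }
        END {
            if (!has_sudo) print "sudo missing from [sssd] services"
            if (disabled)  print "sudo_provider = none in a domain section"
            status = (has_sudo && !disabled) ? 0 : 1
            exit status
        }
    ' "$1"
}
```

For example, `check_sssd_sudo /etc/sssd/sssd.conf` would report both problems for the config shown above.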
I think we need to add the sssd equivalent of the sudo-ldap settings that the nscd/nslcd stack reads from /etc/sudo-ldap.conf:

    BASE dc=wikimedia,dc=org
    URI ldap://ldap-ro.eqiad.wikimedia.org:389 ldap://ldap-ro.eqiad.wikimedia.org:389
    # The next settings are not honored by OpenLDAP but are honored by sudo-ldap and /etc/sudo-ldap.conf is a symlink to /etc/ldap/ldap.conf
    BINDDN cn=proxyagent,ou=profile,dc=wikimedia,dc=org
    BINDPW Eche0ieng8UaNoo
    SUDOERS_BASE ou=sudoers,cn=tools,ou=projects,dc=wikimedia,dc=org
    SSL start_tls
    TLS_CHECKPEER yes
    TLS_REQCERT demand
    TLS_CACERTDIR /etc/ssl/certs
    TLS_CACERTFILE /etc/ssl/certs/ca-certificates.crt
    TLS_CACERT /etc/ssl/certs/ca-certificates.crt
(The LDAP proxy user password here is public intentionally, don't freak out!)
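As a rough illustration of what "the sssd equivalent" involves, the sketch below translates sudo-ldap.conf keys into sssd-ldap option names for a [domain/...] section. It is a hypothetical helper, not what the merged puppet change generates; the option names are taken from the sssd-ldap(5) man page, and keys without an obvious counterpart (such as TLS_CHECKPEER) are skipped. Beyond these options, sssd also needs sudo added to the services list in [sssd], and the sudoers line in /etc/nsswitch.conf must include sss.

```shell
#!/bin/sh
# Hypothetical translator: read sudo-ldap.conf lines on stdin and print
# roughly equivalent sssd [domain/...] options (names per sssd-ldap(5)).
#   URI           -> ldap_uri (space-separated list becomes comma-separated)
#   BASE          -> ldap_search_base
#   BINDDN        -> ldap_default_bind_dn
#   BINDPW        -> ldap_default_authtok
#   SUDOERS_BASE  -> ldap_sudo_search_base
#   SSL start_tls -> ldap_id_use_start_tls = true
#   TLS_REQCERT   -> ldap_tls_reqcert
#   TLS_CACERT    -> ldap_tls_cacert
sudo_ldap_to_sssd() {
    echo "sudo_provider = ldap"
    # "|| [ -n "$key" ]" keeps a final line that lacks a trailing newline
    while read -r key value || [ -n "$key" ]; do
        case "$key" in
            URI)          echo "ldap_uri = $(echo "$value" | tr -s ' ' ',')" ;;
            BASE)         echo "ldap_search_base = $value" ;;
            BINDDN)       echo "ldap_default_bind_dn = $value" ;;
            BINDPW)       echo "ldap_default_authtok = $value" ;;
            SUDOERS_BASE) echo "ldap_sudo_search_base = $value" ;;
            SSL)
                if [ "$value" = "start_tls" ]; then
                    echo "ldap_id_use_start_tls = true"
                fi ;;
            TLS_REQCERT)  echo "ldap_tls_reqcert = $value" ;;
            TLS_CACERT)   echo "ldap_tls_cacert = $value" ;;
            *)            ;;  # comments and unmapped keys are ignored
        esac
    done
}
```

For example, `sudo_ldap_to_sssd < /etc/sudo-ldap.conf` prints candidate lines to review before adding them to the [domain/wikimedia.org] section.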
Change 504817 had a related patch set uploaded (by BryanDavis; owner: Bryan Davis):
[operations/puppet@production] ldap: Add support for sudo rules in sssd client config
Change 504817 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] ldap: Add support for sudo rules in sssd client config
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:41:45Z] <arturo> T221225 disable puppet agent in the bastions
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:43:20Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: sssd in the puppet bastion prefix
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T09:49:22Z] <arturo> T221225 run puppet agent in the bastions and reboot them with sssd
It seems to work.

    aborrero@tools-sgebastion-07:~$ sudo -niu tools.bd808-test
    tools.bd808-test@tools-sgebastion-07:~$ logout
    aborrero@tools-sgebastion-07:~$ become arturo-test-tool
    tools.arturo-test-tool@tools-sgebastion-07:~$
Please reopen if there are any other issues.
There are still problems:

    aborrero@tools-sgebastion-07:~$ sudo su eugene233
    eugene233@tools-sgebastion-07:~$ become isa
    sudo: a password is required
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:16:32Z] <arturo> T221225 disable puppet in tools-sgebastion-08 for sssd testing
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:27:50Z] <arturo> T221225 rebooting tools-sgebastion-07 to clean sssd configuration
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T10:28:02Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: classic in the puppet bastion prefix
I'm currently live-hacking tools-sgebastion-08. While debugging sssd, I see some weird messages:

    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'tools.isa' matched without domain, user is tools.isa
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): using default domain [wikimedia.org]
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
    (Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [dp_get_account_info_handler] (0x0200): Got request for [0x2][BE_REQ_GROUP][idnumber=54010]
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): name 'tools.isa' matched without domain, user is tools.isa
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_parse_name_for_domains] (0x0200): using default domain [wikimedia.org]
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [client_recv] (0x0200): Client disconnected!
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [get_client_cred] (0x0080): The following failure is expected to happen in case SELinux is disabled: SELINUX_getpeercon failed [92][Protocol not available]. Please, consider enabling SELinux in your system.
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Received client version [1].
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [sss_cmd_get_version] (0x0200): Offered version [1].
    (Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [dp_get_account_info_handler] (0x0200): Got request for [0x2][BE_REQ_GROUP][idnumber=500]
    (Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [sysdb_set_entry_attr] (0x0200): Entry [name=tools.isa@wikimedia.org,cn=groups,cn=wikimedia.org,cn=sysdb] has set [ts_cache] attrs.
    (Tue Apr 23 10:38:27 2019) [sssd[be[wikimedia.org]]] [sysdb_set_entry_attr] (0x0200): Entry [name=wikidev@wikimedia.org,cn=groups,cn=wikimedia.org,cn=sysdb] has set [ts_cache] attrs.
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
    (Tue Apr 23 10:38:27 2019) [sssd[nss]] [calc_flat_name] (0x0080): Flat name requested but domain has noflat name set, falling back to domain name
    [...]
The "Flat name requested but domain has noflat name set, falling back to domain name" entry is repeated many, many times.
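To see how much of a log like this is noise, a short pipeline can count messages per emitting function. This is a sketch, assuming the line format shown in the excerpt; the function name is hypothetical, and the log path in the usage line is the usual sssd location but may differ per host.

```shell
#!/bin/sh
# Hypothetical helper: summarise an sssd debug log (stdin) by the function
# that emitted each message, most frequent first.
# Assumption: lines look like
#   (<date>) [<component>] [<function>] (0x<level>): <text>
# The greedy ".*" anchors on the last "[<word>] (0x...):" in each line,
# which is the function tag.
summarise_sssd_log() {
    sed -n 's/.*\[\([A-Za-z0-9_]*\)] (0x[0-9a-f]*):.*/\1/p' |
        sort | uniq -c | sort -rn
}
```

For example, `summarise_sssd_log < /var/log/sssd/sssd_nss.log` would put calc_flat_name at the top for a log like the one above.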
What I'm doing for testing is:

    root@tools-sgebastion-08-# sudo su eugene233
    eugene233@tools-sgebastion-08:~$ become isa
    sudo: a password is required
    eugene233@tools-sgebastion-08:~$ sudo -niu tools.isa
    sudo: a password is required
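When reproducing this, a few standard checks run as root can narrow things down: whether NSS resolves the tool account, which groups the user is in, and which sudo rules the user actually gets. The helper below is a hypothetical dry-run sketch (it only prints the commands unless RUN is set to an empty string), assuming the 'tools' prefix.

```shell
#!/bin/sh
# Hypothetical troubleshooting helper for "sudo: a password is required"
# when a user tries to become a tool. Dry run by default: commands are
# printed via echo; run with RUN= (empty) to execute them (as root).
# Assumption: PREFIX is "tools", as in Toolforge.
RUN="${RUN-echo}"
PREFIX="${PREFIX:-tools}"

debug_become() {
    user="$1" tool="$2"
    $RUN getent passwd "$PREFIX.$tool"   # does NSS resolve the tool account?
    $RUN id "$user"                      # which groups does the user have?
    $RUN sudo -l -U "$user"              # which sudo rules apply to the user?
}
```

For example, `RUN= debug_become eugene233 isa` runs the three checks for the case above.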
Change 505761 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] ldap: sssd: add missing bits for sudo
With the patch https://gerrit.wikimedia.org/r/505761 I can do this:

    aborrero@tools-sgebastion-08:~$ sudo su eugene233
    eugene233@tools-sgebastion-08:~$ sudo -niu tools.isa
    tools.isa@tools-sgebastion-08:~$ logout
    eugene233@tools-sgebastion-08:~$ become isa
    tools.isa@tools-sgebastion-08:~$ logout
Change 505761 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] ldap: sssd: add missing bits for sudo
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T12:57:37Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: sssd in the puppet bastion prefix, try again with sssd in the bastions, reboot them
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T13:06:00Z] <arturo> T221225 use profile::ldap::client::labs::client_stack: classic in the puppet bastion prefix, again. Rollback again.
Well, I suspect the choice between the sudo and sudo-ldap packages is causing some confusion here. In my live hacking on tools-sgebastion-08, I installed sudo and removed sudo-ldap.
More testing is required before sssd can be deployed to the bastions.
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T15:19:17Z] <arturo> T221225 creating tools-sgebastion-09 for testing sssd stuff
Mentioned in SAL (#wikimedia-cloud) [2019-04-23T15:26:50Z] <arturo> T221225 rebooting tools-sgebastion-08 to cleanup sssd
The scheduler apparently placed this instance on the same cloudvirt as one of the current bastions, which triggered https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=cloudcontrol1003&service=tools+project+instance+distribution. I acked the alert and downtimed it for 2 days, so you have a chance to move the instance to another cloudvirt, or to finish testing and delete it.
[00:32] <icinga-wm> ACKNOWLEDGEMENT - tools project instance distribution on cloudcontrol1003 is CRITICAL: CRITICAL: sgebastion class instances not spread out enough BryanDavis New instance added for testing, needs to be placed on a different cloudvirt manually. - The acknowledgement expires at: 2019-04-26 00:31:10. https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Troubleshooting
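The idea behind this alert can be sketched as: group the instances of one class by the cloudvirt they run on, and flag any cloudvirt hosting more than one of them. This is a hypothetical illustration, not the real Icinga check; it assumes the placement data is available as "<instance> <cloudvirt>" lines, and the function name is made up.

```shell
#!/bin/sh
# Hypothetical "not spread out enough" check (not the real Icinga plugin).
# Reads "<instance> <cloudvirt>" lines on stdin and prints
# "<cloudvirt> <count>" for every cloudvirt that hosts more than one
# instance whose name starts with the given prefix.
crowded_cloudvirts() {
    awk -v prefix="$1" '
        index($1, prefix) == 1 { count[$2]++ }
        END {
            for (host in count)
                if (count[host] > 1) print host, count[host]
        }
    ' | sort
}
```

For example, `crowded_cloudvirts tools-sgebastion < placement.txt` prints nothing when each bastion is on its own cloudvirt.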
Mentioned in SAL (#wikimedia-cloud) [2019-04-24T09:18:54Z] <arturo> T221225 reallocating tools-sgebastion-09 to cloudvirt1008
Mentioned in SAL (#wikimedia-cloud) [2019-04-25T12:49:32Z] <arturo> T221225 using profile::ldap::client::labs::client_stack: sssd in horizon for tools-sgebastion-09 (testing)
Change 506435 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] sssd: sudo: don't install sudo-ldap if using sssd
Change 506614 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] standard: refactor into a profile
Change 506682 had a related patch set uploaded (by Jbond; owner: John Bond):
[operations/puppet@production] standard refactor: remove standard class from base classes
Change 506682 merged by Jbond:
[operations/puppet@production] standard refactor: remove standard class from base classes
Change 506614 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] standard: introduce a wrapper profile and use it in CloudVPS
Mentioned in SAL (#wikimedia-cloud) [2019-04-29T10:20:58Z] <arturo> disable puppet in all servers to livehack tools-puppetmaster-01 to test T221225
Mentioned in SAL (#wikimedia-cloud) [2019-04-29T10:27:52Z] <arturo> T221225 reboot tools-sgebastion-09 for testing sssd
Change 506979 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] cloudvps: introduce proper base role/profile for VM instances
Mentioned in SAL (#wikimedia-cloud) [2019-04-29T11:22:16Z] <arturo> T221225 re-enable puppet agent in all toolforge servers
Change 507005 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] vagrant: refactor roles into profiles
My plan is to use this hiera config on the hosts that should run sssd:

    profile::ldap::client::labs::client_stack: sssd
    sudo_flavor: sudo
I had to refactor some callers of sudo::user and sudo::group to be able to introduce these hiera keys; that's why there are additional patches.
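A sketch of the decision these hiera keys are meant to drive, assuming the two client stacks named in this task: with sssd a host should get the plain sudo package (rules come through sssd), while the classic nscd/nslcd stack keeps sudo-ldap (rules read by sudo itself via /etc/sudo-ldap.conf). The function name is hypothetical; the real logic lives in the puppet patches.

```shell
#!/bin/sh
# Hypothetical mapping from profile::ldap::client::labs::client_stack to
# the sudo package a host should have installed.
#   sssd    -> sudo       (sudoers rules fetched through sssd)
#   classic -> sudo-ldap  (sudoers rules fetched by sudo-ldap itself)
sudo_package_for_stack() {
    case "$1" in
        sssd)    echo sudo ;;
        classic) echo sudo-ldap ;;
        *)       echo "unknown client_stack: $1" >&2; return 1 ;;
    esac
}
```

Keeping both packages around, or the wrong one for the stack, is exactly the kind of mismatch suspected in the live-hacking session above.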
Relevant patches, in order, are (skipping some already merged ones):
@bd808 @Andrew @Bstorm @MoritzMuehlenhoff @jbond please review whether this plan makes sense.
Change 507005 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] vagrant: refactor roles into profiles
Change 506979 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] cloudvps: introduce proper base role/profile for VM instances
Mentioned in SAL (#wikimedia-cloud) [2019-04-30T10:56:32Z] <arturo> T221225 create tools-sgebastion-0test for more sssd tests
Mentioned in SAL (#wikimedia-cloud) [2019-04-30T11:07:19Z] <arturo> T221225 disable puppet in toolforge
Change 506435 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap
Mentioned in SAL (#wikimedia-cloud) [2019-04-30T12:45:11Z] <arturo> adding sudo_flavor: sudo hiera config to all puppet prefixes with sssd (T221225)
Mentioned in SAL (#wikimedia-cloud) [2019-04-30T12:50:43Z] <arturo> enable puppet in all servers T221225
Change 507317 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "sudo: decouple sudo from sudo-ldap"
Change 507317 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] Revert "sudo: decouple sudo from sudo-ldap"
Mentioned in SAL (#wikimedia-operations) [2019-04-30T16:52:26Z] <arturo> merging change to profile::base and ::raid https://gerrit.wikimedia.org/r/c/operations/puppet/+/507357 related to T221225
Change 507376 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] Revert "Revert "sudo: decouple sudo from sudo-ldap""
Mentioned in SAL (#wikimedia-cloud) [2019-05-06T10:53:54Z] <arturo> T221225 disable puppet in all toolforge servers for testing sssd patch (puppetmaster livehack)
Change 507376 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap
Mentioned in SAL (#wikimedia-cloud) [2019-05-06T11:34:23Z] <arturo> T221225 reenable puppet
Change 508311 had a related patch set uploaded (by Arturo Borrero Gonzalez; owner: Arturo Borrero Gonzalez):
[operations/puppet@production] sudo: decouple sudo from sudo-ldap
Change 508311 merged by Arturo Borrero Gonzalez:
[operations/puppet@production] sudo: decouple sudo from sudo-ldap
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:16:36Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-sgebastion prefix, and reboot tools-sgebastion-07/08
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:27:06Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-docker-registry prefix, and reboot tools-docker-registry-[03-04]
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:31:31Z] <arturo> T221225 set sssd/sudo in the hiera config for the tools-checker prefix, and reboot tools-checker-03
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T12:39:50Z] <arturo> T221225 disable puppet in all tools-worker nodes in preparation for sssd
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T13:04:48Z] <arturo> T221225 depool and rebooted tools-worker-1001 in preparation for sssd migration
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:07:59Z] <arturo> T221225 switch to sssd/sudo in puppet prefix for tools-worker
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:09:51Z] <arturo> T221225 depool & switch to sssd/sudo & reboot & repool tools-worker-1001
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:12:55Z] <arturo> T221225 depool & switch to sssd/sudo & reboot & repool tools-worker-1002
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:26:55Z] <arturo> T221225 hard reboot tools-worker-1001
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:35:12Z] <arturo> T221225 hard reboot tools-worker-1001 again
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:44:21Z] <arturo> T221225 back to classic/ldap hiera config in the tools-worker puppet prefix
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T17:49:12Z] <arturo> T221225 repool tools-worker-1002 (using nscd/nslcd and sudoldap)
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T18:13:20Z] <arturo> T221225 created tools-worker-1029 to test sssd/sudo stuff
Mentioned in SAL (#wikimedia-cloud) [2019-05-28T18:15:06Z] <arturo> T221225 for the record, tools-worker-1001 is not working after trying with sssd
Mentioned in SAL (#wikimedia-cloud) [2019-05-29T10:13:36Z] <arturo> enroll the tools-worker-1029 VM into toolforge k8s, but leave it cordoned for sssd testing purposes (T221225)
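For reference, the per-worker sequence recorded in the SAL entries above (depool, switch to sssd/sudo, reboot, repool) can be sketched as a dry run. This is hypothetical: it assumes depool and repool map to kubectl drain and kubectl uncordon, and that the puppet prefix hiera has already been switched, so the node only needs a puppet run and a reboot.

```shell
#!/bin/sh
# Hypothetical dry-run sketch of migrating one k8s worker to sssd/sudo.
# Commands are printed via echo unless RUN is set to an empty string.
# Assumptions: depool/repool == kubectl drain/uncordon; the hiera switch
# in the puppet prefix is already done.
RUN="${RUN-echo}"

migrate_worker() {
    node="$1"
    $RUN kubectl drain "$node" --ignore-daemonsets   # depool
    $RUN ssh "$node" sudo puppet agent --test        # apply sssd/sudo config
    $RUN ssh "$node" sudo reboot                     # reboot into sssd
    # In a real run, wait until the node is back and verify logins and
    # sudo work before repooling (tools-worker-1001 did not come back).
    $RUN kubectl uncordon "$node"                    # repool
}
```

For example, `migrate_worker tools-worker-1002` prints the four commands in order without running them.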
Well. Closing this task now, since we do have sudo support. I will open other tasks for the remaining issues.