LDAP problems following cloudservices2005-dev reimage
Closed, Resolved, Public

Description

Keystone is having problems connecting to LDAP on cloudservices2004-dev following the latest cloudservices2005-dev reimage work.

ERROR ldappool ldap.INVALID_CREDENTIALS: {'desc': 'Invalid credentials'}

I've turned on debug mode on the server:

Jun 21 15:05:58 cloudservices2004-dev slapd[2867781]: conn=1026 fd=40 ACCEPT from IP=172.20.5.7:37510 (IP=0.0.0.0:389)
[..]
Jun 21 15:05:58 cloudservices2004-dev slapd[2867781]: conn=1026 op=0 BIND dn="uid=novaobserver,ou=people,dc=wikimedia,dc=org" method=128
Jun 21 15:05:58 cloudservices2004-dev slapd[2867781]: conn=1026 op=0 RESULT tag=97 err=49 text=

Error 49 is indeed an authentication failure (LDAP result code invalidCredentials).
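For reading many such lines at once, here is a minimal sketch that pulls the result code out of a slapd RESULT line and names it. The regular expression is written against the log format shown above only, the function name `parse_result` is hypothetical, and the code table is a small subset of the RFC 4511 result codes.

```python
import re

# Small subset of LDAP result codes (RFC 4511, Appendix A).
LDAP_RESULT_CODES = {
    0: "success",
    32: "noSuchObject",
    49: "invalidCredentials",
    50: "insufficientAccessRights",
}

# Matches the "conn=N op=N RESULT tag=N err=N" part of a slapd log line.
RESULT_RE = re.compile(
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) RESULT tag=(?P<tag>\d+) err=(?P<err>\d+)"
)


def parse_result(line):
    """Return conn, op, err and a symbolic name, or None for non-RESULT lines."""
    match = RESULT_RE.search(line)
    if match is None:
        return None
    err = int(match.group("err"))
    return {
        "conn": int(match.group("conn")),
        "op": int(match.group("op")),
        "err": err,
        "name": LDAP_RESULT_CODES.get(err, "unknown"),
    }


line = (
    "Jun 21 15:05:58 cloudservices2004-dev slapd[2867781]: "
    "conn=1026 op=0 RESULT tag=97 err=49 text="
)
print(parse_result(line))
```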

Event Timeline

The observer password is set in hiera to lt-RiBeyokCO81bVvcX (this is a public password).

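In the slapd database, this password is stored as d2aeac3d76cd790ea2ca017869b2c8394c08cb47, which matches the hiera value: the base64-decoded entry and a fresh slappasswd hash of the hiera password are identical.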

aborrero@cloudservices2004-dev:~ $ echo e1NIQX1OWWRyUld1M2g1bDdhTlFPTXY0SXVjS2RCQXc9 | base64 -d
{SHA}NYdrRWu3h5l7aNQOMv4IucKdBAw=
aborrero@cloudservices2004-dev:~ $ sudo slappasswd -h {SHA} -s lt-RiBeyokCO81bVvcX
{SHA}NYdrRWu3h5l7aNQOMv4IucKdBAw=
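The same comparison can be scripted. This is a minimal sketch, assuming the standard unsalted {SHA} scheme (base64 of the SHA-1 digest of the password); the function names are hypothetical and it only uses the Python standard library, not slappasswd.

```python
import base64
import hashlib
import hmac


def sha_userpassword(password):
    """Build an unsalted {SHA} userPassword value for a cleartext password."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "{SHA}" + base64.b64encode(digest).decode("ascii")


def matches_stored(stored_b64, password):
    """Compare a base64-wrapped stored value against a cleartext password."""
    stored = base64.b64decode(stored_b64).decode("ascii")
    return hmac.compare_digest(stored, sha_userpassword(password))


# Values copied from the shell session above.
stored = "e1NIQX1OWWRyUld1M2g1bDdhTlFPTXY0SXVjS2RCQXc9"
print(base64.b64decode(stored).decode("ascii"))
print(matches_stored(stored, "lt-RiBeyokCO81bVvcX"))
```

If the second line prints True, the stored hash and the hiera password agree, which points away from a wrong password as the cause of err=49.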

Change 931964 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] codfw1dev: services: override cloudcontrol FQDN

https://gerrit.wikimedia.org/r/931964

Change 931968 had a related patch set uploaded (by Arturo Borrero Gonzalez; author: Arturo Borrero Gonzalez):

[operations/puppet@production] openldap: main-acls.erb: support keystone hosts without AAAA

https://gerrit.wikimedia.org/r/931968

The problem is the IP-based LDAP ACLs, not the password.
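To illustrate the idea behind "support keystone hosts without AAAA", here is a rough sketch of building an IP allow-list that tolerates hosts with no IPv6 record. This is not the merged main-acls.erb change (that is a Puppet ERB template); the function names and the injectable `resolver` parameter are assumptions made for the example.

```python
import socket


def resolve_addresses(hostname, resolver=socket.getaddrinfo):
    """Return (ipv4, ipv6) address lists; a missing family yields an empty list."""
    found = {socket.AF_INET: [], socket.AF_INET6: []}
    for family in found:
        try:
            infos = resolver(hostname, None, family)
        except socket.gaierror:
            # No record for this family (e.g. no AAAA): skip it instead of failing.
            continue
        for info in infos:
            address = info[4][0]
            if address not in found[family]:
                found[family].append(address)
    return found[socket.AF_INET], found[socket.AF_INET6]


def acl_peers(hostnames, resolver=socket.getaddrinfo):
    """Flatten all resolvable addresses of the given hosts into one list."""
    peers = []
    for host in hostnames:
        ipv4, ipv6 = resolve_addresses(host, resolver)
        peers.extend(ipv4 + ipv6)
    return peers


def ipv4_only_resolver(host, port, family):
    """Offline stand-in for DNS: answers A lookups and has no AAAA record."""
    if family == socket.AF_INET6:
        raise socket.gaierror("no AAAA record")
    return [(family, socket.SOCK_STREAM, 6, "", ("172.20.5.7", 0))]


# "keystone-host.example" is a placeholder name, not a real host.
print(acl_peers(["keystone-host.example"], ipv4_only_resolver))
```

The stand-in resolver keeps the example runnable without DNS; dropping the second argument makes it use the system resolver.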

aborrero changed the task status from Open to In Progress. Jun 21 2023, 5:07 PM
aborrero triaged this task as High priority.
aborrero moved this task from Backlog to Doing on the User-aborrero board.

Change 931968 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] openldap: main-acls.erb: support keystone hosts without AAAA

https://gerrit.wikimedia.org/r/931968

Change 931964 merged by Arturo Borrero Gonzalez:

[operations/puppet@production] codfw1dev: services: override cloudcontrol FQDN

https://gerrit.wikimedia.org/r/931964

The ACLs seem fixed, but now I'm getting:

ERROR ldappool ldap.SERVER_DOWN: {'desc': "Can't contact LDAP server", 'errno': 22, 'info': 'Invalid argument'}

This is hardly a minimal unit test, but my go-to LDAP exploration routine works:

root@cloudcontrol2001-dev:~# ldapvi -h ldap://cloudservices2004-dev.codfw.wmnet:389 --user uid=novaadmin,ou=people,dc=wikimedia,dc=org -b ou=groups,dc=wikimedia,dc=org

It also works with the service address:

root@cloudcontrol2001-dev:~# ldapvi -h ldap://cloudservices2004-dev.private.codfw.wikimedia.cloud:389 --user uid=novaadmin,ou=people,dc=wikimedia,dc=org -b ou=groups,dc=wikimedia,dc=org

After @Andrew's magic touch, this is now apparently working.

The suspicion is that we hit T340127: systemctl restart keystone doesn't actually restart keystone sometimes.