
Some tool maintainers not showing in Striker UI following config change
Closed, Resolved | Public | BUG REPORT

Description

NOTE: with the current production config, Toolforge members show in Striker and admins do not. This is because the role migration is half done, with 'projectadmin' having been replaced in Keystone with 'member' but 'user' not yet having been replaced with 'reader'. The application also assumed that each user had only one role.

[21:40]  <kindrobot> I'm getting this error when I try to create a phab board from Toolforge: "No Phabricator accounts found for tool maintainers." I'm pretty sure I've done this before without incident.
[21:41]  <    bd808> kindrobot: which tool?
[21:42]  <kindrobot> ducttape
[21:43]  <    bd808> uhh.... toolsadmin doesn't think that tool has any maintainers. This smells like an LDAP issue of some sort.
[21:44]  <kindrobot> Hmm. I just created the tool about 30 minutes ago. I can administer the tool, but I can't add myself as a maintainer.
[21:46]  <kindrobot> bd808 does wlh have any maintainers? I know I added some in the past, but I'm not seeing any now.
[21:47]  <    bd808> I am not either. We rolled out a config change earlier today that may be causing this rather than general LDAP problems. Let me poke around a bit.

Event Timeline

bd808 changed the task status from Open to In Progress. (Mar 9 2023, 10:03 PM)
bd808 claimed this task.
bd808 triaged this task as High priority.
bd808 created this task.

$ python3 manage.py shell
>>> from striker.tools.models import Tool
>>> ducttape = Tool.objects.get(cn='tools.ducttape')
>>> ducttape.maintainer_ids()
['kindrobot']
>>> ducttape.maintainers()
<QuerySet []>
>>> from striker.tools.models import Maintainer
>>> Maintainer.objects.get(uid='kindrobot')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/opt/lib/python/site-packages/django/db/models/manager.py", line 82, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/opt/lib/python/site-packages/django/db/models/query.py", line 408, in get
    self.model._meta.object_name
striker.tools.models.Maintainer.DoesNotExist: Maintainer matching query does not exist.
>>> Maintainer.objects.get(uid='bd808')
<Maintainer: BryanDavis>

The failure to load @SDunlap's Maintainer record is unexpected. Direct inspection of the backing LDAP store seems to show the expected data:

$ ldap uid=kindrobot cn objectClass
dn: uid=kindrobot,ou=people,dc=wikimedia,dc=org
cn: Stef Dunlap
objectClass: inetOrgPerson
objectClass: person
objectClass: ldapPublicKey
objectClass: posixAccount
objectClass: shadowAccount

We changed the values of settings.OPENSTACK_USER_ROLE and settings.OPENSTACK_ADMIN_ROLE today with https://gerrit.wikimedia.org/r/c/labs/striker/+/895140. This is the likely cause of the strange behavior. The problem may lie entirely in this custom manager's _get_tool_users implementation:

class MaintainerManager(models.Manager):
    def _get_tool_users(self):
        if settings.TEST_MODE:
            # Hack to keep from trying to talk to openstack API from django
            # test harness
            return []
        users = cache.get_openstack_users()
        return (
            users[settings.OPENSTACK_USER_ROLE] +
            users[settings.OPENSTACK_ADMIN_ROLE]
        )

    def get_queryset(self):
        return super(MaintainerManager, self).get_queryset().filter(
            uid__in=self._get_tool_users()).order_by('cn')

>>> from striker.tools import cache
>>> users = cache.get_openstack_users()
>>> users.keys()
dict_keys(['glanceadmin', 'heat_stack_owner', 'admin', 'member', 'projectadmin', 'designateadmin', 'heat_stack_user', 'keystonevalidate', 'user', 'reader'])
>>> for role in users.keys():
...     print(role, len(users[role]))
...
glanceadmin 0
heat_stack_owner 0
admin 0
member 26
projectadmin 0
designateadmin 1
heat_stack_user 0
keystonevalidate 0
user 2406
reader 0

We changed the settings before the backing database for Keystone was updated. :/
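
The effect of the settings change can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the old settings were 'user' and 'projectadmin' and the new ones 'reader' and 'member' (inferred from the role names discussed in this task, not read from the patch), and the names visible_maintainer_count, OLD_SETTINGS and NEW_SETTINGS are made up for the example. It mirrors the sum in _get_tool_users using the per-role counts printed above.

```python
# Per-role user counts copied from the shell session above.
ROLE_COUNTS = {
    "glanceadmin": 0,
    "heat_stack_owner": 0,
    "admin": 0,
    "member": 26,
    "projectadmin": 0,
    "designateadmin": 1,
    "heat_stack_user": 0,
    "keystonevalidate": 0,
    "user": 2406,
    "reader": 0,
}


def visible_maintainer_count(user_role, admin_role, counts=ROLE_COUNTS):
    """Mirror the sum in _get_tool_users: entries in either configured role."""
    return counts[user_role] + counts[admin_role]


# Assumed setting values (hypothetical, inferred from this task's discussion).
OLD_SETTINGS = {"user_role": "user", "admin_role": "projectadmin"}
NEW_SETTINGS = {"user_role": "reader", "admin_role": "member"}

old_total = visible_maintainer_count(**OLD_SETTINGS)
new_total = visible_maintainer_count(**NEW_SETTINGS)
print(old_total, new_total)  # prints: 2406 26
```

Under these assumptions the new settings let only 26 uids through the uid__in filter, which is consistent with some Maintainer lookups succeeding while most fail.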

Change 896031 had a related patch set uploaded (by BryanDavis; author: BryanDavis):

[operations/puppet@production] Revert "striker: Bump container version to 2023-03-09-005633-production"

https://gerrit.wikimedia.org/r/896031

Change 896031 merged by Legoktm:

[operations/puppet@production] Revert "striker: Bump container version to 2023-03-09-005633-production"

https://gerrit.wikimedia.org/r/896031

Mentioned in SAL (#wikimedia-operations) [2023-03-09T22:40:58Z] <bd808> Forced puppet run on cloudweb100[34] to apply quick fix for T331674

bd808 renamed this task from Some tool maintainers not showing in Striker UI to Some tool maintainers not showing in Striker UI following config change. (Mar 9 2023, 10:43 PM)
bd808 removed a project: Patch-For-Review.

Rolling back to the prior settings.OPENSTACK_USER_ROLE and settings.OPENSTACK_ADMIN_ROLE values has fixed the issue with non-admin users not showing in the UI. It should also fix the originally reported issue of failing to find Phabricator accounts for maintainers. That was caused by the maintainer list being empty due to the config mismatch. Luckily all of this was a client configuration issue without any data loss in the canonical LDAP storage.

I will stage a new patch to switch back to the new settings values and let @Andrew know that he should merge it only after granting the 'reader' role to everyone who currently has the 'user' role in the Toolforge project.

NOTE: with the current production config, Toolforge members show in Striker and admins do not. This is because the role migration is half done with 'projectadmin' having been replaced in Keystone with 'member' but 'user' not yet having been replaced with 'reader'.

[22:42]  <    bd808> kindrobot: I think you should be able to create a phab board with Striker now.
[22:42]  <    bd808> I can see you listed on https://toolsadmin.wikimedia.org/tools/id/ducttape at least
[22:52]  <kindrobot> It worked! :D

Change 896194 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[operations/puppet@production] striker: Bump container version to 2023-03-09-185548-production

https://gerrit.wikimedia.org/r/896194

I do see people with the reader role in the tools project.

taavi@cloudcontrol1006 ~ $ os role assignment list --user kindrobot --project tools --names
+--------+---------------------+-------+---------------+--------+--------+-----------+
| Role   | User                | Group | Project       | Domain | System | Inherited |
+--------+---------------------+-------+---------------+--------+--------+-----------+
| user   | Stef Dunlap@Default |       | tools@Default |        |        | False     |
| reader | Stef Dunlap@Default |       | tools@Default |        |        | False     |
+--------+---------------------+-------+---------------+--------+--------+-----------+
taavi@cloudcontrol1006 ~ $ os role assignment list --user taavi --project tools --names
+--------------+-----------------+-------+---------------+--------+--------+-----------+
| Role         | User            | Group | Project       | Domain | System | Inherited |
+--------------+-----------------+-------+---------------+--------+--------+-----------+
| member       | Majavah@Default |       | tools@Default |        |        | False     |
| projectadmin | Majavah@Default |       | tools@Default |        |        | False     |
| user         | Majavah@Default |       | tools@Default |        |        | False     |
| reader       | Majavah@Default |       | tools@Default |        |        | False     |
+--------------+-----------------+-------+---------------+--------+--------+-----------+

Did you try logging out and back in in case the roles were getting cached somewhere?

I'm following this! Sometime soon I want to delete the old 'user' role entirely. @bd808, please let me know when you're confident that I can do so without totally breaking all existing striker use-cases :)

[16:37]  <    bd808> ah! I think my code is to blame!
[16:38]  <    bd808> It seems to memoize and assume that each user only has one role!
[16:38]  <andrewbogott> that would do it :)
[16:38]  <    taavi> oh of course, I even had the exact same issue with openstack-browser :D
[16:38]  <    bd808> yeah, I think this is the bug. it's in Striker's openstack.py and the users_by_role() method
[16:39]  <    bd808> because the client was copied from striker taavi :)
[16:40]  <andrewbogott> welp, I tried to provide a graceful transition by leaving the old roles in place and that just made it worse :)
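
The failure mode described in the chat above can be sketched in a few lines. This is not the code from Striker's openstack.py: the function names, the input shape (a list of (user, role) pairs) and the order of the assignments are all assumptions chosen for illustration, loosely modeled on the role assignment listings earlier in this task.

```python
from collections import defaultdict

# Hypothetical (user, role) pairs; the ordering is an assumption.
ASSIGNMENTS = [
    ("kindrobot", "reader"),
    ("kindrobot", "user"),
    ("taavi", "reader"),
    ("taavi", "projectadmin"),
    ("taavi", "user"),
    ("taavi", "member"),
]


def users_by_role_single(assignments):
    """Buggy variant: keeps one role per user, so the last one seen wins."""
    role_of = {}
    for user, role in assignments:
        role_of[user] = role  # overwrites any role recorded earlier
    by_role = defaultdict(list)
    for user, role in role_of.items():
        by_role[role].append(user)
    return dict(by_role)


def users_by_role_multi(assignments):
    """Fixed variant: a user is listed under every role they hold."""
    by_role = defaultdict(set)
    for user, role in assignments:
        by_role[role].add(user)
    return {role: sorted(users) for role, users in by_role.items()}


single = users_by_role_single(ASSIGNMENTS)
multi = users_by_role_multi(ASSIGNMENTS)
print(single.get("reader", []))  # prints: []
print(multi["reader"])           # prints: ['kindrobot', 'taavi']
```

With the single-role variant, leaving the old roles in place during the migration means the new 'reader' role can look empty even though every user holds it, because each user is filed under only one of their roles.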

Change 896430 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[labs/striker@master] openstack: Do not assume that each user has only one role

https://gerrit.wikimedia.org/r/896430

Change 896430 merged by jenkins-bot:

[labs/striker@master] openstack: Do not assume that each user has only one role

https://gerrit.wikimedia.org/r/896430

Change 896194 merged by Andrew Bogott:

[operations/puppet@production] striker: Bump container version to 2023-03-10-212005-production

https://gerrit.wikimedia.org/r/896194

Mentioned in SAL (#wikimedia-operations) [2023-03-13T17:04:55Z] <bd808> Ran cache.purge_openstack_users() for Striker following deploy of e1f7491 (T331674)