
[cumin] [openstack] Openstack backend fails when project is not set
Open, Stalled, High · Public · BUG REPORT

Description

This happens on both cloudcumin1001 and cloud-cumin-03.

Note: as a workaround until this task is resolved, we manually applied the patch https://gerrit.wikimedia.org/r/c/operations/software/cumin/+/868814 to both cloudcumin1001 and cloud-cumin-03, overwriting the file /usr/lib/python3/dist-packages/cumin/backends/openstack.py

What happens?:

fnegri@cloudcumin1001:~$ sudo cumin 'O{*}'
Caught Unauthorized exception: The request you have made requires authentication. (HTTP 401) (Request-ID: req-1a882424-b48f-47f7-885c-fcad991fc231)

What should have happened instead?:

Cumin should list all hosts in all projects. This used to work until a few weeks/months ago in cloud-cumin-03. It was never tested in cloudcumin1001.

Other information (browser name/version, screenshots, etc.):

Stack trace: https://phabricator.wikimedia.org/P52513

@Andrew tried to apply this patch manually to cloud-cumin-03 and it fixed the issue: https://gerrit.wikimedia.org/r/c/operations/software/cumin/+/868814

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 868814 had a related patch set uploaded (by FNegri; author: Andrew Bogott):

[operations/software/cumin@master] add domain param to openstack backend

https://gerrit.wikimedia.org/r/868814

fnegri triaged this task as Medium priority.

@fnegri thanks for the task. Have you checked the OpenStack logs for anything explicit about which permission was missing?

From the cumin logs:

2023-09-19 13:23:26,434 [DEBUG 282873 keystoneauth.identity.v3.base.get_auth_ref] Making authentication request to https://openstack.eqiad1.wikimediacloud.org:25000/v3/auth/tokens
2023-09-19 13:23:26,436 [DEBUG 282873 urllib3.connectionpool._new_conn] Starting new HTTPS connection (1): openstack.eqiad1.wikimediacloud.org:25000
2023-09-19 13:23:27,023 [DEBUG 282873 urllib3.connectionpool._make_request] https://openstack.eqiad1.wikimediacloud.org:25000 "POST /v3/auth/tokens HTTP/1.1" 401 None
2023-09-19 13:23:27,026 [DEBUG 282873 keystoneauth.session.request] Request returned failure status: 401

But a previous call to the same endpoint was successful at the start of the same run:

2023-09-19 13:23:25,348 [DEBUG 282873 urllib3.connectionpool._make_request] https://openstack.eqiad1.wikimediacloud.org:25000 "POST /v3/auth/tokens HTTP/1.1" 201 None

And those were the only 2 calls made to that API:

$ sudo grep "POST /v3/auth/tokens" /var/log/cumin/cumin.log
2023-09-19 13:23:25,348 [DEBUG 282873 urllib3.connectionpool._make_request] https://openstack.eqiad1.wikimediacloud.org:25000 "POST /v3/auth/tokens HTTP/1.1" 201 None
2023-09-19 13:23:27,023 [DEBUG 282873 urllib3.connectionpool._make_request] https://openstack.eqiad1.wikimediacloud.org:25000 "POST /v3/auth/tokens HTTP/1.1" 401 None
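
For anyone who wants to repeat this check programmatically, here is a minimal Python sketch of the same filter as the grep above. It assumes the log line layout shown in this task; the function name `token_calls` and the `SAMPLE` list are made up for illustration and are not part of Cumin.

```python
import re

# Matches urllib3 debug lines like the ones pasted above and captures
# the timestamp and the HTTP status of each POST /v3/auth/tokens call.
TOKEN_CALL = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*'
    r'"POST /v3/auth/tokens HTTP/1\.1" (?P<status>\d{3})'
)


def token_calls(lines):
    """Return a list of (timestamp, status) for each token request line."""
    calls = []
    for line in lines:
        match = TOKEN_CALL.search(line)
        if match:
            calls.append((match.group("ts"), int(match.group("status"))))
    return calls


# The two lines from this task, used as sample input.
SAMPLE = [
    '2023-09-19 13:23:25,348 [DEBUG 282873 urllib3.connectionpool._make_request] '
    'https://openstack.eqiad1.wikimediacloud.org:25000 '
    '"POST /v3/auth/tokens HTTP/1.1" 201 None',
    '2023-09-19 13:23:27,023 [DEBUG 282873 urllib3.connectionpool._make_request] '
    'https://openstack.eqiad1.wikimediacloud.org:25000 '
    '"POST /v3/auth/tokens HTTP/1.1" 401 None',
]

for ts, status in token_calls(SAMPLE):
    print(ts, status)
```

To run it against the real log, pass an open file handle for /var/log/cumin/cumin.log instead of `SAMPLE` (reading that file likely needs root, as in the grep above).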

So I'm wondering if this was some sort of re-authentication from keystoneauth because of expired token or because it had to authenticate for another domain or similar and that failed because of lack of authorization.

That said, yes, I know there are the 2 pending patches and I should really find the time to debug/test/work on them. It is also the end of the quarter and I'm fairly busy finishing things. Do you think it could wait until the beginning of October?

@Volans October is fine, we'll make sure that the CI is passing in the meantime.

"wondering if this was some sort of re-authentication from keystoneauth because of expired token or because it had to authenticate for another domain"

I think this is because of the domain, but I'm not 100% sure.
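
To make the "domain" hypothesis more concrete, the sketch below builds the JSON body that the Keystone v3 API expects on POST /v3/auth/tokens for an unscoped, a project-scoped and a domain-scoped password request. It only illustrates how the scope differs between requests. It does not claim to reproduce what Cumin or the patch in change 868814 actually sends, and the user, password, project name and helper names are placeholders.

```python
import json


def password_auth_body(username, password, user_domain_id="default", scope=None):
    """Build a Keystone v3 password-auth request body.

    With scope=None the resulting token request is unscoped.
    """
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"id": user_domain_id},
                        "password": password,
                    }
                },
            }
        }
    }
    if scope is not None:
        body["auth"]["scope"] = scope
    return body


def project_scope(project_name, project_domain_id="default"):
    """Scope for a token limited to one project (looked up by name in a domain)."""
    return {"project": {"name": project_name, "domain": {"id": project_domain_id}}}


def domain_scope(domain_id="default"):
    """Scope for a token limited to a whole domain."""
    return {"domain": {"id": domain_id}}


# Placeholder credentials and project name, for illustration only.
unscoped = password_auth_body("example-user", "example-password")
by_project = password_auth_body("example-user", "example-password",
                                scope=project_scope("bar"))
by_domain = password_auth_body("example-user", "example-password",
                               scope=domain_scope())

print(json.dumps(by_domain["auth"]["scope"]))
```

If the second request in the log asked for a scope on which the user has no role assignment, a 401 would be a plausible outcome, but that remains an assumption until it is checked against the Keystone logs.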

fnegri raised the priority of this task from Medium to High. Oct 20 2023, 4:54 PM
fnegri changed the task status from Open to In Progress. Oct 26 2023, 4:09 PM
fnegri changed the task status from In Progress to Stalled. Apr 15 2024, 12:38 PM

Last week I tried reproducing this in a local DevStack environment. I was partially successful but I could not test all the scenarios I had in mind because I had some issues setting up the environment itself and I could not create servers.

I will move this back to "Stalled" but when I have some more time I would like to test the following scenarios:

  • Running Cumin with O{*}, O{domain:foo} and O{project:bar}
  • Each of the above with different user roles (any combination of admin, member, or reader on system, domain or project scopes)
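
As a rough way to keep track of these scenarios, the sketch below enumerates them in Python. It assumes the simplest reading of the list, one role assignment on one scope per test user, which gives 27 cases. Users holding several roles at once would need a larger matrix. The names `QUERIES`, `ROLES`, `SCOPES` and `scenarios` are made up for this example.

```python
from itertools import product

QUERIES = ["O{*}", "O{domain:foo}", "O{project:bar}"]
ROLES = ["admin", "member", "reader"]
SCOPES = ["system", "domain", "project"]


def scenarios():
    """Return every (query, role, scope) combination as a dict."""
    return [
        {"query": query, "role": role, "scope": scope}
        for query, role, scope in product(QUERIES, ROLES, SCOPES)
    ]


for case in scenarios():
    print(f"{case['query']:<16} role={case['role']:<7} scope={case['scope']}")
```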

Another thing we want to ensure is that Cumin is working both when the policy os_compute_api:servers:index:get_all_tenants is set to True and when it is set to False. Given that this policy is by default set to context_is_admin (which by default maps to role:admin), testing the different roles listed above (admin, member, reader) in an environment with default settings should be enough.
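
The reasoning above can be illustrated with a toy model of the two default rules as described in this comment. This is not oslo.policy: the resolver below only understands `rule:` and `role:` checks, which is enough to show why testing with the admin, member and reader roles exercises both the allowed and the denied outcome of the policy. The `RULES` dict and the `allowed` function are hypothetical.

```python
# Defaults as described above: the get_all_tenants policy points at
# context_is_admin, which in turn maps to role:admin.
RULES = {
    "context_is_admin": "role:admin",
    "os_compute_api:servers:index:get_all_tenants": "rule:context_is_admin",
}


def allowed(rule_name, roles, rules=RULES):
    """Resolve a rule for a user holding the given roles (toy model)."""
    check = rules[rule_name]
    kind, _, value = check.partition(":")
    if kind == "rule":
        # Follow the reference to another named rule.
        return allowed(value, roles, rules)
    if kind == "role":
        return value in roles
    raise ValueError(f"unsupported check in this sketch: {check!r}")


POLICY = "os_compute_api:servers:index:get_all_tenants"
for role in ("admin", "member", "reader"):
    print(role, allowed(POLICY, {role}))
```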