Grafana LDAP sync fails post upgrade
Closed, ResolvedPublic

Description

The LDAP -> Grafana user sync has started failing; digging in, the error is a 403:

DEBUG:urllib3.connectionpool:http://localhost:3000 "PUT /api/users/742 HTTP/1.1" 403 82
{"message":"User info cannot be updated for external Users"}
Traceback (most recent call last):
  File "./grafana-ldap-users-sync", line 316, in <module>
    sys.exit(main())
  File "./grafana-ldap-users-sync", line 300, in main
    syncer.sync_ldap_users(ldap_uids, role)
  File "./grafana-ldap-users-sync", line 183, in sync_ldap_users
    grafana_uid = self._update_user(user, name, email)["id"]
  File "./grafana-ldap-users-sync", line 142, in _update_user
    r.raise_for_status()
  File "/usr/lib/python3/dist-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: http://localhost:3000/api/users/742

Via the error message I tracked the change down to the following pull request: https://github.com/grafana/grafana/pull/58449/files

And indeed, even in the admin UI users are shown as 'synced via oauth', which is not the case. I don't yet know the full context of the authentication change (to be investigated).

Event Timeline

To clarify the impact: users added to the LDAP groups ops/wmf/nda after Feb 01 (the day of the upgrade, T328405) won't be able to edit dashboards until the sync can complete. Existing users can keep editing as usual.

Dug into this a bit and, AFAICT, both the "synced via oauth" label and "User info cannot be updated for external Users" relate back to isExternal:true on the user object. isExternal:true does appear to be set for our users populated by the grafana-ldap-users-sync script.

A sweeping update of users to isExternal:false, plus adding isExternal:false to the params set by grafana-ldap-users-sync, is what I'd look into next. Need to stop for now, but will continue next week if someone doesn't beat me to it.
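A minimal sketch of how affected users could be spotted before the sync runs into the 403. It assumes GET /api/users/<id> returns the user object with an isExternal field (consistent with the investigation above) and that the caller supplies server-admin credentials as an Authorization header; fetch_user, is_external and external_user_ids are hypothetical helpers, not part of grafana-ldap-users-sync.

```python
from __future__ import annotations

import json
import urllib.request


def fetch_user(base_url: str, user_id: int, auth_header: str) -> dict:
    """Fetch one user object via GET /api/users/<id>.

    auth_header is a hypothetical parameter: pass whatever server-admin
    credentials the sync script already uses (e.g. a Basic auth header).
    """
    req = urllib.request.Request(
        f"{base_url}/api/users/{user_id}",
        headers={"Authorization": auth_header, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def is_external(user: dict) -> bool:
    """True if Grafana flags the user as externally managed.

    These are the users for which PUT /api/users/<id> answers
    403 "User info cannot be updated for external Users".
    """
    return bool(user.get("isExternal", False))


def external_user_ids(users: list[dict]) -> list[int]:
    """Return the ids of users the sync script will fail to update."""
    return [u["id"] for u in users if is_external(u)]
```

For example, external_user_ids([fetch_user("http://localhost:3000", 742, auth)]) should return [742] for the user from the task description for as long as that user still carries the flag.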

We have recently added OIDC support to CAS, so I wonder if we could migrate Grafana to OIDC and sync the user data via the released attributes, as opposed to having a separate script sync directly from LDAP. I think the following could work for this: https://grafana.com/docs/grafana/latest/setup-grafana/configure-security/configure-authentication/generic-oauth/, and we can test in wmcs or using idp-test in production.

Thanks @jbond, that looks ideal, and if we can land on a working config it would possibly allow us to simplify the ro/rw domain layout as well.

https://github.com/grafana/grafana/issues/8600#issuecomment-490788422 may be useful as a starting point for testing

Looking for nearer-term options, I found that removing a user from the user_auth table makes their user entry no longer show 'synced via oauth' or isExternal. Doing this for user 742 (from the task description) was enough to let the sync process complete successfully.

Feb 06 21:44:28 grafana1002 systemd[1]: Started Sync users and roles from LDAP to Grafana.
Feb 06 21:44:37 grafana1002 grafana-ldap-users-sync[9115]: INFO:__main__:Created user redacted
Feb 06 21:45:26 grafana1002 grafana-ldap-users-sync[9115]: INFO:__main__:Created user redacted
Feb 06 21:45:26 grafana1002 grafana-ldap-users-sync[9115]: INFO:__main__:Created user redacted
Feb 06 21:45:27 grafana1002 grafana-ldap-users-sync[9115]: INFO:__main__:Created user redacted
Feb 06 21:45:38 grafana1002 grafana-ldap-users-sync[9115]: INFO:__main__:User admin is protected, not deleting
Feb 06 21:45:39 grafana1002 grafana-ldap-users-sync[9115]: INFO:__main__:Deleted user 264
Feb 06 21:45:39 grafana1002 grafana-ldap-users-sync[9115]: INFO:__main__:Deleted user 779
Feb 06 21:45:39 grafana1002 systemd[1]: grafana-ldap-users-sync.service: Succeeded.
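The per-user workaround boils down to a single DELETE against grafana.db. A minimal sketch with Python's sqlite3 module, assuming the stock Grafana SQLite schema in which user_auth rows point at the user through a user_id column; unlink_external_auth is a hypothetical name, and it is safest to have a backup of grafana.db at hand before running it.

```python
import sqlite3


def unlink_external_auth(db_path: str, user_id: int) -> int:
    """Delete the user_auth rows for one Grafana user id.

    Assumes user_auth.user_id references user.id (stock Grafana schema).
    Returns the number of rows removed.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "DELETE FROM user_auth WHERE user_id = ?", (user_id,)
            )
            return cur.rowcount
    finally:
        conn.close()
```

For the user from the task description this would be unlink_external_auth("/path/to/grafana.db", 742), with the path adjusted to wherever grafana.db lives on the host.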

Interesting find/investigation! I have repeated the same process on my user filippo, and indeed that seems to be enough to mark the user as no longer coming from oauth.

Also the newly-created users from the output above don't show up in the user_auth table, which is reassuring.
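One way to check this is to list every user that still has a user_auth row; going by the findings above, those are the users that would show 'synced via oauth'. A sketch, again assuming the stock Grafana SQLite schema (a "user" table with id and login, and user_auth with user_id and auth_module); users_with_auth_links is a hypothetical helper.

```python
import sqlite3


def users_with_auth_links(db_path: str) -> list:
    """Return (user_id, login, auth_module) for each user_auth row.

    "user" is quoted defensively; the join drops user_auth rows whose
    user no longer exists.
    """
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            'SELECT ua.user_id, u.login, ua.auth_module '
            'FROM user_auth AS ua JOIN "user" AS u ON u.id = ua.user_id '
            'ORDER BY ua.user_id'
        ).fetchall()
    finally:
        conn.close()
```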

With all that said, I think if there's consensus we can:

  • save a backup copy of grafana.db
  • delete all entries from user_auth table

This should bring us back to an expected state. Thoughts?
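The two steps of the plan, sketched with Python's sqlite3 module. This assumes direct write access to grafana.db and the stock Grafana SQLite schema; backup_and_purge_user_auth and the paths passed to it are hypothetical, and it is probably safest to stop Grafana while it runs.

```python
import sqlite3


def backup_and_purge_user_auth(db_path: str, backup_path: str) -> int:
    """Back up grafana.db to backup_path, then delete every user_auth row.

    Uses Connection.backup() so the copy goes through SQLite rather than
    a raw file copy. Returns the number of user_auth rows deleted.
    """
    src = sqlite3.connect(db_path)
    try:
        # Step 1: save a backup copy of the whole database.
        dst = sqlite3.connect(backup_path)
        try:
            src.backup(dst)
        finally:
            dst.close()
        # Step 2: purge user_auth in one transaction (committed by `with`).
        # Count first rather than relying on rowcount for an unconditional DELETE.
        with src:
            count = src.execute("SELECT COUNT(*) FROM user_auth").fetchone()[0]
            src.execute("DELETE FROM user_auth")
        return count
    finally:
        src.close()
```

If anything goes wrong, the backup file can be put back in place of grafana.db.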

Thank you, yeah, a tighter SSO/Grafana integration would certainly be nicer to have!

+1 sounds like a plan

Mentioned in SAL (#wikimedia-operations) [2023-02-08T09:14:07Z] <godog> purge user_auth table on grafana1002 - T328784

fgiunchedi claimed this task.

That seems to have done the trick! Thank you again @herron for the investigation. Re: OIDC, I have opened https://phabricator.wikimedia.org/T329146 and will optimistically close this task!

Excellent! Thanks for doing the user_auth purge!