
Evaluate Grafana's LDAP group options and deprecate grafana-admin if possible
Closed, Resolved · Public

Description

Grafana has native support for LDAP and can query LDAP groups. This was previously incompatible with our LDAP schema, but since T142817 it should work.

We should evaluate whether it's possible to configure Grafana's authz to use that and provide the necessary access rights when needed, so that we can deprecate our grafana-admin workaround.

Details

Related Gerrit Patches:
operations/puppet : production · cache_text: remove grafana-admin request handling
operations/puppet : production · Remove grafana-admin.wikimedia.org virtualhost
operations/puppet : production · grafana-admin: Remove from production
operations/puppet : production · grafana: Allow skipping instantiation of grafana-admin
operations/dns : master · Remove grafana-admin.wikimedia.org
operations/puppet : production · grafana: Readd grafana-admin group as editors
operations/puppet : production · grafana: Also add Array to the ldap.toml.erb excludes
operations/puppet : production · grafana: Double quote correctly ldap.toml parameters
operations/puppet : production · grafana: Remove reference to grafana-admin from home page
operations/puppet : production · grafana: Fix ldap.toml permissions
operations/puppet : production · grafana: Enable grafana LDAP in production
operations/puppet : production · grafana: Add migration script from proxy to LDAP auth
operations/puppet : production · grafana: Hieraize parameters
operations/puppet : production · grafana: Allow to modify the config in hiera
operations/puppet : production · Simplify profile::grafana::production
operations/puppet : production · Remove role::grafana::labs
operations/puppet : production · Move role::grafana::base to profile::grafana
labs/private : master · Deprecate passwords::grafana::labs

Event Timeline

faidon created this task. · Jul 10 2017, 2:11 PM
Restricted Application added a subscriber: Aklapper. · Jul 10 2017, 2:11 PM
faidon moved this task from Inbox to Up next on the observability board. · Jul 10 2017, 2:12 PM
akosiaris moved this task from Up next to In progress on the observability board. · Aug 21 2017, 3:05 PM

Some preliminary results:

Authn

We can use Grafana's LDAP authentication, albeit with some caveats related to our current way of authenticating. Grafana queries LDAP for the following attributes: givenName, sn, cn, memberOf, mail. These are mapped via configuration as follows:

  • givenName + sn => Name
  • cn => username (aka login)
  • mail => email
  • memberOf => role of a user in an organization.
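To make the attribute mapping above concrete, here is a minimal Python sketch that turns a raw LDAP entry (attribute name to list of values, the shape most LDAP client libraries return) into the fields Grafana stores. The `GrafanaUser` class and `map_ldap_entry` function are hypothetical illustrations, not Grafana's code; Grafana does this internally from the attribute names configured in ldap.toml.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GrafanaUser:
    """Hypothetical container for the user fields Grafana fills from LDAP."""
    name: str
    login: str
    email: str
    groups: List[str]


def map_ldap_entry(entry: Dict[str, List[str]]) -> GrafanaUser:
    """Map an LDAP entry onto Grafana user fields, following the list above.

    givenName + sn -> Name, cn -> login, mail -> email,
    memberOf -> group DNs later used to pick the user's role.
    """
    def first(attr: str) -> str:
        # LDAP attributes are multi-valued; take the first value, if any.
        values = entry.get(attr, [])
        return values[0] if values else ""

    # Join the non-empty name parts with a single space.
    name = " ".join(part for part in (first("givenName"), first("sn")) if part)
    return GrafanaUser(
        name=name,
        login=first("cn"),  # case is kept as-is, unlike the proxy auth
        email=first("mail"),
        groups=list(entry.get("memberOf", [])),
    )
```

For a hypothetical entry with givenName `Jane`, sn `Doe` and cn `Jane Doe`, both the name and the login come out as `Jane Doe`, with the capitals preserved. That case preservation is exactly what the next paragraph is about.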

The caveat lies in the fact that our currently enabled proxy-based authentication downcases usernames when creating entries in Grafana's database (e.g. alexandros kosiaris), while Grafana's built-in authn does not (e.g. Alexandros Kosiaris), which causes a variety of issues. So far I've hit at least two:

  • users being created twice, both times successfully
  • users failing to be created because some constraint fails (the email, in my case); semantically just a variation of the above, but a different user experience

More may exist; I am still investigating.

So it looks like during the migration we would need to audit the user database and normalize usernames to what Grafana expects.
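An audit like this could start by finding logins that collide once case is ignored. Below is a minimal sketch using Python's standard `sqlite3` module; it assumes a table named `user` with `id` and `login` columns, modelled on Grafana's SQLite schema, and runs against a throwaway in-memory database with hypothetical rows. It is not the migration script that was actually merged.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List


def find_case_duplicates(conn: sqlite3.Connection) -> Dict[str, List[str]]:
    """Group logins that only differ by letter case.

    Assumes a table named "user" with "id" and "login" columns, modelled on
    Grafana's SQLite schema; adjust the query if the real schema differs.
    """
    buckets: Dict[str, List[str]] = defaultdict(list)
    for (login,) in conn.execute('SELECT login FROM "user" ORDER BY id'):
        buckets[login.lower()].append(login)
    # Keep only the case-folded logins that map to more than one row.
    return {key: logins for key, logins in buckets.items() if len(logins) > 1}


# Usage against a throwaway in-memory database with hypothetical rows.
demo = sqlite3.connect(":memory:")
demo.execute('CREATE TABLE "user" (id INTEGER PRIMARY KEY, login TEXT, email TEXT)')
demo.executemany(
    'INSERT INTO "user" (login, email) VALUES (?, ?)',
    [
        ("jane doe", "jdoe@example.org"),  # created via the proxy auth (downcased)
        ("Jane Doe", "jane@example.org"),  # created via LDAP (case preserved)
        ("John Roe", "jroe@example.org"),
    ],
)
print(find_case_duplicates(demo))  # prints {'jane doe': ['jane doe', 'Jane Doe']}
```

Deciding which of the colliding rows to keep, and renaming it to the LDAP spelling, is left out on purpose, since that choice depends on which row owns the dashboards and preferences worth keeping.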

Another related caveat is that data for many users is inconsistently populated (things like name and email), presumably because the proxy auth scheme leaves these fields up to the user. The LDAP auth scheme, however, populates that data from LDAP (IMHO that's good), so some user-entered data might be lost. Since we are only talking about names and emails, I am not too worried about it.

On the bright side, LDAP authn works like a charm alongside anonymous authn: users navigating to grafana.wikimedia.org are "authenticated" as the anonymous user (obtaining its authorizations) and can then click on sign-in and authenticate using LDAP, allowing us to deprecate grafana-admin.wikimedia.org.

Authz

LDAP's memberOf seems to work quite well and is in fact enforced on every login. That is, we can say things like: people belonging to cn=ops,ou=groups,dc=wikimedia,dc=org become admins, while people in cn=wmf,ou=groups,dc=wikimedia,dc=org become editors, and so on. When a user belongs to more than one group (the case for many of our users), the order in which the groups are defined in Grafana's configuration is the deciding factor, with the very first matching entry winning. Aside from the care needed when ordering these groups, I don't see any major caveat.
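The "first matching entry wins" rule can be modelled in a few lines of Python. This is a sketch of the behaviour described above, not Grafana's implementation. `GROUP_MAPPINGS` is an illustrative stand-in for the ordered group mappings in ldap.toml and contains only the two example groups mentioned in this comment. The case-insensitive DN comparison is a simplifying assumption of the sketch.

```python
from typing import List, Optional, Sequence, Tuple

# Ordered (group DN, Grafana role) pairs, standing in for the group mappings
# in ldap.toml. Illustrative only; not the production configuration.
GROUP_MAPPINGS: List[Tuple[str, str]] = [
    ("cn=ops,ou=groups,dc=wikimedia,dc=org", "Admin"),
    ("cn=wmf,ou=groups,dc=wikimedia,dc=org", "Editor"),
]


def resolve_role(member_of: Sequence[str],
                 mappings: Sequence[Tuple[str, str]] = GROUP_MAPPINGS) -> Optional[str]:
    """Return the role of the first mapping whose group the user belongs to.

    The order of `mappings` decides the outcome for users in several groups.
    Returns None when no mapping matches.
    """
    # Compare DNs case-insensitively (a simplifying assumption for this sketch).
    groups = {dn.lower() for dn in member_of}
    for group_dn, role in mappings:
        if group_dn.lower() in groups:
            return role
    return None
```

A user in both cn=ops and cn=wmf gets `Admin` with the order above. Swapping the two entries would make the same user an `Editor`, which is why the ordering needs care.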

faidon moved this task from In progress to Up next on the observability board. · Sep 6 2017, 3:05 PM
akosiaris moved this task from Up next to In progress on the observability board. · Oct 2 2017, 3:37 PM

Change 404308 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Move role::grafana::base to profile::grafana

https://gerrit.wikimedia.org/r/404308

Change 404309 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Hieraize parameters

https://gerrit.wikimedia.org/r/404309

Change 404311 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[labs/private@master] Deprecate passwords::grafana::labs

https://gerrit.wikimedia.org/r/404311

Change 404311 merged by Alexandros Kosiaris:
[labs/private@master] Deprecate passwords::grafana::labs

https://gerrit.wikimedia.org/r/404311

Change 404314 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Remove role::grafana::labs

https://gerrit.wikimedia.org/r/404314

Change 404319 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Simplify profile::grafana::production

https://gerrit.wikimedia.org/r/404319

Change 404320 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Allow to modify the config in hiera

https://gerrit.wikimedia.org/r/404320

Change 404321 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] WIP: grafana: Enable grafana's LDAP

https://gerrit.wikimedia.org/r/404321

The patch sets above clean up the puppetization, move the ugly labs vs. production distinction out of the code and into hiera, and enable LDAP in production while disabling the proxy auth. Things still required are:

  • Population of the ldap.toml configuration file
  • A script that will need to be run during the migration. I am currently working on a small Python script that will have direct access to the SQLite database.

A few other things for consideration:

  • This is a migration and requires that we disable grafana-admin.wikimedia.org. For this to happen we need a basic informative message for all users capable of authenticating, plus a wikitech-l/wmfall announcement of the migration.
  • A rollback procedure should exist, but fortunately that's very easy: a puppet revert plus restoring a copy of the SQLite database.
  • The old grafana-admin.wikimedia.org site should keep existing for some time so that we don't immediately break saved URLs. The open question is whether we should redirect to grafana.wikimedia.org or just display an informative message letting users know about the migration and that they should update any saved URLs they know about. The number of users we have (just people in LDAP) makes me think the latter is the more prudent approach. It will allow us to actually delete the grafana-admin.wikimedia.org DNS record at some point, since even users who didn't notice the email messages and banner will be actively informed, and saved URLs are more likely to be updated.
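For the rollback step, the database copy has to be taken before the migration script touches anything. A minimal sketch using the online backup API of Python's `sqlite3` module (`Connection.backup`, Python 3.7+), which produces a consistent copy even if the database is open elsewhere. The function name and the paths are hypothetical placeholders; the real location of Grafana's database depends on its configuration.

```python
import sqlite3


def snapshot_database(src_path: str, dest_path: str) -> None:
    """Copy a SQLite database to dest_path using SQLite's online backup API.

    Restoring this copy, together with a puppet revert, is the rollback path.
    Both paths are hypothetical placeholders.
    """
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # The context manager commits on the destination once the copy is done.
        with dest:
            src.backup(dest)
    finally:
        dest.close()
        src.close()
```

The backup API is used instead of copying the file because a plain file copy taken while Grafana is writing can be inconsistent. Stopping Grafana first and copying the file would be an equally valid approach.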

Thanks so much for this, kudos! Any reason to not just 301 grafana-admin to grafana for a few months (and then just drop it)? Also, wmfall probably sounds excessive, I'd guess all of our users are in the ops list (which isn't just opsens).

The only reason I can think of is people still navigating to grafana-admin and using it, since it will still DTRT (do the right thing).

As far as the mailing list goes, users are in the following LDAP groups: ops, nda, wmf. The last two groups have 183 people, whereas the ops list has 120. That means about 1/3 of people would not receive the notification, so ops is not sufficient.

Talk on IRC suggests engineering@. It has 202 subscribers, so it's probably a better candidate than ops@.

Change 404651 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Add migration script from proxy to LDAP auth

https://gerrit.wikimedia.org/r/404651

Scheduling this for February 12th 2018, say 10:00 am UTC. I'll run a few more tests and then send an informational message to engineering@ and wikitech@, and possibly also add a small banner to the Home dashboard in grafana.

Change 404319 merged by Alexandros Kosiaris:
[operations/puppet@production] Simplify profile::grafana::production

https://gerrit.wikimedia.org/r/404319

Change 404314 merged by Alexandros Kosiaris:
[operations/puppet@production] Remove role::grafana::labs

https://gerrit.wikimedia.org/r/404314

Change 404308 merged by Alexandros Kosiaris:
[operations/puppet@production] Move role::grafana::base to profile::grafana

https://gerrit.wikimedia.org/r/404308

Change 404320 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Allow to modify the config in hiera

https://gerrit.wikimedia.org/r/404320

Change 404309 abandoned by Alexandros Kosiaris:
grafana: Hieraize parameters

Reason:
This has actually been incorporated into I68df3eadc4b95848e52356ef4ad7a49735e40e07

https://gerrit.wikimedia.org/r/404309

Unfortunately, this ain't gonna happen today. I've had no time to test the migration yet, and it would be irresponsible to do it today. I'll postpone it without an ETA until I've managed to test it.

fgiunchedi moved this task from In progress to Up next on the observability board. · Mar 5 2018, 4:12 PM
akosiaris moved this task from Up next to In progress on the observability board. · Apr 16 2018, 3:24 PM

Information about the 14 duplicate users in the grafana database can be found at P7183 (currently WMF-NDA protected).

akosiaris added a subscriber: ema. · May 31 2018, 7:53 AM

Deleting a user doesn't seem to cause issues in my tests, e.g. the Version History feature just stops listing the user that created the revision instead of breaking.

All of my tests went fine. Scheduling this for Wednesday June 27th. I'll send an email to wikitech-l as well.

Tgr awarded a token. · Jun 13 2018, 4:28 PM
Tgr added a subscriber: Tgr.

Any thoughts on whether this might make something like T189531: All Wikimedia developer services should use single sign-on easier or harder in the future?

Authentication-wise I think it's irrelevant. The authentication methods in grafana are pluggable, so nothing beyond writing an authentication plugin that implements the single sign-on should be needed. Authorization-wise the same holds true as far as I can tell (it's all just a mapping of the identity provider's groups to grafana groups), albeit there are fewer implementations currently.

Change 404651 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Add migration script from proxy to LDAP auth

https://gerrit.wikimedia.org/r/404651

Change 404321 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Enable grafana LDAP in production

https://gerrit.wikimedia.org/r/404321

Change 442272 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Fix ldap.toml permissions

https://gerrit.wikimedia.org/r/442272

Change 442272 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Fix ldap.toml permissions

https://gerrit.wikimedia.org/r/442272

Hi,

  • I can see the change on https://grafana.wikimedia.org/login
  • However, I cannot log in with my LDAP username/password (LDAP username would be: GoranSMilovanovic): Error while trying to authenticate user.

I am not surprised. The migration was not done by 14:13. Mind retrying?

Change 442284 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Remove reference to grafana-admin from home page

https://gerrit.wikimedia.org/r/442284

@akosiaris All is superfine now. Thanks!

Change 442298 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Double quote correctly ldap.toml parameters

https://gerrit.wikimedia.org/r/442298

Change 442284 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Remove reference to grafana-admin from home page

https://gerrit.wikimedia.org/r/442284

Change 442298 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Double quote correctly ldap.toml parameters

https://gerrit.wikimedia.org/r/442298

Change 442306 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/dns@master] Remove grafana-admin.wikimedia.org

https://gerrit.wikimedia.org/r/442306

Change 442308 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Also add Array to the ldap.toml.erb excludes

https://gerrit.wikimedia.org/r/442308

Change 442308 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Also add Array to the ldap.toml.erb excludes

https://gerrit.wikimedia.org/r/442308

Change 442311 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Readd grafana-admin group as editors

https://gerrit.wikimedia.org/r/442311

Change 442312 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana-admin: Remove from production

https://gerrit.wikimedia.org/r/442312

Change 442313 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] grafana: Allow skipping instantiation of grafana-admin

https://gerrit.wikimedia.org/r/442313

Change 442311 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Readd grafana-admin group as editors

https://gerrit.wikimedia.org/r/442311

Re: announcement email of completion - https://lists.wikimedia.org/pipermail/wikitech-l/2018-June/090251.html
Are all the grafana-admin links directly replaceable with grafana? I.e. Can we just do a simple search & replace through the search results?

akosiaris changed the task status from Open to Stalled. · Jun 27 2018, 8:58 PM

Yes

I've gone through the two lists and done the search + replace wherever it made sense to update the link.

I filed T198631 for also doing this on the labs instance

grafana-admin.wikimedia.org now redirects to grafana.wikimedia.org (preserving the URL structure) in order to migrate the last few users to it.
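"Preserving the URL structure" means that only the host changes while the path and query string stay the same, so old bookmarks land on the same dashboard. In production this was handled in puppet-managed serving configuration (the grafana-admin virtualhost and cache_text handling touched by the patches above), not in application code. The Python function below is only a hypothetical illustration of the rewrite rule. The example dashboard path in the usage note is made up.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "grafana-admin.wikimedia.org"
NEW_HOST = "grafana.wikimedia.org"


def redirect_target(url: str) -> str:
    """Return the Location a 301 for `url` would point at.

    Only the host is swapped; scheme, path, query and fragment are kept.
    URLs for any other host are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.hostname != OLD_HOST:
        return url
    return urlunsplit((parts.scheme, NEW_HOST, parts.path, parts.query, parts.fragment))
```

For instance, `https://grafana-admin.wikimedia.org/dashboard/db/example?orgId=1&from=now-24h` maps to `https://grafana.wikimedia.org/dashboard/db/example?orgId=1&from=now-24h`. Fragments never reach the server in a real request; browsers reapply them after following the redirect.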

Change 442306 merged by Alexandros Kosiaris:
[operations/dns@master] Remove grafana-admin.wikimedia.org

https://gerrit.wikimedia.org/r/442306

And I've just removed the grafana-admin.wikimedia.org DNS RR. In the last few days the number of accesses to that virtualhost was <10, so it's quite clear nobody really uses it anymore.

Change 442313 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana: Allow skipping instantiation of grafana-admin

https://gerrit.wikimedia.org/r/442313

Change 442312 merged by Alexandros Kosiaris:
[operations/puppet@production] grafana-admin: Remove from production

https://gerrit.wikimedia.org/r/442312

Change 449176 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] Remove grafana-admin.wikimedia.org virtualhost

https://gerrit.wikimedia.org/r/449176

Change 449176 merged by Alexandros Kosiaris:
[operations/puppet@production] Remove grafana-admin.wikimedia.org virtualhost

https://gerrit.wikimedia.org/r/449176

akosiaris closed this task as Resolved. · Jul 30 2018, 2:02 PM

grafana-admin.wikimedia.org fully deprecated. I am resolving this.

Mvolz added a subscriber: Mvolz. · Aug 9 2018, 6:20 PM

I've tried to log in with my LDAP credentials and couldn't. I've tried every username/email/password combo. I also tried resetting my password using the link in grafana, with two e-mails and usernames associated with my accounts, but didn't receive any e-mails. Checked spam folder, nothing. Thoughts?

You are not in any LDAP group that gives access to the read-write part of grafana. This has nothing to do with this task or the upgrade; you did not have access to grafana-admin.wikimedia.org either. Please file a task requesting to be included in cn=wmf if you need access to create/edit dashboards/graphs.

For what it's worth, the password reset only works when using the grafana user database for authentication, which we are not doing, as we use LDAP (obviously).

Change 458764 had a related patch set uploaded (by Ema; owner: Ema):
[operations/puppet@production] cache_text: remove grafana-admin request handling

https://gerrit.wikimedia.org/r/458764

Change 458764 merged by Ema:
[operations/puppet@production] cache_text: remove grafana-admin request handling

https://gerrit.wikimedia.org/r/458764