Verify GrowthBook access approach
Closed, Resolved, Public

Description

@mpopov and I (@dr0ptp4kt) worked through the permission scheme for GrowthBook and think we have arrived at the roles needed. We are seeking review from @JVanderhoop-WMF and @KReid-WMF and confirmation of role permissions here in-task. I've added a number of other task subscribers for visibility.

Initial role assignment will be coordinated in several larger requests in order to reduce up-front form submissions. Subsequent access will need to be done on a case-by-case basis.

Roles

We think it's best to keep the roles simple. We had hoped to avoid needing a custom role, but one custom role is required to minimize easily avoidable mistakes and to hide parts of the GrowthBook application we don't foresee supporting or using anytime soon. Custom roles are an Enterprise feature.

The plan is three tiers of access, as follows.

Read Only users, a built-in role of GrowthBook, whose only permission is as follows. This would be the bulk of user seats.

Screenshot 2026-03-10 at 3.05.36 PM.png (1,790×110 px, 19 KB)

CustomElevatedAccess users, who have the permissions detailed below. This is meant for people who need to drive experiment and data analysis configuration. It's expected this will usually be A/B test fluent people with software/data/research engineer/scientist/analyst titles.

Admin users, a built-in role of GrowthBook, who have all permissions. This is meant to be a shortlist including Ben, Balthazar, and select members of Experiment Platform Team, with occasional membership involving other DPE team members where this highest level of access is useful and necessary and comes with heightened expectations. This is for people who are fluent in the system and will do work involving approval of access, and who otherwise could be tasked with configuration of data sources, dimensions, SDK configuration (e.g., Attributes), and system Settings.

The (OSS, not Enterprise) concept of a Project in GrowthBook will need to be implemented, and the GrowthBook role for Read Only or CustomElevatedAccess will only be granted within the designated Project.

Process

  1. The WMF or WMDE staff member or contractor requiring access will visit https://idm.wikimedia.org/permissions/ and request access for the Wikimedia IDM ("Bitu") group for their level of GrowthBook access required: GrowthBook-ReadOnly, GrowthBook-CustomElevatedAccess, or GrowthBook-Admin (all three of these Bitu roles will need to be added to nda_groups.txt [and possibly more] via T419021: LDAP ("Bitu") group membership assignments for tiered access to GrowthBook, by the way).
  2. A member of Experiment Platform Team (or other GrowthBook Admin if no Experiment Platform Team member is available) will review the request, verifying:
    A. That the WMF or WMDE staff member or contractor's request is for a user with an email domain of wikimedia.org or wikimedia.de (this can be verified at https://ldap.toolforge.org/user/<username>).
    B. The presence of the WMF or WMDE staff member or contractor's membership in the wmf or wmde LDAP group (this can be verified via https://ldap.toolforge.org/ ).
    C. That the WMF or WMDE staff member or contractor's ID in data.yaml is active (i.e., has ensure: present).
    D. The presence of the WMF or WMDE staff member or contractor's membership in the analytics_privatedata_users group in data.yaml. This is required for all GrowthBook users. Membership in this group is sufficient for GrowthBook-ReadOnly access.
    E. If GrowthBook-CustomElevatedAccess is requested, additionally that the WMF or WMDE staff member or contractor's ID is a member of analytics-product-users, analytics-wmde-users, or deployment in data.yaml.
    F. If GrowthBook-Admin is requested, additionally the user must be a member of Experiment Platform Team or Data Platform Engineering where this access level is useful and necessary and comes with heightened expectations. This is for people who are fluent in the system and will do work involving approval of access, and who otherwise could be tasked with configuration of data sources, dimensions, SDK configuration (e.g., Attributes), and system Settings. Confirmation of this level of access need should be coordinated on an org-facing DPE IM channel.
  3. Assuming everything checks out, the team member in Experiment Platform Team (or other GrowthBook Admin if no Experiment Platform Team member is available) will grant the access in Wikimedia IDM ("Bitu"). We would like, if possible, for the Wikimedia IDM ("Bitu") roles of GrowthBook-ReadOnly or GrowthBook-CustomElevatedAccess to be automatically provisioned if the constraints in 2.* are satisfied, but we are unclear if this is possible (writes to LDAP can be more complicated than reads from LDAP, and are often achieved by a human in the loop); automatic provisioning of GrowthBook-Admin should not occur and should require manual human involvement.
  4. An Experiment Platform Team member (or other GrowthBook Admin if no Experiment Platform Team member is available) should follow up with the requester on the talk-to channel on Slack to check back in 120 minutes to confirm that they are able to see material in the designated Project, and to point them to the access instructions to be reminded of system expectations. In case the LDAP provisioning becomes automated, there will need to be some means for the Experiment Platform Team member (or other GrowthBook Admin if no Experiment Platform Team member is available) to know to follow up (or a bot that notifies on the IM channel will be needed; this IM automation may be more complicated than is worth it although there are hooks for IM used in some cases today). Here's some language that can be put into the access instructions as an example:

When you use GrowthBook, as with other systems, it comes with the expectation that you will not attempt to subvert the system's security controls, that you will not share your access credentials, that you won't use the software with prohibited data or in prohibited jurisdictions, and that you will be mindful that the system is used for configuration and measurement of experiments and so it's important to exercise caution in use of the application's facilities. We use the Enterprise version of this software, relying on its stock OSS functionality as well as the Enterprise functionality; the Enterprise functionality is governed under separate terms from the OSS portion of the software, so we don't extend the software package in general (and where we do, we do so only in conversation and alignment with GrowthBook). If you have any questions, please do let us know in the talk-to IM chat channel with Experiment Platform Team.

The message from the member of Experiment Platform Team (or other GrowthBook Admin if no Experiment Platform Team member is available) or bot (depending on bot capabilities) on IM can say something like

Your access to GrowthBook has been granted. Please visit https://growthbook.wikimedia.org/ in 120 minutes to verify your access to the <project name TBD> Project. As a reminder, please take note of the system expectations at <link>.
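
The verification in step 2 of the Process can be sketched in Python as follows. This is a minimal sketch, not the agreed implementation: the Requester class and both function names are hypothetical, and it assumes data.yaml has already been parsed into a dict with top-level users (each carrying ensure) and groups (each carrying members) keys. Check 2.F is deliberately not automated, since Admin access is meant to stay with a human reviewer.

```python
from dataclasses import dataclass, field

ALLOWED_EMAIL_DOMAINS = {"wikimedia.org", "wikimedia.de"}  # check 2.A
STAFF_LDAP_GROUPS = {"wmf", "wmde"}                        # check 2.B
BASE_GROUP = "analytics_privatedata_users"                 # check 2.D
ELEVATED_GROUPS = {"analytics-product-users", "analytics-wmde-users", "deployment"}  # check 2.E


@dataclass
class Requester:
    """Hypothetical container for what the reviewer looks up about a user."""
    username: str
    email: str
    ldap_groups: set = field(default_factory=set)


def failed_checks(requester, data_yaml, requested_group):
    """Return the labels of the step 2 checks that fail (empty list = all pass)."""
    groups = data_yaml.get("groups") or {}
    users = data_yaml.get("users") or {}

    def is_member(group_name):
        members = (groups.get(group_name) or {}).get("members") or []
        return requester.username in members

    failures = []
    # 2.A: email domain must be allowlisted.
    if requester.email.rpartition("@")[2].lower() not in ALLOWED_EMAIL_DOMAINS:
        failures.append("2.A")
    # 2.B: must be in the wmf or wmde LDAP group.
    if not (requester.ldap_groups & STAFF_LDAP_GROUPS):
        failures.append("2.B")
    # 2.C: the data.yaml ID must be active.
    if (users.get(requester.username) or {}).get("ensure") != "present":
        failures.append("2.C")
    # 2.D: base group required for every GrowthBook user.
    if not is_member(BASE_GROUP):
        failures.append("2.D")
    # 2.E: only evaluated when elevated access is requested.
    if requested_group == "GrowthBook-CustomElevatedAccess" and not any(
        is_member(name) for name in ELEVATED_GROUPS
    ):
        failures.append("2.E")
    return failures


def auto_approvable(requested_group, failures):
    """Admin requests always need a human; others need a clean check list."""
    return requested_group != "GrowthBook-Admin" and not failures
```

A reviewer (or, if LDAP writes turn out to be possible, an automated job) could call failed_checks for each pending Bitu request and only proceed when the returned list is empty.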

Enforcement
A script (e.g., k8s CronJob) will need to recurrently:

  1. Ensure synchronization of Bitu access level with growthbook.wikimedia.org and growthbook-next.wikimedia.org (e.g., every 10-30 minutes) in the designated Project. In case a user occupies multiple applicable LDAP roles, the LDAP role with the higher level of access corresponding to the appropriate role in GrowthBook should be the one that is assigned in GrowthBook (e.g., if a user has both GrowthBook-ReadOnly and GrowthBook-CustomElevatedAccess then they should be granted CustomElevatedAccess).
  2. Ensure that violations of conditions 2.A, 2.B, 2.C, 2.D, 2.E, or 2.F result in appropriate revocation of the Project-mapped GrowthBook role(s). If 2.A, 2.B, 2.C, or 2.D are violated, all roles should be revoked.
  3. If 2.A, 2.B, 2.C, or 2.D are violated or if a user hasn't logged into GrowthBook within 90 days, then issue a "Remove User" API call to GrowthBook. The "Remove User" call marks the record as inactive in GrowthBook's user database; reactivation can occur upon successful login (subsequent access to the Project is subject to 2.A, 2.B, 2.C, and 2.D being met, minimally, by way of synchronization).
  4. Report on user seating (probably a combination of a regular report, plus alerting in case active seating is nearing agreed limits requiring attention).
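
The per-user decision behind items 1 through 3 above can be sketched as a pure function. This is an illustrative sketch only: plan_sync and the action names it returns are hypothetical, the role labels are the ones used in this task rather than GrowthBook's internal role IDs, and it reads "appropriate revocation" for a 2.E or 2.F violation as falling back to the next tier the user still holds in Bitu.

```python
from datetime import datetime, timedelta, timezone

# Ordered from lowest to highest access; dict order carries the precedence.
ROLE_BY_BITU_GROUP = {
    "GrowthBook-ReadOnly": "Read Only",
    "GrowthBook-CustomElevatedAccess": "CustomElevatedAccess",
    "GrowthBook-Admin": "Admin",
}
BASE_CHECKS = {"2.A", "2.B", "2.C", "2.D"}
# Tier-specific condition that must hold for a group to count.
TIER_CHECK = {"GrowthBook-CustomElevatedAccess": "2.E", "GrowthBook-Admin": "2.F"}
MAX_IDLE = timedelta(days=90)


def plan_sync(bitu_groups, failed_checks, last_login, now):
    """Return (action, role) for one user; nothing is written anywhere."""
    failed = set(failed_checks)
    idle = last_login is not None and now - last_login > MAX_IDLE
    # Base violation or 90 days without login: drop every role and the seat.
    if failed & BASE_CHECKS or idle:
        return ("remove_user", None)
    # Keep only the groups whose tier-specific check still holds.
    eligible = [
        group
        for group in ROLE_BY_BITU_GROUP
        if group in bitu_groups and TIER_CHECK.get(group) not in failed
    ]
    if not eligible:
        return ("revoke_roles", None)
    # The highest remaining tier wins.
    return ("set_role", ROLE_BY_BITU_GROUP[eligible[-1]])


now = datetime(2026, 3, 20, tzinfo=timezone.utc)
both = {"GrowthBook-ReadOnly", "GrowthBook-CustomElevatedAccess"}
print(plan_sync(both, [], now - timedelta(days=5), now))
```

Keeping the decision separate from the GrowthBook API calls would make it straightforward to log each decision for audit and to dry-run the job against growthbook-next.wikimedia.org first.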

Separate tasks to be filed. It will be advisable to configure this first for growthbook-next.wikimedia.org, and then upon verification of it working properly, growthbook.wikimedia.org.

Basic GrowthBook access is automagically provisioned by way of SSO by virtue of a user having a wikimedia.org (soon, wikimedia.de, as well) email address, and thus a Project will need to be established (to separate legitimate data access from merely being a GrowthBook user by way of SSO) and the script will be necessary to ensure that active user seats are managed within agreed plan limits.
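
To illustrate the seat management concern, here is a hedged sketch of a seat report. The function name, the 90% warning ratio, and the assumption that every active GrowthBook user (including SSO auto-registered users with no Project role) counts toward the plan's seats are all illustrative and would need to be confirmed against the actual contract terms.

```python
from collections import Counter


def seat_report(project_roles, seat_limit, warn_ratio=0.9):
    """Summarize active seats.

    project_roles maps each active GrowthBook user to their Project role
    name, or None for users who only exist because of SSO auto-registration.
    """
    by_role = Counter(role or "no Project role" for role in project_roles.values())
    used = len(project_roles)
    return {
        "used": used,
        "limit": seat_limit,
        "by_role": dict(by_role),
        # Flag for alerting once usage reaches the warning threshold.
        "near_limit": used >= seat_limit * warn_ratio,
    }
```

Users counted under "no Project role" are the ones the script would be expected to remove, since they hold a seat without having been granted access through Bitu.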

Review
An annual manual review of access 3-4 months before contract renewal should be scheduled on the Experiment Platform Team calendar.

Instructions

Documentation of GrowthBook on Wikitech wiki will cover the access request and system expectations as noted above.

CustomElevatedAccess Permissions

Here's the exhaustive set of permissions for CustomElevatedAccess.

Screenshot 2026-03-10 at 3.04.10 PM.png (1,790×1,418 px, 199 KB)

Screenshot 2026-03-10 at 3.04.36 PM.png (1,790×1,554 px, 302 KB)

Screenshot 2026-03-10 at 3.04.53 PM.png (1,790×1,012 px, 175 KB)

Screenshot 2026-03-10 at 3.05.36 PM.png (1,790×110 px, 19 KB)

Currently Out of Scope

Access for individuals with email addresses outside of wikimedia.org and wikimedia.de domains is presently not forecast to be allowed. However, should this need be surfaced, it will need to be scrutinized. Strict processes and vetting would be required due to the ability of the system's service ID to access data stores bearing sensitive data and for the experiment configuration to affect production services (albeit with software verifying the scope of such experiment configuration prior to configuration deployment). It will most likely be easiest for there to be arrangement of a wikimedia.org or wikimedia.de email address, with an expiration date on its access, for granting of any further access to GrowthBook. This however is a future consideration and not one to be approached lightly.

Although growthbook-next.wikimedia.org will be useful for confirmation of basic function of the schema described in this task, and it will be useful for verification of non-breakage during GrowthBook upgrades, it isn't intended to be used for system configuration and experiment data analysis the same way as growthbook.wikimedia.org. That is to say that, although it will be important during a staged upgrade to ask a member or two below the Admin level of access to confirm they can login to growthbook-next.wikimedia.org and see the Project there, they aren't expected to be doing routine work on growthbook-next.wikimedia.org.

Event Timeline

dr0ptp4kt renamed this task from DRAFT: Verify GrowthBook access approach to Verify GrowthBook access approach. Mar 10 2026, 10:25 PM
dr0ptp4kt updated the task description.
dr0ptp4kt edited subscribers, added: BTullis; removed: Tullis.

Following up here: WMDE access requires additional validation beyond merely a wikimedia.de email address. I'm checking if there's an existing data store that maintains a list of those with NDA (I'm not referring to the L3 form here, although that is typically employed for POSIX access, which is now looking more like a requirement for Read Only access for WMDE if wanting to keep it simple in the wikimedia.de case) and expiration date for such email addresses. The signs are pointing toward the need for a Project as a means for gating all Read Only access, though.

@dr0ptp4kt please let me know if I'm parsing this appropriately. Looks like there are two decisions to be made here:

1. Should Read Only access require a formal request/approval process?

  • Currently anyone with @wikimedia.org or @wikimedia.de email would automatically get Read Only access via SSO.
  • Alternative would be to gate even Read Only access behind a request, using GB Projects.

My question: What is our limit of Read Only seats, and are we worried that accessing without a request/approval process will push us up against that limit?

2. Should WMDE users have the same Read Only access as WMF users?
Let us know what you find out about the Affiliate agreement, @dr0ptp4kt. From my perspective, we want to invite more experimentation and collaboration there.

Thanks! Inline...

@dr0ptp4kt please let me know if I'm parsing this appropriately. Looks like there are two decisions to be made here:

1. Should Read Only access require a formal request/approval process?

  • Currently anyone with @wikimedia.org or @wikimedia.de email would automatically get Read Only access via SSO.

After looking into this further, @wikimedia.de will need an additional check. The simplest requirement is presence in the analytics_privatedata_users group in data.yaml for an active ID, which connotes existing approved access.

  • Alternative would be to gate even Read Only access behind a request, using GB Projects.

My question: What is our limit of Read Only seats, and are we worried that accessing without a request/approval process will push us up against that limit?

Given the need to gate WMDE access, the remaining open question is on gating of WMF seats. At this point I recommend that only users in analytics_privatedata_users be allowed access. With the requirement for analytics_privatedata_users access and use of a Project, we should be able to keep the user roster well contained. This should be supplemented by the script, as it should remove access in case a person visits GrowthBook with their SSO ID (which auto-registers them and creates an active user) but hasn't been granted enhanced Project access by way of a Phabricator request.

2. Should WMDE users have the same Read Only access as WMF users?
Let us know what you find out about the Affiliate agreement, @dr0ptp4kt. From my perspective, we want to invite more experimentation and collaboration there.

I agree, simple is better. It will, however, require an extra step. I'm going to update the task description to try to rework things so the process is still as streamlined as possible.

Thanks @dr0ptp4kt -- it seems that this still has a lot of steps that require humans in the loop (supervisor approval, admin verification, etc). I worry that this could create a considerable backlog if our team is the only approver path.

  • Could Read Only requests be auto-approved once Bitu group membership and LDAP checks pass? Especially since there is no write access, could we handle the majority of requests (for visibility) without requiring phab tickets etc?
  • Could CustomElevatedAccess be approved by any Admin, not only members of Experiment Platform team?
  • Admin requests seem like the main tier that should truly warrant Experiment Platform Team sign-off every time.

Also, I wonder if we could be more concrete re: admin eligibility. "Useful and necessary" is very broad :-)

  • The person owns or is the primary developer of a GrowthBook-integrated feature
  • The person is responsible for experiment platform infrastructure or configuration
  • The person is responsible for adding approved metrics, etc etc
  • The person is acting as a responder for an active incident involving GrowthBook

And finally: 90-min SLA seems pretty aspirational if our cron job runs every 60 min. Should we say 120 min (i.e. two cycles) to be safe?

Thanks @dr0ptp4kt -- it seems that this still has a lot of steps that require humans in the loop (supervisor approval, admin verification, etc). I worry that this could create a considerable backlog if our team is the only approver path.

Yeah, I get the concern. We're familiar that seeking out approval for the private group sometimes requires some back and forth, for example.

Now, Wikimedia IDM ("Bitu") will email the admins for the corresponding groups if a user requests access to such a group. So let's see here...continuing...

  • Could Read Only requests be auto-approved once Bitu group membership and LDAP checks pass? Especially since there is no write access, could we handle the majority of requests (for visibility) without requiring phab tickets etc?

Good idea! That and the data.yaml checks together could be sufficient. The data.yaml checks would confirm that (1) the user is in the private group, and (2) the user's ID is active and its email address matches the (appropriately domain allowlisted) LDAP email address. It's easy enough to bookmark the places to look. Slightly cumbersome, though not excessively so.

I bet you're wondering if the automation script could check this and auto-approve the LDAP request (and subsequently add the user to the right place in GrowthBook) if it sees all conditions are in good shape. Maybe it could, although honestly I'm not sure (sometimes write actions on LDAP are prohibited except for by way of an ops team member running commands in a privileged mode). Would you like me to check about the feasibility of that?

  • Could CustomElevatedAccess be approved by any Admin, not only members of Experiment Platform team?

It could (that was the intention of who in principle could approve). Although I think in practice it would usually be an Experiment Platform core team member, and that seems appropriate as the team supports Test Kitchen / GrowthBook users.

Now, seeing if we can streamline this more: if the user is a member of analytics-product-users, analytics-wmde-users, or deployment in data.yaml, that in addition to the other checks (email address on the allowlisted domains, LDAP membership in the allowlisted roles, private data access in data.yaml, active ID in data.yaml) stands in as a proxy for those who are associated with metrics definition and CMDB type things that mutate the real production systems already. What do you think about that so that we forego the Phabricator step?

It still requires extra work and very likely at some point someone will need to be reminded to ensure they're a member of one of those other three groups to get this CustomElevatedAccess by way of Wikimedia IDM ("Bitu"). I guess we could have a fallback procedure here to get their manager approval on Phabricator if they are in the unusual situation of not being in one of those supplementary groups (analytics-product-users, analytics-wmde-users, or deployment in data.yaml) but are cleared by all other checks...but hopefully that would be rare.

If all Experiment Platform team core members with admin access to GrowthBook are out, I can imagine how another non-core team member (a DP SRE, me as I cutover to some other project work) could handle looking at these requests.

Same deal here I think around whether automation could be applied. Would need to check if that's even possible. Reading LDAP isn't the problem AFAIK, it's more the writing that could be slightly difficult IIRC.

  • Admin requests seem like the main tier that should truly warrant Experiment Platform Team sign-off every time.

Although I trust that Ben, Balthazar, and (soon, as I cut over to other project work) I could handle this well if an Experiment Platform Team member isn't available, I think that's probably a good idea.

But, if you think it is okay to allow any GrowthBook Admin (which should, again, be a really small group), happy for that as well. It's sort of hard to imagine too many cases where the entirety of people with GrowthBook Admin access are out and a request for Admin can't wait for the Experiment Platform Team members with GrowthBook Admin to be back available.

Do you want to make this Bitu only? Or a Phabricator ticket in addition to Bitu? This of course should be looked at carefully in either case, I think. I'm not so sure that we'd really ever want to have this level of access be automatically provisioned, even if LDAP writes were possible.

Also, I wonder if we could be more concrete re: admin eligibility. "Useful and necessary" is very broad :-)

True. I'd say it's for people who are fluent in the system and can approve access, and who otherwise could be tasked with configuration of data sources, dimensions, SDK configuration (notably Attributes, but more than that), and system Settings. (There will be more than that eventually across all the checkboxes, although those are sort of the main ones. There are some pieces involving Fact Tables that in practice require CustomElevatedAccess to have access from what we see, so CustomElevatedAccess users need to be careful when doing things with them, by the way). I think what you wrote is a nice way of phrasing it:

  • The person owns or is the primary developer of a GrowthBook-integrated feature
  • The person is responsible for experiment platform infrastructure or configuration
  • The person is responsible for adding approved metrics, etc etc
  • The person is acting as a responder for an active incident involving GrowthBook

Without the Phabricator trail for access, I might suggest a regular access review. The CronJob should keep the list healthy, but an annual manual review of access 3-4 months before contract renewal may be a good idea. I could update the calendar reminder about that.

And finally: 90-min SLA seems pretty aspirational if our cron job runs every 60 min. Should we say 120 min (i.e. two cycles) to be safe?

Seems reasonable enough. Possibly the CronJob can run even more frequently; I recall seeing the guideline that propagation may take up to 30 minutes. It may be faster than that, and possibly the CronJob can run even more frequently than 30 minutes. But setting a guideline for folks to check back 120 minutes later should hopefully buy more than enough time.
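
The arithmetic behind the 120-minute guidance can be made explicit. This is a rough sketch under stated assumptions: a grant that lands just after a CronJob run waits one full interval, propagation takes up to the recalled 30 minutes, and the run time of the job itself is ignored. The function name is hypothetical.

```python
def worst_case_wait_minutes(sync_interval, propagation=30):
    """Upper bound on minutes between a Bitu grant and visible Project access."""
    return propagation + sync_interval


for interval in (60, 30, 10):
    print(f"CronJob every {interval} min: up to {worst_case_wait_minutes(interval)} min")
```

Under these assumptions an hourly job has a worst case of 90 minutes, which leaves no margin on a 90-minute target; asking users to check back after 120 minutes leaves 30 minutes of slack, and more if the job runs every 10 to 30 minutes.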

What do you think of all of this? It seems like we're getting closer to something that has less Phabricator back-and-forth...very little, actually. Although humans still need to do the due diligence as requests come in, presuming LDAP writes are off the table (TBH it isn't all bad to have some human review - it's a possible opportunity to reach out to users to welcome them; you could do that post hoc, though, too). Maybe a good idea to have a SOP of noting on the team IM channel, or adding somewhere the admins have access to (e.g., Google Form-backed Sheet) when one is going to review and approve an access request if there's manual review going on.

@dr0ptp4kt Thank you for the detailed writeup. A couple of questions:

  1. Does GrowthBook here refer to and include both the production and staging environments? Do we anticipate any separate access needs for staging or test environments?
  2. Is there a path to downgrade access from CustomElevatedAccess to ReadOnly if needed? If I'm reading it correctly, the script should handle this case once the Bitu group is updated:
     "1. Ensure synchronization of Bitu access level with GrowthBook (e.g., every hour). Violations of conditions 3.A, 3.B, or 3.C should result in revocation of the Project-mapped GrowthBook role."

So this may reduce to a need to identify any users who no longer need elevated access but still need ReadOnly access in any regular access review.

Thanks. Inline...

@dr0ptp4kt Thank you for the detailed writeup. A couple of questions:

  1. Does GrowthBook here refer to and include both the production and staging environments? Do we anticipate any separate access needs for staging or test environments?

Good question. There's no simple path without some tradeoffs. However, I think simplest and most secure is to mirror the behavior for access provisioning in both places. Users mostly "won't" go over to growthbook-next to do things, but if they do they'd be constrained by the same Project notion this way (and the underlying Data Sources ought to be different things so as to avoid contamination from a lower environment to a higher environment and vice versa; even though both environments have data lake access). If we do an upgrade we'll know better that it is going to work on production, because people ought to be able to see that things basically work in -next, including the parts directly or indirectly affected by access control. This may need to change over time, but I would probably try to keep this simple. In terms of the automation work immediate term, it will be nice to only apply it to -next first, and then if that works, apply it to growthbook.wikimedia.org.

  1. Is there a path to downgrade access from CustomElevatedAccess to ReadOnly if needed? If I'm reading it correctly, the script should handle this case once the Bitu group is updated:

Yes. If the LDAP role for CustomElevatedAccess is removed but the LDAP role for Read Only is in place, then the synchronization should continue allowing for Read Only.

  1. Ensure synchronization of Bitu access level with GrowthBook (e.g., every hour). Violations of conditions 3.A, 3.B, or 3.C should result in revocation of the Project-mapped GrowthBook role.

So this may reduce to a need to identify any users who no longer need elevated access but still need ReadOnly access in any regular access review.

I haven't rewritten the Description of the task yet. It may help a little, although I think it's more that it will clear out users who shouldn't be in GrowthBook at all, rather than downgrade access. I think if we need to downgrade access (which is probably mainly if needing to manage seating), we'll probably want to first (a) add LDAP membership in the lower permission class, then (b) remove the higher permission class.
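
The reason for the (a)-then-(b) ordering can be shown with a small simulation. This is an illustrative sketch with hypothetical function names; it assumes the synchronization assigns the highest tier among the user's Bitu groups and could run between the two membership changes.

```python
PRECEDENCE = ("GrowthBook-ReadOnly", "GrowthBook-CustomElevatedAccess", "GrowthBook-Admin")


def effective_group(groups):
    """Highest-tier Bitu group held, or None when the user holds none."""
    held = [name for name in PRECEDENCE if name in groups]
    return held[-1] if held else None


def downgrade_states(groups, higher, lower, add_lower_first=True):
    """Effective group after each of the two membership changes."""
    current = set(groups)
    steps = [("add", lower), ("remove", higher)]
    if not add_lower_first:
        steps.reverse()
    states = []
    for operation, name in steps:
        if operation == "add":
            current.add(name)
        else:
            current.discard(name)
        states.append(effective_group(current))
    return states


start = {"GrowthBook-CustomElevatedAccess"}
args = (start, "GrowthBook-CustomElevatedAccess", "GrowthBook-ReadOnly")
print(downgrade_states(*args, add_lower_first=True))
print(downgrade_states(*args, add_lower_first=False))
```

Adding the lower group first means the user always holds at least one GrowthBook group. Removing the higher group first leaves a window with no group at all, during which a sync run would revoke the Project role entirely.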

Does that seem workable? Other considerations? Lots of little tradeoffs, although hopefully we can minimize toil; in theory and usually in practice that helps with avoidance of mistakes, as well, although obviously there's virtue to human oversight; it's just that here going for simpler and clearer LDAP-driven access (except for Admins, I think) is probably the more robust solution now that we've gone around on this a couple times.

even though both environments have data lake access) … if we do an upgrade we'll know better that it is going to work on production, because people ought to be able to see that things basically work in -next, including the parts directly or indirectly affected by access control.

By the way, we settled on very limited access in -next in T417095: Add data lake as data source in growthbook-next + Slack.

In summary:

  • staging does not have access to the data lake and does not have any metrics defined
  • staging is only connected to an-test-presto, so that we can verify Presto + Kerberos connector still works
  • from a non-admin, consumer perspective there is no reason to ever go to growthbook-next

dr0ptp4kt updated the task description.

Thanks for answering my questions! This approach looks good to me.

Confirmed out of band with @JVanderhoop-WMF (meeting) and @brouberol (IM on team chat) we look okay to go here. I've filed a number of subtasks in the task graph of T417912: FY25-26 SDS2.2.7 / SDS 2.3.3 Application Permissions, and have updated an annual calendar reminder about triggering an annual access review (that will be in addition to any automation). I'm out for a bit, and have asked for Julie and Katherine to manage progress.

Thanks for all the context! Some random notes addressed to @RKemper, with whom I'll pair on this task.

The script we intend to write will pull data from:

  • the data.yaml file defined in puppet (read-only)
  • our LDAP server (read-only)
  • the growthbook API (RW)

AFAICT, this will require the following:

  • enable a pod in Kubernetes to egress to our LDAP server. We currently don't define an external service for LDAP, so we will have to do so first.
brouberol@seaborgium:~$ host ldap-eqiad.wikimedia.org
ldap-eqiad.wikimedia.org is an alias for seaborgium.wikimedia.org.
seaborgium.wikimedia.org has address 208.80.154.79
brouberol@seaborgium:~$ host ldap-codfw.wikimedia.org
ldap-codfw.wikimedia.org is an alias for serpens.wikimedia.org.
serpens.wikimedia.org has address 208.80.153.49
  • We can access data.yaml directly from gerrit itself, without any kind of git pull (which would be cumbersome). We can download it with
curl https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/admin/data/data.yaml\?format\=TEXT | base64 -d | head -n 3
---
groups:
  absent:
  • I also don't think we already have an external service rule for gerrit
  • a Growthbook API key, which can be created in https://growthbook.wikimedia.org/settings/keys. It will need to be stored in the private puppet repo, and materialized in the CronJob pod environment through a Secret. This API key will need to be RW.
  • I initially thought we would require access to the mongo database to enquire about the last login date for a particular user, but we can get this through the GrowthBook API
    Screenshot 2026-03-20 at 12.40.43.png (1,077×1,070 px, 155 KB)
  • metrics emission: we want to know how many seats we currently occupy, and their topology (number of RO, number of RW, number of admins). I'm thinking we could push these metrics via the https://wikitech.wikimedia.org/wiki/Prometheus#Pushgateway
  • We'd need to build a docker image for the cronjob itself (defined in https://gitlab.wikimedia.org/repos/data-engineering/growthbook) based on a modern Debian (Trixie if possible), with the right python dependencies installed (at least pyyaml, requests, ldap3, and the prometheus client).
  • We probably want to have a dedicated logstash dashboard for this, to be able to have a clear audit log of the decisions taken by this cronjob
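
As a rough Python equivalent of the curl command above, data.yaml could be fetched and decoded with the standard library alone. This is a sketch, not the planned implementation: the function names are hypothetical, and it relies only on the behaviour shown in the curl example (the ?format=TEXT response is base64-encoded).

```python
import base64
import urllib.request

# Same URL as the curl example, without the shell escaping.
DATA_YAML_URL = (
    "https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/"
    "refs/heads/production/modules/admin/data/data.yaml?format=TEXT"
)


def decode_gitiles_text(payload):
    """Decode the base64 body returned for ?format=TEXT into a string."""
    return base64.b64decode(payload).decode("utf-8")


def fetch_data_yaml_text(url=DATA_YAML_URL, timeout=30):
    """Download data.yaml; needs egress from the pod to gerrit."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return decode_gitiles_text(response.read())
```

The returned text could then be passed to yaml.safe_load from pyyaml, which is already on the dependency list, to get the groups and users mappings the checks need.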

Drive-by comment: as email values in LDAP are inherently not trustworthy, and as you're already checking membership in trusted LDAP and Unix groups that already grant access to private user information, to me your focus on user email domains here is a bunch of unnecessary complexity for no extra gain.

Drive-by comment: as email values in LDAP are inherently not trustworthy, and as you're already checking membership in trusted LDAP and Unix groups that already grant access to private user information, to me your focus on user email domains here is a bunch of unnecessary complexity for no extra gain.

Heads up, am on OoO, just happened to see this.

Thanks for providing input. I think what you're saying is that the person approving the LDAP access could "just" look in the data.yaml file to check for the joint membership in analytics_privatedata_users (and if the user is going for growthbook-customelevatedaccess also membership in one of analytics-product-users, analytics-wmde-users, or deployment). Is that what you're referring to? I'm imagining you may be thinking about extra synchronization checks possibly becoming simplified as well, but did want to clarify. I usually find complexity the enemy of security, but that's counterbalanced by redundant checks and layered defenses. That is, of course, speaking in generalities.

The simpler we can make it for underlying implementation and avoidance of human error, of course the better. Again speaking in generalities!

Please do let folks know here ( especially @brouberol and @RKemper ) if there are more ways to streamline things without loss of correctness, safeguards, and end-user self-service registration.

An email verification link is sent for signup on https://idm.wikimedia.org/ as well as for email address changes through that portal. Although it's technically possible to have something like an email alias or Google Group established with a matching email address domain (IIRC nowadays this is typically done in cooperation with OIT, at least in the wikimedia.org case, can't speak to wikimedia.de Google Workspace practices), this should, in principle, provide a reasonable base gate for Wikimedia IDM and its base trustworthiness (although if you know of a way to subvert that please side channel Balthazar, Ben, Ryan, Eliza, and me [heads up, I'm technically OoO right now for another 5 business days; I just happened to be tinkering with a thing]). It is, I think, possible to modify the email address in Wikimedia IDM without that being reflected in data.yaml and vice versa (although TBH maybe there's some bidirectional sync mechanism w.r.t. email addresses I'm not accounting for, am just trying to write this up quickly). This is not to detract from the importance of the POSIX group membership. It could, of course, mean that some day perhaps we could dispense completely with the POSIX group membership (through new LDAP role assignment strategies aligned with real world human processes at staff/contractor transition points, which would have a number of ease-of-setup benefits), but for now, at least, the POSIX membership is there, with both upsides (safety valve) and downsides (friction).

analytics_privatedata_users is the base requirement for access, and that basically has process teeth. GrowthBook has an SSO domains facility based upon permissible email addresses, which is wired up in part in _growthbook_common_/values-dse-k8s-eqiad.yaml. idp.yaml and idp_test.yaml further require membership in wmf or nda as base requirements of login via CAS-SSO for GrowthBook. GrowthBook doesn't have awareness of the data.yaml file, and although our CAS-SSO wireup is aware of LDAP group membership, GrowthBook is not (they do have facilities for evolving autoprovision/deprovision capabilities and Apereo seems to have some of it, but we've been avoiding it). Because email domains for users in nda are commingled between holders of wikimedia.de email addresses and holders of email addresses with different domains, and furthermore because a wikimedia.de email address alone is insufficient (to your point, the POSIX membership is a strict requirement, which is based around a different Phabricator-based flow), we can't trust the membership in nda for streamlined access to GrowthBook. To be clear, this is being handled differently from other SSO applications where nda is conferring access to certain things.

Technically a user in analytics_privatedata_users can have an email address associated in data.yaml and Wikimedia IDM with a domain that is not allowlisted; they're intended to not be able to get past the CAS-SSO sign-in screen in that case. There are some technical wrinkles around direct FerretDB access (which requires you to have network access and know what you're doing, typically as someone with operator-level or GrowthBook Admin-level access), GrowthBook APIs, and the GrowthBook UI that can allow unusual cases involving a different email subdomain and a separate password associated with the account, although the user still "should not" be able to get past the CAS-SSO gating without the basic membership gating. Not that it's the main point here, but we ought to iron away those wrinkles while we're at this (I'm corresponding on a different task to iron out the one visible wrinkle I saw, by the way).
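
To make the layering concrete, here is a minimal sketch of the three gates as I understand them from the discussion above. This is not how CAS-SSO or GrowthBook actually evaluate anything; the function name, the gate labels, and the single-entry domain allowlist are assumptions for illustration.

```python
# Assumed allowlist; the real SSO domain configuration lives in the Helm values.
ALLOWED_EMAIL_DOMAINS = {"wikimedia.org"}
# LDAP groups required by the CAS-SSO configuration for GrowthBook.
CAS_REQUIRED_GROUPS = {"wmf", "nda"}
# POSIX group that is the base requirement for access.
BASE_POSIX_GROUP = "analytics_privatedata_users"


def email_domain(email: str) -> str:
    """Return the lowercased part after the last '@'."""
    return email.rsplit("@", 1)[-1].lower()


def failed_gates(email: str, ldap_groups: set[str], posix_groups: set[str]) -> list[str]:
    """Return the labels of every gate this user would fail (empty means pass)."""
    failures = []
    if not (ldap_groups & CAS_REQUIRED_GROUPS):
        failures.append("cas-sso group")
    # Exact match on purpose: a subdomain does not inherit the allowlisting.
    if email_domain(email) not in ALLOWED_EMAIL_DOMAINS:
        failures.append("email domain")
    if BASE_POSIX_GROUP not in posix_groups:
        failures.append("posix group")
    return failures


print(failed_gates("someone@example.org", {"nda"}, {"analytics_privatedata_users"}))
```

The exact-match comparison is deliberate in this sketch, since the subdomain cases are among the wrinkles mentioned above.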

On this point, I'm thinking we may want to consider replacing wmf and nda in idp.yaml and idp_test.yaml with the more specific groups once we have the new GrowthBook-oriented groups established (the idea is that wmf and wmde would still be a requirement of access, but that's a base requirement in order to allow for CAS-SSO as well as human and sync processes). Right now we could arguably replace nda with wmde in any case, so as to make the organization linkage more explicit and less subject to the various interpretations of nda. (Again, self-enrollment into the system presently requires a wikimedia.org email address anyway, so nda is presently neutralized; if it weren't, it would be in a class similar to other data systems access, but as I said we're treating this system a bit differently.) It's true that users from WMDE are supposed to have gone through certain requirements prior to any access to data lake data assets (irrespective of membership in the nda group, although that has certainly been a proxy identifier for as much when coupled with the human identity of the WMDE staff/contractor; again, there's muddling with the nda interpretation, but it has always involved data.yaml to my knowledge, so there's a point of human intervention).

We're not able to have unlimited access to GrowthBook as part of the Enterprise setup here, so we can't allow for auto-enrollment based upon the POSIX groups. (To that point, in some cases we'll need to publish data from our own systems for data accessibility, above and beyond this more powerful level of access for the ordained LDAP users, which is a good security practice anyway, of course.) We will also want to help people with onboarding, maintain something of a nice "user journey" for people coming onto the system, and so on. Again, that doesn't detract from the data.yaml file as a possible source of truth. We're of course hoping to avoid having to add yet another POSIX group membership step in Phabricator.

This does remind me that it may be interesting to think about whether we would ever want to further streamline things in the POSIX files, by adding or repurposing (refactoring) one or more POSIX groups in the data.yaml file. I know this comes up on occasion as a topic for consideration! analytics_privatedata_users here is the initial gate and that's pretty strong. If analytics-product-users and analytics-wmde-users conferred automatic access in analytics_privatedata_users, that could reduce the number of entries in analytics_privatedata_users. For now I'm not looking to use this work as the driver for refactoring of that stuff. In an ideal world we'd have something like a data-nda group in LDAP that stands in for analytics_privatedata_users for any SSO-capable application; again, in the GrowthBook case we do need an approval step and tiered access due to the Enterprise seating. I haven't fully worked through all of the possible edge cases around broader realignment of these IDs, but I'm sure there are complexities.
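
For a sense of scale on that hypothetical refactoring, here is a small sketch that counts which explicit analytics_privatedata_users entries would become redundant if the two groups conferred membership automatically. The input shape (group name mapped to a set of usernames) and the sample members are assumptions, not the real data.yaml contents.

```python
BASE_GROUP = "analytics_privatedata_users"
# Groups that, in this hypothetical, would confer base membership automatically.
IMPLYING_GROUPS = ("analytics-product-users", "analytics-wmde-users")


def redundant_base_entries(groups: dict[str, set[str]]) -> set[str]:
    """Return base-group members who are also in one of the implying groups."""
    implied = set().union(*(groups.get(name, set()) for name in IMPLYING_GROUPS))
    return groups.get(BASE_GROUP, set()) & implied


# Hypothetical sample memberships.
sample_groups = {
    "analytics_privatedata_users": {"alice", "bob", "carol"},
    "analytics-product-users": {"alice"},
    "analytics-wmde-users": {"carol", "dave"},
}

print(sorted(redundant_base_entries(sample_groups)))
```

Here `dave` is not reported because he has no explicit base entry to remove; under the hypothetical he would gain base access implicitly, which is exactly the kind of edge case that would need review.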

Okay, that's it for now. I'm OoO for a week longer, but did happen to catch this message, so I figured I'd reply rather than be thinking about it while OoO! The subsequent discussion and implementation stuff is largely in the hands of @brouberol, @RKemper, @mpopov, and @phuedx, although I'll get caught back up upon return (heads up: I'll be in half days for a week upon return and have some other duties following that).

I'm closing this as we've done the main design of the approach. Implementation of the approach to GrowthBook access is being tracked in other tasks under the same epic: T417912: FY25-26 SDS2.2.7 / SDS 2.3.3 Application Permissions