Create bot to sync LDAP groups with related GitLab groups
Closed, Resolved · Public · Feature

Description

We have a few groups in the LDAP directory that tracks our developer accounts that have historically been used in Gerrit as part of the authorization for various repos. Specifically, the ops LDAP group is granted ownership on operations/puppet.git. The wmf LDAP group is also used to confer membership in the mediawiki gerrit group which grants a large number of rights on mediawiki/* repos in Gerrit.

LDAP group sync is a premium GitLab feature which means it is not available in the FOSS GitLab CE product. We also are not directly using LDAP for authentication, so it may not be a useful feature even if we broke from using the FOSS product.

It seems reasonably possible to write a bot that knows how to talk to both GitLab and LDAP with configuration telling it which LDAP group memberships should grant GitLab group permissions.

This feature request was inspired by a short irc discussion with @Joe related to the membership of the repos/sre group.

Event Timeline

bd808 changed the subtype of this task from "Task" to "Feature Request". Oct 3 2022, 4:01 PM

tl;dr: all the ways I have found to do this with features already provided by GitLab require an enterprise licence, so solving this means writing code. I think it would be better to sync attributes via the SSO protocol, similar to the SAML group sync feature, rather than with a systemd timer that talks to LDAP and GitLab directly.

First, it is worth saying that there are two issues at play:

  • attribute syncing, i.e. having the same SSH keys, email address, etc. in GitLab
  • using LDAP group membership to inform GitLab group membership

We already have some syncing, e.g. the email address, and we could likely sync more things, e.g. SSH keys. However, this was never explored in much depth. Attributes are currently passed using CAS protocol attribute passing, but we could switch to any of the other supported SSO protocols, as they all seem to have good support for attribute syncing.

Group membership syncing was unfortunately never in scope for the GitLab project, and by the time it was raised as an issue it was a bit late in the game to change the scope. When we looked, we noticed that LDAP sync with group mappings is only available with GitLab Enterprise. However, LDAP sync is for us the least preferred method to pass this data, as it requires a direct connection to the LDAP server. Going forward, we should try to pass any attributes using one of the supported authentication protocols implemented in Apereo CAS. The obvious protocol is CAS, as it's the native protocol, but many other options exist.

With this in mind we explored using both CAS and SAML. The support for CAS in GitLab is quite immature (and, I just found out, now deprecated) and as such doesn't support any type of group integration in the CE or EE editions; going this route would likely mean a lot of hacking on the GitLab side. GitLab does offer a SAML group sync option, which would be ideal, but it is only available in the EE editions. If we could reimplement this as a CE plugin, I think that could be a win for us and the community, and the fact that it has been implemented means it can be done. However, I have no idea about the architectural differences between GitLab CE and EE, or the risks of trying to do a clean-room implementation of an open-core EE feature.

I have taken another look at the GitLab pages today, and it seems like OIDC support provides a method to pass group information and is listed as being available in all tiers. Combined with the CAS deprecation and the fact that we intended to start using OIDC for any Django apps, this would make it a good candidate to explore.

I have taken another look at the GitLab pages today, and it seems like OIDC support provides a method to pass group information

Looking at this in more depth, I'm not sure it's quite what I thought at first. I think that link describes using GitLab as an IdP, in which case it can release its own groups to authorised clients. So even though moving to OIDC still makes sense, I don't think it will easily solve this problem.

I created https://gitlab.devtools.wmcloud.org/ldap-sync-bot today for this purpose.

And the account is gone today. @brennen @Jelto Any idea why this might be?

Well that's odd. I don't think we're doing any sort of a scheduled reset there, given all the other accounts...

I see this in /var/log/gitlab/gitlab-rails/backup_json.log:

{"severity":"INFO","time":"2023-06-02T02:01:11.566Z","correlation_id":null,"message":"Restoring database ... "}
...
{"severity":"INFO","time":"2023-06-02T02:02:04.768Z","correlation_id":null,"message":"Restore task is done."}

The disappearing account problem discussion has been moved to T338044.

dancy changed the task status from Open to In Progress.Jun 20 2023, 11:36 PM
dancy triaged this task as Medium priority.

@Jelto Our plan is to run sync-gitlab-group-with-ldap -c some-config-file.yaml repos/mediawiki wmf ops on gitlab1003 on a regular basis using a systemd timer. sync-gitlab-group-with-ldap is part of the https://gitlab.wikimedia.org/repos/releng/gitlab-settings repo so we would like to have a checkout of this repo somewhere on the gitlab server. Can you recommend a directory? Maybe /srv/gitlab-settings? Cc: @brennen

Maybe /srv/gitlab-settings

+1

/srv/gitlab-settings sounds good! And thanks for creating the bot.

gitlab1003 is gitlab-replica-old.wikimedia.org at the moment. If you just want to verify everything without impacting production, this should be fine. But keep in mind that the updated groups will get reverted by the restore happening at 2:00 and 14:00 UTC (because production is not syncing with LDAP). If that interferes with your testing, we can also pause the restore for some time. But if you schedule the job at a different time (maybe 3:00 and 15:00 UTC), you should be fine.

The repo and timer on the GitLab host can be configured with puppet git::clone and systemd::timer::job. Let me know if you need a change or review for that.

gitlab1003 is gitlab-replica-old.wikimedia.org at the moment.

Ah, that's my mistake. I meant gitlab1004.

The repo and timer on the GitLab host can be configured with puppet git::clone and systemd::timer::job. Let me know if you need a change or review for that.

Will do!

Change 932343 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[operations/puppet@production] Run LDAP group sync periodically on active gitlab server

https://gerrit.wikimedia.org/r/932343

Change 932343 merged by Jelto:

[operations/puppet@production] Run LDAP group sync periodically on gitlab replicas

https://gerrit.wikimedia.org/r/932343

Change 940162 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: make sure ldap_group_sync_user is created first

https://gerrit.wikimedia.org/r/940162

I merged the change to add an LDAP group sync to the GitLab replicas. The job failed because the token was not configured properly:

requests.exceptions.HTTPError: 401 Client Error: Unauthorized for url: https://gitlab-replica.wikimedia.org/api/v4/user

And /etc/gitlab/group-management-config.yaml does not contain the correct token, only the placeholder value stored_elsewhere.

I'm trying to find out what's missing to use the hiera value from private puppet.
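For context, GitLab returns a 401 on /api/v4/user when the request carries a missing or invalid token. A minimal sketch of that kind of token check follows, assuming the token is sent in the PRIVATE-TOKEN header. The helper name is hypothetical and the real script (which uses requests, per the traceback) may authenticate differently.

```python
import urllib.request


def build_whoami_request(base_url, token):
    """Build (but do not send) a request for the authenticated user.

    Hypothetical helper for illustration; not taken from the sync script.
    """
    url = base_url.rstrip("/") + "/api/v4/user"
    # GitLab accepts personal/bot access tokens in the PRIVATE-TOKEN header.
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})


# With the placeholder value from the config file, the server would answer 401.
req = build_whoami_request("https://gitlab-replica.wikimedia.org", "stored_elsewhere")
print(req.full_url)
```

Sending the request with urllib.request.urlopen(req) would raise an HTTPError with code 401 for a bad token, which is the same failure mode as above.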

Change 940176 had a related patch set uploaded (by Jelto; author: Jelto):

[labs/private@master] gitlab: add ldap sync token

https://gerrit.wikimedia.org/r/940176

Change 940176 merged by Jelto:

[labs/private@master] gitlab: add ldap sync token

https://gerrit.wikimedia.org/r/940176

Change 940179 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: move gitlab::ldap_group_sync_bot_token to private puppet

https://gerrit.wikimedia.org/r/940179

Change 940162 merged by Jelto:

[operations/puppet@production] gitlab: make sure ldap_group_sync_user is created first

https://gerrit.wikimedia.org/r/940162

Change 940179 merged by Jelto:

[operations/puppet@production] gitlab: move gitlab::ldap_group_sync_bot_token to private puppet

https://gerrit.wikimedia.org/r/940179

Sync job succeeded after moving the dummy token from puppet to labs private.

Jul 20 15:15:00 gitlab2002 systemd[1]: Starting Sync wmf and ops LDAP groups with GitLab repos/mediawiki group...
Jul 20 15:15:01 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:01,682 Collecting membership list of LDAP group wmf
Jul 20 15:15:01 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:01,729 Collecting membership list of LDAP group ops
Jul 20 15:15:01 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:01,733 Collecting member list of Gitlab group repos/mediawiki
Jul 20 15:15:08 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:08,566 ...
Jul 20 15:15:09 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:09,065 There are 0 GitLab users to create.
Jul 20 15:15:09 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:09,065 There are 1 members to add to repos/mediawiki.
Jul 20 15:15:09 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:09,065 There are 9 members to remove from repos/mediawiki.
Jul 20 15:15:09 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:09,065 ...
Jul 20 15:15:12 gitlab2002 sync-gitlab-group-with-ldap[386654]: 2023-07-20 15:15:12,546 Sync completed.
Jul 20 15:15:12 gitlab2002 systemd[1]: sync-gitlab-group-with-ldap.service: Succeeded.
Jul 20 15:15:12 gitlab2002 systemd[1]: Finished Sync wmf and ops LDAP groups with GitLab repos/mediawiki group.
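
The "members to add" and "members to remove" counts in the log are consistent with a plain set difference between the LDAP and GitLab membership lists. A minimal sketch of that step follows, assuming members are compared by username and that the target is the union of the listed LDAP groups. The function name and sample usernames are hypothetical and not taken from sync-gitlab-group-with-ldap.

```python
def plan_sync(ldap_groups, gitlab_members):
    """Return (to_add, to_remove) so that the GitLab group matches LDAP.

    ldap_groups: dict mapping LDAP group name -> iterable of usernames.
    gitlab_members: iterable of usernames currently in the GitLab group.
    Assumption: the desired membership is the union of all LDAP groups.
    """
    wanted = set()
    for members in ldap_groups.values():
        wanted.update(members)
    current = set(gitlab_members)
    return sorted(wanted - current), sorted(current - wanted)


# Made-up sample data, for illustration only.
ldap_groups = {"wmf": ["alice", "bob"], "ops": ["bob", "carol"]}
to_add, to_remove = plan_sync(ldap_groups, ["alice", "dave"])
print(f"There are {len(to_add)} members to add.")
print(f"There are {len(to_remove)} members to remove.")
```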

The script fails during the daily restore with

requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://gitlab-replica-old.wikimedia.org/api/v4/user

This creates at least two alerts per day. As the script will move to the production host at some point (where no restore is happening), I think we can accept the alerts for now.
But we should think about what happens if GitLab api is not available. Do we just fail and retry in 15 minutes or do we need some kind of retry?

But we should think about what happens if GitLab api is not available. Do we just fail and retry in 15 minutes or do we need some kind of retry?

Just let it fail and retry during the next timer run.
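
A minimal sketch of that "fail and wait for the next timer run" behaviour, assuming the script signals failure through a non-zero exit code and leaves retrying to the systemd timer. The wrapper name and the simulated error are hypothetical.

```python
import sys


def run_once(sync):
    """Run one sync attempt; return a process exit code, with no in-process retry."""
    try:
        sync()
    except Exception as exc:  # e.g. an HTTP 502 while GitLab is being restored
        print(f"sync failed: {exc}", file=sys.stderr)
        return 1
    return 0


def failing_sync():
    # Stand-in for an API call made while GitLab is unavailable.
    raise RuntimeError("502 Server Error: Bad Gateway")


print(run_once(failing_sync))
print(run_once(lambda: None))
```

In a real entry point the result would be passed to sys.exit(), so the unit is marked as failed and the next scheduled run simply tries again.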

I checked the users in https://gitlab-replica.wikimedia.org/groups/repos/mediawiki/-/group_members again and it looks good to me. I also checked the logs of the job on one of the replicas with journalctl -u sync-gitlab-group-with-ldap.service and they look reasonable. At 14:00 UTC there was some activity, but that's expected due to the restore of a fresh backup on the replicas.

I'd suggest moving the sync bot from the replicas to the production host.

Change 945612 had a related patch set uploaded (by Jelto; author: Jelto):

[operations/puppet@production] gitlab: enable ldap group sync on active GitLab server

https://gerrit.wikimedia.org/r/945612

Change 945612 merged by Jelto:

[operations/puppet@production] gitlab: enable ldap group sync on active GitLab server

https://gerrit.wikimedia.org/r/945612

LDAP group sync is now active on the production instance (and not the replicas).

@dancy can you double check the output of the sync script?

journalctl -u sync-gitlab-group-with-ldap.service on gitlab1004, especially the first run Aug 08 10:15:01 today.

@Jelto I checked the logs and everything looks good.

This is running on gitlab-prod-1002.devtools.eqiad1.wikimedia.cloud too.

This is running in production.