Cover GitLab accounts in SQL queries for affiliation setting
Closed, ResolvedPublic


Looking at GitLab Merges data, e.g. on , shows numerous people in the Submitters panel with an incorrect "Independent" affiliation, although they are WMF staff.

Update afterwards.

Event Timeline

Aklapper created this task.

Note that indexing GitLab data in Perceval does not index email addresses, probably because they are non-public PII that would require auth and specific permissions to pull via the GitLab API. So all we get is a random username:

"1234567890abcdef1234567890abcdef12345678": {
    "enrollments": [],
    "identities": [
        {
            "email": null,
            "id": "1234567890abcdef1234567890abcdef12345678",
            "name": "SomeName",
            "source": "gitlab",
            "username": "someusername",
            "uuid": "1234567890abcdef1234567890abcdef12345678"
        }
    ]
}

That means in contrast to Gerrit there is nothing that would allow identifying staff or non-volunteer GitLab accounts, as we only have a random username.
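
To illustrate the point, here is a minimal Python sketch that scans an export shaped like the excerpt above and lists the GitLab identities that carry neither an email address nor an enrollment. The `EXPORT` data, the second (Gerrit) entry, the `example.org` address, and the function name `unidentifiable_gitlab_identities` are all hypothetical and only serve the example; the real export may contain more fields.

```python
import json

# Hypothetical export, shaped like the excerpt above. All values are placeholders.
EXPORT = """
{
  "1234567890abcdef1234567890abcdef12345678": {
    "enrollments": [],
    "identities": [
      {
        "email": null,
        "id": "1234567890abcdef1234567890abcdef12345678",
        "name": "SomeName",
        "source": "gitlab",
        "username": "someusername",
        "uuid": "1234567890abcdef1234567890abcdef12345678"
      }
    ]
  },
  "abcdef1234567890abcdef1234567890abcdef12": {
    "enrollments": [],
    "identities": [
      {
        "email": "othername@example.org",
        "id": "abcdef1234567890abcdef1234567890abcdef12",
        "name": "OtherName",
        "source": "gerrit",
        "username": "otherusername",
        "uuid": "abcdef1234567890abcdef1234567890abcdef12"
      }
    ]
  }
}
"""


def unidentifiable_gitlab_identities(export):
    """Return (uuid, username) pairs for GitLab identities without email or enrollment."""
    result = []
    for uuid, entry in export.items():
        if entry.get("enrollments"):
            continue  # already affiliated, nothing to do
        for identity in entry.get("identities", []):
            if identity.get("source") == "gitlab" and not identity.get("email"):
                result.append((uuid, identity.get("username")))
    return result


rows = unidentifiable_gitlab_identities(json.loads(EXPORT))
for uuid, username in rows:
    print(uuid, username)
```

The Gerrit entry is skipped because it is not a GitLab identity; it also carries an email address that could hint at an affiliation, whereas the GitLab entry offers only its username.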

This means that our affiliation stats will become less accurate.

The DB query to list GitLab accounts not marked as affiliated is:

    SELECT CONCAT("", uuid), name, username, source
    FROM identities
    WHERE source = "gitlab" AND uuid NOT IN (SELECT uuid FROM enrollments)
    ORDER BY uuid;

As written in the previous comment, though, this is not much of a help.
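
For anyone who wants to try the query without access to the real DB, here is a minimal sketch against an in-memory SQLite database. The table layouts are simplified assumptions (the real `identities` and `enrollments` tables presumably have more columns), the rows are made-up placeholders, and the query is adapted slightly for SQLite: string literals use single quotes and the `CONCAT("", uuid)` wrapper is dropped.

```python
import sqlite3

# In-memory database with an assumed, simplified schema (not the real one).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE identities (
        id TEXT PRIMARY KEY, uuid TEXT, name TEXT,
        username TEXT, email TEXT, source TEXT
    );
    CREATE TABLE enrollments (uuid TEXT, organization TEXT);

    -- Placeholder rows: two GitLab accounts and one Gerrit account.
    INSERT INTO identities VALUES
        ('a1', 'a1', 'SomeName', 'someusername', NULL, 'gitlab'),
        ('b2', 'b2', 'OtherName', 'otherusername', NULL, 'gitlab'),
        ('c3', 'c3', 'ThirdName', 'thirdusername', 'third@example.org', 'gerrit');

    -- Only the second GitLab account has an enrollment.
    INSERT INTO enrollments VALUES ('b2', 'Example Org');
""")

# Same logic as the query in this comment, adapted for SQLite.
QUERY = """
    SELECT uuid, name, username, source
    FROM identities
    WHERE source = 'gitlab'
      AND uuid NOT IN (SELECT uuid FROM enrollments)
    ORDER BY uuid
"""

unaffiliated = conn.execute(QUERY).fetchall()
for row in unaffiliated:
    print(row)
```

With this sample data only the first GitLab account is returned, and the row shows the limitation: name, username and source are all there is to go on.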

Closing this as I have my query documented and updated the DB.
I am going to cover the bigger underlying problem described in T306769#7875745 in a separate task, T306770: How to identify affiliation of indexed GitLab accounts.