
Cover GitLab accounts in SQL queries for affiliation setting
Closed, Resolved, Public

Description

Looking at GitLab Merges data, e.g. on https://wikimedia.biterg.io/app/kibana#/dashboard/b2218fd0-bc11-11e8-8aac-ef7fd4d8cbad , shows numerous folks in the Submitters panel with (incorrect) "Independent" affiliation (they are WMF staff).

Update https://www.mediawiki.org/wiki/User:AKlapper_(WMF)/Bitergia_data_quality_queries afterwards.

Event Timeline

Aklapper created this task.

Note that when Perceval indexes GitLab data, it does not index email addresses. This is probably because they are non-public PII: pulling them via the GitLab API would require authentication and specific permissions. So all we get is a random username:

"1234567890abcdef1234567890abcdef12345678": {
    "enrollments": [],
    "identities": [
        {
            "email": null,
            "id": "1234567890abcdef1234567890abcdef12345678",
            "name": "SomeName",
            "source": "gitlab",
            "username": "someusername",
            "uuid": "1234567890abcdef1234567890abcdef12345678"
        }
    ]
}

That means that, unlike Gerrit, there is nothing that would allow identifying staff or other non-volunteer GitLab accounts, as we only have a random username.

This means that our affiliation statistics will become less accurate.
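To illustrate why a missing email blocks affiliation, here is a minimal sketch that assumes affiliation is guessed from an identity's email domain (domain-based matching of the kind SortingHat supports). The `DOMAIN_TO_ORG` mapping and the helper functions are hypothetical and not part of the Bitergia setup; the record reuses the placeholder values shown above.

```python
import json

# Identity record in the shape shown above (placeholder values).
RECORD = json.loads("""
{
  "1234567890abcdef1234567890abcdef12345678": {
    "enrollments": [],
    "identities": [
      {
        "email": null,
        "id": "1234567890abcdef1234567890abcdef12345678",
        "name": "SomeName",
        "source": "gitlab",
        "username": "someusername",
        "uuid": "1234567890abcdef1234567890abcdef12345678"
      }
    ]
  }
}
""")

# Hypothetical mapping of email domains to organizations (assumption).
DOMAIN_TO_ORG = {"wikimedia.org": "Wikimedia Foundation"}


def guess_org(identity):
    """Return an organization based on the identity's email domain, or None."""
    email = identity.get("email")
    if not email or "@" not in email:
        return None  # No email (as with GitLab): nothing to match on.
    domain = email.rsplit("@", 1)[1].lower()
    return DOMAIN_TO_ORG.get(domain)


def unaffiliated_identities(record):
    """List (uuid, source, username) for profiles that have no enrollment
    and whose identities cannot be matched by email domain either."""
    result = []
    for uuid, profile in record.items():
        if profile["enrollments"]:
            continue  # Already affiliated.
        for ident in profile["identities"]:
            if guess_org(ident) is None:
                result.append((uuid, ident["source"], ident["username"]))
    return result


print(unaffiliated_identities(RECORD))
```

A Gerrit-style identity carrying an email such as `someone@wikimedia.org` would match the mapping, while the GitLab identity above always falls through with `email: null`.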

DB query to list GitLab accounts not marked as affiliated:

SELECT CONCAT("https://wikimedia.biterg.io/identities/hatstall/", uuid), name, username, source FROM identities WHERE source = "gitlab" AND uuid NOT IN (SELECT uuid FROM enrollments) ORDER BY uuid;

However, as written in the previous comment, this isn't much help.
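The query is an anti-join: it keeps GitLab identities whose `uuid` has no row in `enrollments`. The sketch below runs it against a toy in-memory SQLite database. The two-table schema and sample rows are simplified assumptions, not the real SortingHat schema, and the MySQL-specific parts are adapted for SQLite (`||` instead of `CONCAT`, single-quoted string literals).

```python
import sqlite3

# Toy schema and data (assumption: real tables have more columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE identities (uuid TEXT, name TEXT, username TEXT, source TEXT);
CREATE TABLE enrollments (uuid TEXT, organization_id INTEGER);
INSERT INTO identities VALUES
  ('aaa', 'Alice', 'alice', 'gitlab'),
  ('bbb', 'Bob', 'bob', 'gitlab'),
  ('ccc', 'Carol', 'carol', 'gerrit');
INSERT INTO enrollments VALUES ('aaa', 1);
""")

# Same logic as the DB query above, in SQLite syntax.
QUERY = """
SELECT 'https://wikimedia.biterg.io/identities/hatstall/' || uuid,
       name, username, source
FROM identities
WHERE source = 'gitlab'
  AND uuid NOT IN (SELECT uuid FROM enrollments)
ORDER BY uuid
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Only Bob is returned: Alice is enrolled, and Carol is not a GitLab identity. Note that `NOT IN` returns no rows at all if the subquery yields a NULL `uuid`; a `LEFT JOIN ... WHERE enrollments.uuid IS NULL` avoids that edge case.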
https://www.mediawiki.org/w/index.php?title=User%3AAKlapper_%28WMF%29%2FBitergia_data_quality_queries&type=revision&diff=5181316&oldid=5061622

Closing this, as I have documented my query and updated the DB.
Going to cover the bigger underlying problem in T306769#7875745 in separate task T306770: How to identify affiliation of indexed GitLab accounts.