Page MenuHomePhabricator

Fix up Gerrit duplicate LDAP external ids
Closed, ResolvedPublic

Description

The Gerrit accounts (scheme username:) are associated with LDAP accounts (scheme gerrit:) via All-Users.git refs/meta/external-ids.

Upstream provides a script to verify duplicate accounts due to case sensitiveness in a given namespace. For Gerrit local accounts we had a single one (username:Kaldari) which I have deleted (T323097).

In T307334#8394490, I have found out with 199 duplicate LDAP accounts.

Upstream provides a script contrib/find-duplicate-usernames.sh gerrit to find the duplicates which takes a scheme as parameter and it reports 398 entries (199 x 2).

Event Timeline

The reason is we have ldap.localUsernameToLowerCase=true. The upstream script to do the conversion had issue and instead @thcipriani crafted a shell script for that:

1#!/usr/bin/env bash
2#
3# Gerrit Username To Lowercase
4# =====================
5#
6# Script for converting gerrit:-scheme usernames to lowercase directly
7# in a checkout of refs/meta/external-ids of the All-Users git repo.
8#
9# Copyright: Tyler Cipriani <tcipriani@wikimedia.org> 2019
10# License: GPLv3+
11
12# Utility logging methods
13info() {
14 printf '[INFO] %s' "$@"
15}
16println() {
17 printf '%s\n' "$@"
18}
19
20# Convert username with gerrit: schema to lowercase
21#
22# accepts a username as a mixed case string without a schema by:
23#
24# 1. Convert username mixed case to username lowercase
25# 2. Compute sha1sum of lowercase schema for use as new file name
26# 3. Find the old file name using git grep
27# 4. Git move the old file to the new file path (computed with sha1sum)
28# 5. Replace the username mixed-case in the new file with username lowercase
29#
30# param username: string, mixed case
31usernameToLower() {
32 local username username_lower shasum new_file old_file
33
34 username="$1"
35 username_lower="${username,,}"
36
37 shasum=$(printf "gerrit:%s" "${username_lower}" | shasum -a 1)
38
39 new_file=$(printf '%s/%s\n' "${shasum:0:2}" "${shasum:2:38}")
40 old_file=$(git grep --full-name --files-with-matches "\"gerrit:${username}\"")
41
42 if [ -f "$new_file" ]; then
43 println "The new file '${new_file}' exists!!!! for '${username}'. Aborting!"
44 exit 1
45 fi
46
47 git mv "$old_file" "$new_file"
48
49 # Change username to lowercase in new file
50 sed -i "s/gerrit:${username}/gerrit:${username_lower}/" "$new_file"
51}
52
53# Find any gerrit:-schema users with capital letters
54# Look to see if there is a lowercase version
55# If not, convert user to lowercase
56main() {
57 while read -r user; do
58 # Grep for lowercase user
59 if git grep "\\[externalId \"gerrit:${user,,}\"\\]" &>/dev/null; then
60 continue
61 fi
62 info "Converting ${user}..."
63 usernameToLower "$user"
64 println "DONE!"
65 done < <(git grep -P 'gerrit:.*[A-Z]+.*' | sed -e 's/.*:\[externalId "gerrit:\(.*\)"]/\1/')
66}
67
68main "$@"

The conversion commit is from July 2019 e11abef2f05ed53f3e1a4a407259cde93fa3cbd3 and made users under gerrit namespace to be all lower case. For some unknown reason it missed 202 accounts which can be found via git grep -P 'gerrit:.*[A-Z]' e11abef2f0.

Picking a random user mglaser:

FileDateAccountCommit message
2a/69edcda6384958b7f0b678a9282ae67d0bc746Fri Jun 8 17:16:18 2018gerrit:MglaserImport external IDs from ReviewDb
80/4dba41ce7318a2a9e42ac85317e06403fb3807Sat Jun 9 22:45:59 2018 +0000gerrit:mglaserUpdate external IDs

Thus I suspect we did ran the upstream Java program to do the conversion and it failed to delete the migrated entries (such as gerrit:Mglaser in the example).

A year later our shell script got run. It is supposedly should exit 1 when the accounts has already been converted. That clearly did not happen.

I have verified all of the Gerrit external ids for LDAP (scheme gerrit:) having an upper case letter have an equivalent all lower case. The only differences I have spotted is some email addresses have been updated.

Thus I propose we:

  • delete all the 199 duplicate LDAP external ids (git grep -l 'gerrit:.*[A-Z]')
  • push the delete
  • Reindex ssh -p 29418 gerrit.wikimedia.org -- gerrit index start accounts --force

Most of the 199 accounts having an upper case letter are the same as the lower case version (checked with diff -i). I though 7 accounts that had a mismatch:

Two are a bit specific:

gerrit:Fomafix lacks an email address which is present in gerrit:fomafix. Thus it can be deleted.
gerrit:Ricordisamoa has an email which is already present as a mailto external id. It is thus a duplicate and can be deleted.

Five others have a different email between the mixed case and lower case. Maybe because they changed their email on Wikitech/LDAP at some point:

gerrit:Krinkle
gerrit:Revi
gerrit:Daniel Kinzler
gerrit:Harej
gerrit:SamanthaNguyen

Deleting those accounts would discard the associated email since it is not present in the gerrit: lower case version. So essentially we gotta migrate them to a mailto: external IDs which is how secondary emails are added. Hence when we have:

a5/e6fe0da47bb20944a3a369e9a78d9362eb44f5
[externalId "gerrit:Jane]
    accountId = 9999
    email = jane@example.org

We can create a mailto:jane@example.org entry:

d6/34e8ed9d8a1ead6940188e6538d5b4c07996e2
[externalId "mailto:jane@example.org"]
    accountId = 9999
    email = jane@example.org

Which in the end is changing the id and renaming the file:

diff --git a/a5/e6fe0da47bb20944a3a369e9a78d9362eb44f5 b/d6/34e8ed9d8a1ead6940188e6538d5b4c07996e2
- [externalId "gerrit:Jane]
+ [externalId "mailto:jane@example.org"]
      accountId = 9999
      email = jane@example.org

Et voilà! ™

Once this fixes applied, I have confirmed all of the 199 mixed case gerrit: accounts matches their associated lower case gerrit: accounts and can thus be all deleted

I have pushed a commit to All-Users.git to adjust all those accounts (cecd04c3ccd7b40097b753d0c7226b6861c6d5a2).

I took a snapshot of the user accounts details via the REST API using:

accounts_detail
#!/bin/bash
set -eu -o pipefail

for id in 34 128 562 834 1154 2598 3786; do
	curl -n "https://gerrit.wikimedia.org/r/a/accounts/$id/detail"|tail +2|jq .
done;

Then ran it after pushing and updating the cache (ssh -p 29418 gerrit.wikimedia.org -- gerrit index start accounts --force). Result:

ricordisamoa and fomafix are unchanged as expected, the other accounts had a secondary email associated.

Mentioned in SAL (#wikimedia-releng) [2022-11-15T20:20:07Z] <hashar> gerrit: removed legacy mixed case accounts for gerrit:Fomafix and gerrit:Ricordisamoa T323135#8397539

Mentioned in SAL (#wikimedia-releng) [2022-11-15T20:21:15Z] <hashar> gerrit: removed legacy mixed case accounts and moved the extra secondary email to a mailto id for gerrit:krinkle, gerrit:revi, gerrit:daniel kinzler, gerrit:harej and gerrit:samanthanguyen T323135#8397539

And this morning I thought: what about upper case unicode characters? There are three such accounts found by matching against unicode characters having the property letter + uppercase which in pcre is \p{Lu}.

0e/9f1ce0b97ba6f98633efb889ade8e90704d797
[externalId "gerrit:Ál"]  # LDAP account
	accountId = 2172
	email = al@blogmail.cc
48/546c9c45f1c4dc72928178958a74d5a4747953
[externalId "gerrit:Истенный"]  # LDAP account
	accountId = 2244
	email = 79687746639@ya.ru
ba/8c6fcd6e0dd090cce1c589d303a1943810d3aa
[externalId "gerrit:ԱշոտՏՆՂ"]  # LDAP account
	accountId = 3737
	email = ashotjanibekyan@gmail.com

Sounds like Gerrit only normalizes [A-Z] when using ldap.localUsernameToLowerCase = true and indeed it uses Locale.US:

java/com/google/gerrit/auth/ldap/LdapRealm.java
226   @Override
227   public AuthRequest authenticate(AuthRequest who) throws AccountException {
228     if (config.getBoolean("ldap", "localUsernameToLowerCase", false)) {
229       who.setLocalUser(who.getLocalUser().toLowerCase(Locale.US));
230     }
java/com/google/gerrit/auth/ldap/LdapAuthBackend.java
58   @Override
59   public AuthUser authenticate(AuthRequest req)
60       throws MissingCredentialsException, InvalidCredentialsException, UnknownUserException,
61           UserNotAllowedException, AuthException {
62     if (!req.getUsername().isPresent() || !req.getPassword().isPresent()) {
63       throw new MissingCredentialsException();
64     }
65 
66     String username =
67         lowerCaseUsername
68             ? req.getUsername().map(u -> u.toLowerCase(Locale.US)).get()
69             : req.getUsername().get();

Fun times! It also means we can ignore them for now.

Mentioned in SAL (#wikimedia-releng) [2022-11-16T08:45:42Z] <hashar> gerrit: deleted 192 LDAP accounts (scheme gerrit:) containing upper case characters which had an exact equivalent in an all lower case form. All-Users.git commit is 5e5800ecc8fd5da591567e616898dd6df988c0c8 # T323135

Mentioned in SAL (#wikimedia-releng) [2022-11-16T08:46:13Z] <hashar> gerrit: reindexed accounts ssh -p 29418 gerrit.wikimedia.org -- gerrit index start accounts --force # T323135

hashar claimed this task.

Upstream script contrib/find-duplicate-usernames.sh no more reports any duplicate against username:, gerrit: or external: scheme \o/