Upgrade to Gerrit 3.5
Closed, ResolvedPublic

Description

We currently use Gerrit 3.4.4. This task is to upgrade to the 3.5.x series.

Release notes https://www.gerritcodereview.com/3.5.html

Some sites experienced an increase in heap usage and GC cost after upgrading, as mentioned by Sven on the repo-discuss list. It is apparently due to tracing that was added to the change queries. It seems this can be worked around by setting tracing.performanceLogging to false.
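As a minimal sketch of that workaround: gerrit.config uses the git-config file format, so the option could be set with git config --file. The helper names below are hypothetical (they are not Gerrit tools), and on a Puppet-managed host the setting would more likely be changed in the configuration management instead.

```bash
#!/usr/bin/env bash
# Sketch only: set_gerrit_option / get_gerrit_option are hypothetical helpers.
# They edit and read a git-config style file such as <site>/etc/gerrit.config.

# Write one option (for example tracing.performanceLogging) into the file.
set_gerrit_option() {
    local config_file="$1" key="$2" value="$3"
    git config --file "$config_file" "$key" "$value"
}

# Read the option back; prints nothing and returns non-zero when it is unset.
get_gerrit_option() {
    local config_file="$1" key="$2"
    git config --file "$config_file" --get "$key"
}
```

Usage would look like set_gerrit_option "$site/etc/gerrit.config" tracing.performanceLogging false, where $site stands for the Gerrit site directory. A Gerrit restart is presumably needed for the change to take effect.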

GerritHub.io incident report: https://docs.google.com/document/d/1BcDhsWZfxwtujwyJwDAkLxMDvf1pAnS_I0pwJOnNSms/edit#heading=h.z4mrbzle14w0

It may or may not be a problem. We use 20G on average out of 32G.

Release highlights

The few that @hashar has noticed:

Case-insensitive usernames

Users can log in with mixed-case usernames without the risk of creating duplicate accounts.

The change affects the following external ids:

  • gerrit (LDAP)
  • username (login, authenticated REST and git endpoints)

For a new Gerrit setup, usernames are case insensitive by default, while for existing installations the Gerrit admin can switch the functionality on/off using the auth.userNameCaseInsensitive setting in gerrit.config.

NOTE: In the All-Users.git repository, the SHA-1 sum of the account is computed preserving the case of the external ID. See the full details in the Gerrit config accounts documentation. Existing accounts can be migrated to the new SHA-1 sum using the offline or online migration tool.

It defaults to false for existing installations, so that should be a no-op for us. We will probably do the migration once we understand the implications.
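To illustrate the NOTE above, here is a small sketch of how a note path can be derived from an external ID key: the SHA-1 of the key, split after the first two hex digits. This mirrors what the conversion script later in this task does. The function name is hypothetical, and sha1sum (coreutils) is assumed to be available.

```bash
#!/usr/bin/env bash
# Sketch only: external_id_note_path is a hypothetical helper name.
# It prints "<first 2 hex digits>/<remaining 38>" of the SHA-1 of the key,
# preserving the key's case exactly as given.
external_id_note_path() {
    local key="$1" sha
    sha=$(printf '%s' "$key" | sha1sum)
    sha="${sha%% *}"          # drop the trailing "  -" that sha1sum prints
    printf '%s/%s\n' "${sha:0:2}" "${sha:2}"
}
```

Because the case is preserved, external_id_note_path "gerrit:Katie Horn" and external_id_note_path "gerrit:katie horn" yield different paths, which is why a case-insensitive setup needs a migration of the note names.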

Request cancellation and execution deadlines

*Limit the maximal execution time for requests on the server side via deadline config*

Performance improvements on the change screen

A new approvals cache.

The new change.conflictsPredicateEnabled setting in gerrit.config disables the computation of the conflicts section avoiding a computation of complexity of O(n^2), where n is the number of open changes for the project the change belongs to. When set to false the GUI will leave the conflict changes section on change screen empty.

The default is true, so we will keep the conflicting changes listed.

Important notes

Support for Java 8 dropped

*We run Gerrit under Java 11 since T268225*

Schema and index changes

This release doesn’t contain schema changes.

The changes index version has been increased to version 71. By default the index is automatically rebuilt upon the Gerrit startup after the upgrade.

To run offline reindexing of the changes (optional when upgrading from v3.3.x or later):

java -jar gerrit.war reindex --index changes -d site_path

copy-approvals

From 3.5.2: The execution of the copy-approvals SSH command (online) or the java -jar gerrit.war copy-approval site command (offline) may take a long time to complete due to the full scanning of all projects.

*This can be done after the 3.5 upgrade but must be done before the 3.6 one*

Event Timeline

Change 824196 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@wmf/stable-3.5] Merge tag 'v3.5.2' into wmf/stable-3.5

https://gerrit.wikimedia.org/r/824196

Change 824200 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@deploy/wmf/stable-3.5] Gerrit v3.5.2 and rebuild plugins

https://gerrit.wikimedia.org/r/824200

hashar updated the task description.

I should probably have upgraded to 3.5 a few weeks ago, but instead I finished the scap automation, which is T317412.

As a note, Gerrit 3.7 will be released in November 2022 which will mark the end of life of Gerrit 3.4.

Change 824196 merged by jenkins-bot:

[operations/software/gerrit@wmf/stable-3.5] Merge tag 'v3.5.2' into wmf/stable-3.5

https://gerrit.wikimedia.org/r/824196

Change 856555 had a related patch set uploaded (by Hashar; author: Hashar):

[operations/software/gerrit@wmf/stable-3.5] Merge tag 'v3.5.4' into wmf/stable-3.5

https://gerrit.wikimedia.org/r/856555

Change 856555 merged by jenkins-bot:

[operations/software/gerrit@wmf/stable-3.5] Merge tag 'v3.5.4' into wmf/stable-3.5

https://gerrit.wikimedia.org/r/856555

The major change that affects us is the normalization of usernames to all lower case. I initially gave it a try with a local checkout of the All-Users.git database, but it lacked a copy of refs/users/* and thus the script was essentially doing nothing.

Trying again tonight, it fails as expected:

java -jar gerrit.war ChangeExternalIdCaseSensitivity --batch --dryrun
ExternalIdCaseSensitivityMigrator : Converting note name of external ID: gerrit:Katie Horn
ExternalIdCaseSensitivityMigrator : Duplicate external ID key: gerrit:Katie Horn
Converting external ID note names:   0% (    8/xxxx)

Inspecting refs/meta/external-ids, there is indeed a duplicate:

$ git grep -A2 -i gerrit:katie.horn
00/1f2986f8453fa299f16c55abd787a384bede90:[externalId "gerrit:Katie Horn"]
00/1f2986f8453fa299f16c55abd787a384bede90-      accountId = 153
00/1f2986f8453fa299f16c55abd787a384bede90-      email = khorn@wikimedia.org
--
21/500cc825e01f245cf44f05c4bd77348c446b73:[externalId "gerrit:katie horn"]
21/500cc825e01f245cf44f05c4bd77348c446b73-      accountId = 153
21/500cc825e01f245cf44f05c4bd77348c446b73-      email = khorn@wikimedia.org

There are 199 such LDAP users (scheme gerrit:):

$ git grep gerrit:.*[A-Z].*|wc -l
199

The highest accountId is 4565.

These accounts originate from when we turned on ldap.localUsernameToLowerCase (T152640, T197083, T197257, etc.). The upstream script to handle that LDAP username change apparently failed and left some accounts broken.

Eventually, with T216605, we did another round of investigation. There were 1512 broken users at some point (P8523); we are down to 199 now.

@thcipriani wrote a shell script to fix up the references:

#!/usr/bin/env bash
#
# Gerrit Username To Lowercase
# =====================
#
# Script for converting gerrit:-scheme usernames to lowercase directly
# in a checkout of refs/meta/external-ids of the All-Users git repo.
#
# Copyright: Tyler Cipriani <tcipriani@wikimedia.org> 2019
# License: GPLv3+

# Utility logging methods
info() {
    printf '[INFO] %s' "$@"
}
println() {
    printf '%s\n' "$@"
}

# Convert username with gerrit: schema to lowercase
#
# accepts a username as a mixed case string without a schema by:
#
# 1. Convert username mixed case to username lowercase
# 2. Compute sha1sum of lowercase schema for use as new file name
# 3. Find the old file name using git grep
# 4. Git move the old file to the new file path (computed with sha1sum)
# 5. Replace the username mixed-case in the new file with username lowercase
#
# param username: string, mixed case
usernameToLower() {
    local username username_lower shasum new_file old_file

    username="$1"
    username_lower="${username,,}"

    shasum=$(printf "gerrit:%s" "${username_lower}" | shasum -a 1)

    new_file=$(printf '%s/%s\n' "${shasum:0:2}" "${shasum:2:38}")
    old_file=$(git grep --full-name --files-with-matches "\"gerrit:${username}\"")

    if [ -f "$new_file" ]; then
        println "The new file '${new_file}' exists!!!! for '${username}'. Aborting!"
        exit 1
    fi

    git mv "$old_file" "$new_file"

    # Change username to lowercase in new file
    sed -i "s/gerrit:${username}/gerrit:${username_lower}/" "$new_file"
}

# Find any gerrit:-schema users with capital letters
# Look to see if there is a lowercase version
# If not, convert user to lowercase
main() {
    while read -r user; do
        # Grep for lowercase user
        if git grep "\\[externalId \"gerrit:${user,,}\"\\]" &>/dev/null; then
            continue
        fi
        info "Converting ${user}..."
        usernameToLower "$user"
        println "DONE!"
    done < <(git grep -P 'gerrit:.*[A-Z]+.*' | sed -e 's/.*:\[externalId "gerrit:\(.*\)"]/\1/')
}

main "$@"

It found 1305 accounts, and we manually fixed a few more after that.

I guess the above script does not handle all cases, or the lowercase entries already existed and the script happily skipped them.

My plan:

  • Ensure all 199 gerrit: users having an upper case character have a matching all lower case gerrit: entry.
  • git rm the files
  • Craft a commit deleting all 199 files and push it
  • Reindex accounts:
    • ssh -p 29418 gerrit.wikimedia.org -- gerrit index start accounts --force

Example from above: this would delete 00/1f2986f8453fa299f16c55abd787a384bede90, since it contains gerrit:Katie Horn and another file holds the all-lowercase external ID for the same account.
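The first step of the plan could be checked with something like the sketch below, run against a checkout of refs/meta/external-ids. The function names are hypothetical, and it only compares usernames: it does not verify that both entries carry the same accountId, which would still need a manual or scripted check before deleting anything.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of Gerrit. Requires bash 4+.

# Print every gerrit: external ID name found under directory $1, one per line.
list_gerrit_ids() {
    grep -rhE --exclude-dir=.git '^\[externalId "gerrit:.*"\]$' "$1" \
        | sed -E 's/^\[externalId "gerrit:(.*)"\]$/\1/'
}

# For each name containing an upper case character, print "SAFE <name>" when
# an all-lowercase twin exists in the checkout, or "ORPHAN <name>" otherwise.
classify_mixed_case_ids() {
    local dir="$1" all name
    all=$(list_gerrit_ids "$dir")
    while IFS= read -r name; do
        # Skip names that are already all lower case (and empty input).
        [[ "$name" != "${name,,}" ]] || continue
        if grep -qxF -- "${name,,}" <<<"$all"; then
            printf 'SAFE %s\n' "$name"
        else
            printf 'ORPHAN %s\n' "$name"
        fi
    done <<<"$all"
}
```

Running classify_mixed_case_ids . in the checkout and looking for ORPHAN lines would show which mixed-case entries have no lowercase counterpart and therefore must not simply be removed.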

Change 824200 merged by jenkins-bot:

[operations/software/gerrit@deploy/wmf/stable-3.5] Gerrit v3.5.4 and rebuild plugins

https://gerrit.wikimedia.org/r/824200

I have scheduled the upgrade for tomorrow, Thursday November 17th, at 9:00 UTC, which is just after the backport window. https://wikitech.wikimedia.org/wiki/Deployments#Thursday%2C_November_17

I have sent the announcement to the wikitech-l and ops mailing lists.

The update had some issues:

  • Due to the upgrade, all changes had to be reindexed. I freaked out a bit because we had 32 threads claiming to reindex all changes for mediawiki/core. But I think it is a red herring: the reindexing is sliced and each task probably ends up with the same name. I have filed https://bugs.chromium.org/p/gerrit/issues/detail?id=16445 about it.
  • Various users reported not being able to send comments on changes or to vote. I think the reason is that the / partition ended up being filled. I have cleared a few files (homedir, obsolete kernels, etc.).

Some error happened in the gerrit_file_diff disk cache, which caused a thread to hold a lock in it.

$ ls -lhrS /var/lib/gerrit2/review_site/cache/gerrit_file_diff*
-rw-r--r-- 1 gerrit2 gerrit2   99 Nov 17 09:56 /var/lib/gerrit2/review_site/cache/gerrit_file_diff.lock.db
-rw-r--r-- 1 gerrit2 gerrit2 8.8M Nov 17 09:52 /var/lib/gerrit2/review_site/cache/gerrit_file_diff.trace.db
-rw-r--r-- 1 gerrit2 gerrit2  65M Nov 17 09:51 /var/lib/gerrit2/review_site/cache/gerrit_file_diff.trace.db.old
-rw-r--r-- 1 gerrit2 gerrit2 8.0G Nov 17 10:11 /var/lib/gerrit2/review_site/cache/gerrit_file_diff.h2.db

Maybe that caused the / partition to fill up. There was a 768MB temp file at some point.

I have started a full offline reindexing at 9:56 UTC.

The indexing completed at roughly 11:05 UTC. That caused Gerrit to be started automatically, either by Puppet or systemd. It then had some more disk space issues.

SRE jumped in to relocate the Gerrit home dir to a different partition since the volume group has 80G.

Follow up in T323262

We are on Gerrit 3.5.4. There are a few UI changes which will probably trigger some discussions here and there, but overall everything else is working as far as I can tell.

The disk space issue is tracked by T323262 and will be used for follow up actions.

Mentioned in SAL (#wikimedia-releng) [2022-11-18T10:05:13Z] <hashar> gerrit: change HEAD branch to point to deploy/wmf/stable-3.5 # T307334