
Upgrade gerrit to 2.12
Closed, Resolved (Public)

Assigned To
Authored By
Paladox, Jul 19 2014

Description

Please upgrade gerrit to 2.12.

The following plugins need to be tested and fixed for the new version (they block the upgrade):

its-phabricator	2.8.1	Enabled
its-phabricator-from-bugzilla	2.8.1	Enabled
github-create-1.0-rc0	1.0-rc0	Enabled
deleteproject	2.8.1	Enabled

Release notes for 2.12 are at https://gerrit-documentation.storage.googleapis.com/ReleaseNotes/ReleaseNotes-2.12.html


See Also: T65847

Details

Reference
bz68271

Related Objects

Status	Assigned
Resolved	demon
Resolved	Nemo_bis
Resolved	demon
Resolved	demon
Resolved	demon
Resolved	None
Resolved	demon
Resolved	demon
Resolved	demon
Resolved	Aklapper
Resolved	demon
Resolved	demon
Resolved	demon
Resolved	demon
Resolved	demon
Resolved	demon
Declined	Bawolff
Resolved	jcrespo
Resolved	demon
Resolved	Dzahn
Resolved	RobH
Resolved	Cmjohnson
Resolved	demon
Resolved	Cmjohnson

Event Timeline

demon renamed this task from "Upgrade gerrit to 2.11" to "Upgrade gerrit to 2.12". Dec 3 2015, 5:22 PM

Although Phabricator is the glorious future, we should upgrade Gerrit at least for stability + security improvements if nothing else. 2.12 is basically ready to go, testing it now.

Yes, plus it makes it really easy to edit in the browser. Some users like using the Gerrit tool provided by Wikimedia to upload their patches; with this update they can do it from Gerrit super easily.

hashar added a comment. Dec 3 2015, 6:58 PM

OpenStack is upgrading to 2.11 on Dec 16th (from 2.8.4+). They already wanted to upgrade a couple weeks ago but cancelled it, maybe because it was not working properly.

No idea whether 2.12 will work.

Upgrades of Gerrit usually have side effects on Zuul.

Maybe create a test site with Gerrit and test whether upgrading Gerrit will cause Zuul and Jenkins to break.

Paladox updated the task description. Dec 4 2015, 9:12 AM

Gerrit 2.12 includes support for signed pushes now which some users on Wikipedia use.

Release notes at https://gerrit-documentation.storage.googleapis.com/ReleaseNotes/ReleaseNotes-2.12.html

Paladox updated the task description. Dec 27 2015, 5:53 PM

Is there any way to find out if this will or won't be scheduled?

I believe they have set up a new server that will use Gerrit 2.12, and they will be switching soon, I think. I may be wrong.

demon added a comment. Mar 10 2016, 3:42 PM

Is there any way to find out if this will or won't be scheduled?

When I have a firm date/time for switching to the new machine & version I'll let everyone know. Right now I'm getting the box prepped and testing the new version.

There's PolyGerrit, which Gerrit says is for the next version but is already built into Gerrit for testing at the moment. Once updated to Gerrit 2.12, add ?polygerrit=1 to your URL; the new design is really nice.

I don't think that such a hidden feature, especially since it isn't mentioned in the changelog and is only available by adding a query to the URL, is something we could call a "near" feature that we'd have after upgrading :)

@demon Hi, any news here? :) Is it possible to have a look at a test instance running the newer Gerrit version?

Ltrlg removed a subscriber: Ltrlg. May 17 2016, 8:25 PM

Another feature/bug in 2.11 (and with it in 2.12, too) would be a fix for:
https://bugs.chromium.org/p/gerrit/issues/detail?id=1207

Yes... on a German keyboard you can't use the shortcut keys for next/previous file in the diff view, which isn't really great :(

demon raised the priority of this task from Lowest to High. Jul 8 2016, 2:35 PM
demon added subscribers: jcrespo, Dzahn, greg. Edited Jul 8 2016, 9:40 PM

Tested the upgrade path today using copies of production data (SQL and Git) and the 2.12.2 package we built for Jessie. It went well. A couple more tests this afternoon and then it looks like we can run the upgrade ASAP. Workflow will be:

  1. Decide/announce a window
  2. Stop puppet on ytterbium, stop Gerrit processes
  3. Do a last snapshot of the Gerrit database, in case we have to roll back
  4. Apply puppet manifest to new server lead
  5. Run puppet on lead, which will issue the schema upgrades to m2-master (will take about 4-5 minutes from testing)
  6. Stop puppet on lead until (7) is done to keep Gerrit from being brought up just yet
  7. Run the reindexing process on new lead, will take about 45-60m from testing.
  8. Re-enable puppet so systemd can take over running gerrit
  9. Verify installation
  10. Point DNS from ytterbium to lead

The whole process, ideally, should only take about an hour to an hour and a half based on my testing. I want to give myself 3 hours in case. This is highly disruptive so needs to be announced widely, but soon (as in days, not weeks). I'm almost thinking this weekend since less disruption than on a workday...

I will need a root user on hand to take the snapshot and merge puppet changes (which can all be staged beforehand). @Dzahn has been super helpful so far, but I know @jcrespo is the DBA.

greg added a comment. Jul 8 2016, 10:30 PM

Re timing:
I guess email the Ops list with the plan and ask for a volunteer for Sunday (offer $drinks). If someone replies before Chad wakes up tomorrow (Saturday), he should send out the announcement that we'll do the Sunday maintenance window.

The upgrade of Gerrit definitely has an impact on Zuul.

The version we run is based on upstream 66c8e52 which is:

  • from early December 2015 (7 months ago)
  • when they were still running Gerrit 2.8.4
  • before a massive refactoring of how Zuul connects to Gerrit

I tried to upgrade Zuul further, but a mysterious issue came up with the Zuul / Gerrit ssh connection: T137525. I haven't managed to reproduce it locally or on labs, though :-(

OpenStack has (since December) switched to Gerrit 2.11.x + patches, and I haven't checked the Zuul code for changes that are needed. Any such changes will most probably build on the massive refactoring that we still do not have in our version.

Then again, maybe our Zuul version will just work with Gerrit 2.12. If stream-events hasn't changed and the review command is still compatible, we might be just fine.


As part of the migration we also have to:

  • update various Zuul related bits that point to ytterbium
  • most probably manually change the remote URL of zuul-merger repos on scandium in /srv/ssd/zuul/git
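As a rough illustration of the remote URL change mentioned above, here is a minimal Python sketch that rewrites the host name in every */.git/config file below a directory. It is not the procedure that was actually used (the maintenance log at the end of this task records a find/sed one-liner for that). The function name, the dry_run flag and the sample URL are hypothetical, and unlike sed it does a plain literal replacement of every occurrence.

```python
import tempfile
from pathlib import Path


def rewrite_remote_host(root, old_host, new_host, dry_run=True):
    """Replace old_host with new_host in every */.git/config below root.

    Returns the config files that mention old_host. With dry_run=True
    nothing is written, so the list can be reviewed first.
    """
    matched = []
    for config in sorted(Path(root).rglob("config")):
        # Only touch files named "config" that sit directly in a .git directory.
        if config.parent.name != ".git" or not config.is_file():
            continue
        text = config.read_text()
        if old_host not in text:
            continue
        matched.append(config)
        if not dry_run:
            config.write_text(text.replace(old_host, new_host))
    return matched


# Demo on a throwaway directory; the repository name and URL are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "example" / "repo" / ".git" / "config"
    sample.parent.mkdir(parents=True)
    sample.write_text(
        '[remote "origin"]\n'
        "\turl = ssh://ytterbium.wikimedia.org/example/repo\n"
    )
    # First pass only reports, second pass writes.
    print(rewrite_remote_host(tmp, "ytterbium.wikimedia.org", "lead.wikimedia.org"))
    rewrite_remote_host(tmp, "ytterbium.wikimedia.org", "lead.wikimedia.org", dry_run=False)
    print(sample.read_text())
```

Running the dry run first gives a list of affected repositories to sanity-check before anything on disk changes.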

Also, I think the old change screen is gone, and that is going to cause major havoc and complaints for sure. So we want to communicate about that user-facing change ahead of time.

Also, Zuul seems to work perfectly with Gerrit 2.12 (no lost connections or ssh connection problems). It seems to me that this may have been a bug in Gerrit 2.8.

Dzahn added a comment. Jul 11 2016, 8:49 PM

Paladox spoke to hashar after the comment above and the user does not have to be created (?)

Change 298041 had a related patch set uploaded (by Dzahn):
gerrit: make Apache config compatible with 2.4

https://gerrit.wikimedia.org/r/298041

Change 298041 merged by Dzahn:
gerrit: make Apache config compatible with 2.4

https://gerrit.wikimedia.org/r/298041

demon claimed this task. Jul 12 2016, 8:15 PM

I've been messing around with gerrit-new. On the projects list, a lot appear to be duplicated under the 'git/' prefix?

I don't think the VE backport submodule update issue has been fixed :/
I made https://gerrit-new.wikimedia.org/r/#/c/298010/ but it doesn't look like the submodule update showed up against mediawiki/core.git's wmf/1.28.0-wmf.1 branch

demon added a comment. Jul 13 2016, 5:23 AM

I don't think the VE backport submodule update issue has been fixed :/
I made https://gerrit-new.wikimedia.org/r/#/c/298010/ but it doesn't look like the submodule update showed up against mediawiki/core.git's wmf/1.28.0-wmf.1 branch

It probably will require a manual fix in the database where the old registration took place. Check with gsql?

demon added a comment. Jul 13 2016, 5:26 AM

I've been messing around with gerrit-new. On the projects list, a lot appear to be duplicated under the 'git/' prefix?

Looks like a mistake on disk on my part from when we were doing replication of repositories via rsync. Deleted the git/* ones.

@demon hi, I noticed that it doesn't show the links to Phabricator from the changes or from project pages.

And it seems Zuul is not testing in gerrit-new.

And the performance has really improved; it loads instantly for me on an 80 Mbps connection from BT.

The old gerrit version only showed the "Rebase" button if the patch was in need of the rebase, while on gerrit-new the option is always displayed. If that's configurable it would be nice to revert to the old behaviour (since it makes it more obvious whether a patch needs to be rebased before merging)

I don't think the VE backport submodule update issue has been fixed :/
I made https://gerrit-new.wikimedia.org/r/#/c/298010/ but it doesn't look like the submodule update showed up against mediawiki/core.git's wmf/1.28.0-wmf.1 branch

It probably will require a manual fix in the database where the old registration took place. Check with gsql?

gerrit> select * from submodule_subscriptions where submodule_path like '%VisualEditor%' and submodule_branch_name = 'refs/heads/wmf/1.28.0-wmf.1';
 submodule_project_name | submodule_branch_name       | super_project_project_name | super_project_branch_name   | submodule_path
 -----------------------+-----------------------------+----------------------------+-----------------------------+------------------------
 VisualEditor           | refs/heads/wmf/1.28.0-wmf.1 | mediawiki/core             | refs/heads/wmf/1.28.0-wmf.1 | extensions/VisualEditor
(1 row; 18 ms)
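For anyone wanting to repeat this kind of check for another extension or branch, here is a small self-contained sketch of the same lookup as a parameterised query. It uses an in-memory SQLite table as a stand-in for the real reviewdb (an assumption made purely so the example runs anywhere), seeded with the one row shown above; the subscriptions helper is hypothetical.

```python
import sqlite3

# Stand-in for reviewdb: same column names as in the gsql output above.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE submodule_subscriptions (
           submodule_project_name TEXT,
           submodule_branch_name TEXT,
           super_project_project_name TEXT,
           super_project_branch_name TEXT,
           submodule_path TEXT)"""
)
conn.execute(
    "INSERT INTO submodule_subscriptions VALUES (?, ?, ?, ?, ?)",
    (
        "VisualEditor",
        "refs/heads/wmf/1.28.0-wmf.1",
        "mediawiki/core",
        "refs/heads/wmf/1.28.0-wmf.1",
        "extensions/VisualEditor",
    ),
)


def subscriptions(conn, path_fragment, branch):
    """List superproject subscriptions for a submodule path and branch."""
    sql = (
        "SELECT super_project_project_name, super_project_branch_name, submodule_path "
        "FROM submodule_subscriptions "
        "WHERE submodule_path LIKE ? AND submodule_branch_name = ?"
    )
    return conn.execute(sql, ("%" + path_fragment + "%", branch)).fetchall()


# A non-empty result means the subscription is registered for that branch.
print(subscriptions(conn, "VisualEditor", "refs/heads/wmf/1.28.0-wmf.1"))
print(subscriptions(conn, "VisualEditor", "refs/heads/master"))
```

An empty list for a branch would point at a missing registration, which is the case the earlier comment suggested checking for.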

@MoritzMuehlenhoff Gerrit should show a little orange bubble near the "Parent(s)" section of the commit information, which indicates that the parent isn't the latest one (which normally allows you to rebase).

Merge conflicts (the current "Can merge: No" section), by the way, are now shown as a string in a red font in the change information box.

demon added a comment. Edited Jul 20 2016, 10:38 PM

Adjusting my upgrade list from above, minor tweaks only:

  1. Decide/announce a window (ASAP, no time is good though)
  2. Stop puppet on lead
  3. Apply puppet / dns changes for lead and CI (298673, 299007)
  4. Run puppet one last time on ytterbium, stop puppet, stop Gerrit processes
  5. Do a last snapshot of the Gerrit database, in case we have to roll back
  6. Enable & run puppet on lead, which will issue the schema upgrades to m2-master (will take about 4-5 minutes based on testing; Gerrit will restart)
  7. Run puppet for zuul nodes (restart zuul w/ new config)
  8. Verify installation
  9. Point DNS from ytterbium to lead

The big point is we can leave puppet on lead the whole time and don't need downtime for reindexing. We did most of the prep work, so Gerrit 2.12.2 is already running on that host. The index will be slightly out of date, but it can be rebuilt in place. Searches will be a little wonky for 45m or so, but it's better than staying down. We'll need a Zuul upgrade, but this should be mostly prepared and in place prior to doing it. @hashar: How close are we to having the new Zuul ready? It should be mostly done, right?

The whole process, ideally, should only take about 45 minutes tops, thanks to the extensive testing we did prior. Looking at the ganglia data, our lowest-traffic times on weekdays are from about 01:00 to 10:00 UTC. Saturdays and Sundays are mostly quiet other than the l10n bot updates at around 00:00 UTC. So the quietest time, from what I gathered, is basically 01:00 to about 10:00 UTC on Mondays, which is late Sunday night in the US. Basically, let l10n bot do its thing and then start the window. The nice thing is that, in case things go totally wrong, the European ops will be awake soon on a workday to help and take over.

But it shouldn't go totally wrong. Roll back plan if things go wrong (and I plan to be very aggressive in rolling back if we don't have full functionality back) is as follows:

  1. Stop puppet on lead, turn off gerrit
  2. Restore database to the snapshot that we did prior to downtime
  3. Revert puppet/DNS change from above ^^
  4. Restart puppet and gerrit on ytterbium and lead
  5. Reassess and plan a new window after issues fixed

If this sounds good, I'd like to shoot for this coming Monday, July the 25th for our downtime (Sunday evening my time). I'm thinking 01:00-04:00 UTC, that's 6-9pm locally.
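The UTC-to-local conversion for the proposed window can be double-checked with a few lines of Python. This is only a sketch: it assumes "locally" means US Pacific time on daylight saving (PDT, UTC-7) in late July, and window_in is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: US Pacific is on daylight time (PDT, UTC-7) in late July.
PDT = timezone(timedelta(hours=-7), "PDT")


def window_in(tz, start_utc, hours):
    """Return (start, end) of a maintenance window converted to tz."""
    end_utc = start_utc + timedelta(hours=hours)
    return start_utc.astimezone(tz), end_utc.astimezone(tz)


# Proposed window: Monday 2016-07-25, 01:00 UTC, three hours long.
start = datetime(2016, 7, 25, 1, 0, tzinfo=UTC)
local_start, local_end = window_in(PDT, start, 3)
print(local_start.isoformat(), "to", local_end.isoformat())
# 2016-07-24T18:00:00-07:00 to 2016-07-24T21:00:00-07:00
```

Passing a different fixed offset to window_in gives the same window for other locations.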

Sounds good to me. I can be there Sunday night Pacific time.

@demon Sounds sane! :) Thanks for your work and the work of any other one involved so far :)

We would want to keep the ssh host fingerprint for the Gerrit embedded SSH daemon. If it changes, a bunch of bots will stop until it is manually accepted.

1am-4am UTC turns into 3am-7am CET; I can't reasonably work from home at those hours. Not much opportunity until 6:30am UTC at best.

Then it is very low traffic at those hours, so I guess we can afford to have CI potentially broken somehow for a few hours.

demon added a comment. Jul 21 2016, 2:00 PM

We would want to keep the ssh host fingerprint for the Gerrit embedded SSH daemon. If it changes, a bunch of bots will stop until it is manually accepted.

This has been done. I'll get it done in puppet too so we don't miss it next time.
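One way to confirm that a host key survived a move like this is to compare fingerprints of the public key taken from the old and the new server. The sketch below computes the OpenSSH-style SHA256 fingerprint (unpadded base64 of the SHA-256 digest of the decoded key blob) in Python. The key lines are placeholder strings, not real host keys, and both function names are hypothetical.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line):
    """Return the OpenSSH-style SHA256 fingerprint of one public key line.

    The line has the usual "<type> <base64 blob> [comment]" layout.
    """
    blob = base64.b64decode(public_key_line.split()[1])
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as unpadded base64 after a "SHA256:" prefix.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def same_host_key(old_line, new_line):
    """True if both public key lines carry the same key blob."""
    return sha256_fingerprint(old_line) == sha256_fingerprint(new_line)


# Placeholder key material, NOT real host keys.
blob_a = base64.b64encode(b"placeholder key A").decode("ascii")
blob_b = base64.b64encode(b"placeholder key B").decode("ascii")
old_key = "ssh-rsa " + blob_a + " old-host"
copied_key = "ssh-rsa " + blob_a + " new-host"
regenerated_key = "ssh-rsa " + blob_b + " new-host"

print(same_host_key(old_key, copied_key))       # True
print(same_host_key(old_key, regenerated_key))  # False
```

The comment field is ignored, so a key copied to a host with a different name still compares equal.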

1am-4am UTC turns into 3am-7am CET; I can't reasonably work from home at those hours. Not much opportunity until 6:30am UTC at best.
Then it is very low traffic at those hours, so I guess we can afford to have CI potentially broken somehow for a few hours.

Yeah I know these hours weren't ideal for you, but I'm pretty sure from our testing that things will be ok. And "broken CI" is a condition for rolling things back.

Yeah I know these hours weren't ideal for you, but I'm pretty sure from our testing that things will be ok. And "broken CI" is a condition for rolling things back.

There is never a good time. I should be able to make a personal arrangement to show up without waking up the whole house!

Change 300657 had a related patch set uploaded (by Dzahn):
gerrit: lower TTLs to 600

https://gerrit.wikimedia.org/r/300657

Change 300657 merged by Dzahn:
gerrit: lower TTLs to 600

https://gerrit.wikimedia.org/r/300657

Change 300660 had a related patch set uploaded (by Dzahn):
switch gerrit-new to gerrit

https://gerrit.wikimedia.org/r/300660

Change 300660 abandoned by Dzahn:
switch gerrit-new to gerrit

Reason:
duplicate of https://gerrit.wikimedia.org/r/#/c/299007/

https://gerrit.wikimedia.org/r/300660

demon closed this task as Resolved. Jul 25 2016, 6:33 AM
Dzahn added a comment. Edited Jul 25 2016, 6:34 AM
2016-07-25

    03:26 hashar: scandium: migrating zuul-merger repos from lead to gerrit.wikimedia.org: find /srv/ssd/zuul/git -path '*/.git/config' -print -execdir sed -i -e 's/lead.wikimedia.org/gerrit.wikimedia.org/' config \;
    02:03 ostriches: gerrit: reindexing lucene now that we have new data. searches/dashboards may look a tad weird for a bit
    01:53 hashar: starting Zuul
    01:51 mutante: restarted grrrit-wm
    01:39 ostriches: lead: turning puppet back on, here we go
    01:38 jynus: m2 replication on db2011 stopped, master binlog pos: db1020-bin.000968:1013334195
    01:37 hashar: scandium: restarted zuul-merger
    01:36 ostriches: ytterbium: Stopped puppet, stopped gerrit process.
    01:34 mutante: switched gerrit-new to gerrit in DNS
    01:30 ostriches: lead: stopped puppet for a few minutes
    01:17 hashar: scandium: migrating zuul-merger repos to lead find /srv/ssd/zuul/git -path '*/.git/config' -print -execdir sed -i -e 's/ytterbium.wikimedia.org/lead.wikimedia.org/' config \;
    01:10 hashar: stopping CI
    01:09 jynus: reviewdb backup finished, available on db1020:/srv/tmp/2016-07-25_00-54-31/
    01:02 ostriches: rsyncing latest git data from ytterbium to lead
    00:57 mutante: manually deleted reviewer-counts cron from gerrit2 user, runs as root and puppet does not remove crons unless ensure=>absent
    00:55 jynus: starting hot backup of db1020's reviewdb

Change 300815 had a related patch set uploaded (by Chad):
TESTING STUFF

https://gerrit.wikimedia.org/r/300815

Change 300815 abandoned by Chad:
TESTING STUFF

Reason:
Yay stuff works now.

https://gerrit.wikimedia.org/r/300815