Page MenuHomePhabricator

Investigate and document stacked merge requests
Open, MediumPublicFeature

Description

Some people rely on stacked patchsets for their workflow. This is the default model of Gerrit. Is there are recommended path to use this in Gitlab? Let's investigate and publish findings here and somewhere under https://www.mediawiki.org/wiki/GitLab/Workflows

Projects to checkout:

Event Timeline

Aklapper changed the subtype of this task from "Task" to "Feature Request".Feb 3 2022, 7:43 AM
brennen edited projects, added GitLab (Integrations); removed GitLab.

Having skimmed the README, this doesn't look like it would specifically require any action on the GitLab administration side for contributors to use. I'm not sure how practical it really is, especially since I've never personally gotten in the habit of using git-review with Gerrit, but it seems like the sort of thing that could be usefully documented on-wiki if individual contributors / projects were interested in it.

demon subscribed.

This would involve us continuing to add Change-Ids onto commits even on Gitlab? No no, we don't want to recommend that. They're ugly and pointless.

I am worried that there seems to be a lack of interest in exploring options how to support the stacked patchsets workflow that Gerrit enables (if I am missing discussion or work around this, happy to be corrected). I raised this concern when the decision to move to GitLab was being made. I am worried that it will be a point of contention that either makes part of the developer population unhappy or possibly delays or blocks the migration.

There currently doesn't seem any central place to discuss this issue, but I'd like to point out that Git 2.38 comes with a new --update-refs feature (announcement), that sounds like it may be a potential solution or part of solution for this in git itself. But I haven't tested it myself yet, so this is just hopeful thinking from my side.

I am worried that there seems to be a lack of interest in exploring options how to support the stacked patchsets workflow that Gerrit enables (if I am missing discussion or work around this, happy to be corrected). I raised this concern when the decision to move to GitLab was being made. I am worried that it will be a point of contention that either makes part of the developer population unhappy or possibly delays or blocks the migration.

I had a long private conversation on that very topic a year or so ago. Essentially you can achieve more or less the same in Gitlab/GitHub as is done in Gerrit, with some caveats and limitations on both side though and an end result which might well be different. I will first describe how one can stack related patchsets together in Gerrit, how it can be done with GitLab/GitHub. I will conclude with some shortfall in both workflows and ironically not draw any conclusion.

The Gerrit use case for stacked patchsets is to express a dependency between two patchsets, it is very similar to the email workflow used to develop git or the linux kernel: the developer send a single email thread which has one email per patchset. Lets say you have two changes which is a pattern I often use:

  1. Add test coverage to an existing code
  2. Change some behavior and adjust the test written in previous patch

In your local git repository you have the commit chained as a6778b - test coverage59fd86 - change some behavior. When sending them to Gerrit it results in two changes: test coveragechange some behavior. The beauty is one can keep altering each of them independently and once ready that results in two merged commits which reflects what you had in your local git repository.

On GitHub/GitLab there are two possibilities:

The first GitHub/Gitlab possibility is to attempt to mimic what Gerrit is offering, send both patchsets in two different pull/merge requests. Something such as:

merge request 1: mastertest coverage
merge request 2: masterchange some behavior

Given the second relies on the first, the dependency is expressed by the tests failing which is not ideal. Or maybe if there is no test covering the dependency and one must express the dependency in the merge request text. Then there is nothing preventing the second merge request from entering unless a test explicitly breaks it to enforce the dependency.

A slight variation is to send the second merge request 2 with the commit from the first:

merge request 1: mastertest coverage
merge request 2: mastertest coveragechange some behavior

So you can review each independently, but the second would need to be rebased as the first merge request is adjusted. I have seen that used here and there and I don't think it is a good idea.

The reality is GitHub/Gitlab does not have a way to depend between requests, it is not the original model. Instead the model is based on a feature branch that has all the commits required to form it and you ask for the upstream maintainer to integrate ONE independent feature. The "pull request" comes directly from the Linux kernel jargon as I understand it, when you send an email threads to a Kernel maintainer and ask them to pull your code in their tree.

So the second possibility for GitHub/Gitlab, and the ones that must be used, is to have your dependent code in a single pull/merge request which has all the commits:

merge request: test coveragechange some behavior

Each commits can be reviewed independently in the merge request, exactly like they can be reviewed independently in Gerrit albeit that is done in different changes.

The major different is that with Gerrit the developer is required to have each commit to be pristine ready and working. One thus has to rebase their chain and amend the commits. In the end each commit is fine to enter the branch one at a time and can enter the tree at different time (in the example above, the first test coverage can be merged as soon as it is ready while the second can hold till it is ready).

In GitHub/GitLab the test is done against the feature branch / merge request. The independent commits inside it might well be broken as one is adding commit to the branch. One does not need to rebase which is possibly easier to understand. It is not unusual to have a merge request looking like:

test coveragechange some behaviorfix up testdebuglogfix styleadjust behavioradd doc

How do you merge that to avoid making noise in the main tree? GitHub lets the maintainer squash all the commits in a single one and you will end up with a single commit having both the test coverage and change some behavior. If it has to be reverted cause the new behavior is broken, you lost the test coverage. So instead a maintainer could request the developer to rewrite their merge requests and create two nice commits (and thuse use rebase amend like in Gerrit).

So I think you can achieve the stacked workflow by stacking them in a single merge request. But I don't think it is an improvement for review purposes since the review is about the whole chain rather than each individual commit. Experience from various maintainers on GitHub shows that either the tree ends up a mess ask, maintainers have to constantly ask to rework the merge requests to make them nicer or developers see their independent commits squashed by mistakes.

From what I gather, the merge request / pull request is deemed easier, notably because that is what most people are familiar with. Even thought it has shortfall compared to Gerrit, it should make it easier to contribute and initially (at least) provides a faster onboarding. Long term though, we are missing some capabilities and the best GitHub/GitLab has to offer is to chain your patchsets in a single pull/merge request.

The platform team had some concerns about the lack of stacked patchsets, see WORKFLOW at https://www.mediawiki.org/w/index.php?title=Topic:Vv0qfaeil219wyij&topic_showPostId=vv0qfaeil5zc52gr#flow-post-vv0qfaeil5zc52gr

So the second possibility for GitHub/Gitlab, and the ones that must be used, is to have your dependent code in a single pull/merge request which has all the commits:

merge request: test coveragechange some behavior

Each commits can be reviewed independently in the merge request, exactly like they can be reviewed independently in Gerrit albeit that is done in different changes.

This won't work for SRE workflows for configuration management where patches need to be stacked because they need to be applied in a sequence.

I don't see how the puppet repository could be reasonably be worked with without stacked patchsets.

From what I gather, the merge request / pull request is deemed easier, notably because that is what most people are familiar with. Even thought it has shortfall compared to Gerrit, it should make it easier to contribute and initially (at least) provides a faster onboarding. Long term though, we are missing some capabilities and the best GitHub/GitLab has to offer is to chain your patchsets in a single pull/merge request.

For me this is the crux of my discomfort with the forced (by management who are no longer here to debate on the topic) conversion from Gerrit to GitLab. The "people will not need to learn Gerrit" argument is true, but at the same time there are many valuable features present in Gerrit for which there are currently no known GitLab replacements. To me this is putting very, very short term gains over long term efficiencies. We are losing "stacked" changes that can easily be staged, reordered, and merged as needed. We are losing the ability for 2 or more people to easily collaborate on a single commit. We are losing cross-repository "depends-on" functionality to block out of order merges. We must reinvent a large amount of git related tooling to send notices on IRC, track things in Phabricator, allow folks to watch for changes by file path, and I'm sure many other things I'm forgetting. For this we gain a 5-20 hour reduction in the time engineers new to contributing to any Wikimedia project spend learning the basic workflows of Gerrit's patch based code reviews. I would have loved to have a reasoned debate about this in the past, but for whatever reasons (and they honestly don't matter now) this was not possible.

For me this is the crux of my discomfort with the forced (by management who are no longer here to debate on the topic) conversion from Gerrit to GitLab.

Please, please, please do not read this as me being upset with the people working on the GitLab project. I am upset with the decision making process that got us here.

thcipriani subscribed.

For me this is the crux of my discomfort with the forced (by management who are no longer here to debate on the topic) conversion from Gerrit to GitLab.

Please, please, please do not read this as me being upset with the people working on the GitLab project. I am upset with the decision making process that got us here.

<3


Replacing the nice workflows of Gerrit is something we understand we have to reckon with as we transition to GitLab.

Reopening this task for some investigation to see if it fits our needs (although if it requires keeping change-ids that would be a strange thing to keep forever).

This week we're working on pipeline things for GitLab (see Release-Engineering-Team (GitLab II: Wrath of Kahn 👾) for all the excitement). In coming sprints we'll spend time working with teams to migrate our existing heavily-gerrit workflows.

I'll try to prioritize this for sooner rather than later since it seems like there's a lot of (understandable) anxiety about it.

Having skimmed the README, this doesn't look like it would specifically require any action on the GitLab administration side for contributors to use. I'm not sure how practical it really is, especially since I've never personally gotten in the habit of using git-review with Gerrit, but it seems like the sort of thing that could be usefully documented on-wiki if individual contributors / projects were interested in it.

Going back to the original topic of this ticket, I want to emphasize what @brennen said here. The use of GerritLab is an individual contributor's decision whose only downside is the continued appearance of Change-Id's in commit messages created by those who choose to use the tool. I think we should be tolerant of this.

jbond renamed this task from Gerritlab to Gerritlab: Stacked pull requests.Dec 16 2022, 10:27 AM

leaving this here as it may also be useful https://olivernguyen.io/w/sapling/. disclaimer i have not tested myself

My team has adopted gitlab for a number of repositories already. There are weekly mentions to the stacked patchset workflow now.

I'm curious about the outcomes of the activities mentioned in T300819#8329682.

There is such a concept of stacked MRs in Gitlab: https://docs.gitlab.com/ee/user/project/merge_requests/#update-merge-requests-when-target-branch-merges. Unfortunately they only support cascading merge targets on up to 4 open MRs. For anything fancier, Gitlab apparently wants $$$$: https://docs.gitlab.com/ee/user/project/merge_requests/dependencies.html

thcipriani renamed this task from Gerritlab: Stacked pull requests to Investigate and document stacked merge requests.Jul 26 2023, 3:27 PM
thcipriani raised the priority of this task from Low to Medium.
thcipriani updated the task description. (Show Details)

This seems related to the general 'Depends-On' feature in gerrit/jenkins, which I think is also substantially missing/needs-hacking in gitlab? They are both expressing dependencies between changesets; the main difference with stacked changesets (other that the dependency is within a single repo, instead of between two repos) is that the latter changes are based on a different tree state, such that they would be ill-formed if you tried to rebase them to standalone on top of master.

A typical pattern I use when updating APIs in core is:

  1. Add a better way to do X in core: install of calling X, call Y!
  2. Update a dozen extensions to call Y instead of X. These patches all Depend-On the patch in step 1.
  3. Hard-deprecate X in core. This patch has a dozen Depends-On lines, listing all the production extensions patched in step 2. (Usually it also has a code search link, which lets a potential reviewer satisfy themself that the list of prodution dependencies is complete and that "only a reasonable number" of non-production extensions are left unpatched at the time of merge.)
  4. Remove X from core. (Usually with a note to say "don't merge this before XYZ date".)

The patches in steps 1, 3, and 4 are in the same repo and stacked. Patch 4 also is inherently based on 3; it couldn't be rebased on master without causing a merge issue.

Note also that the reviewers for step 2 are a motley and eclectic bunch from across the foundation and community, and are not necessarily related to the reviewers for step 1/3/4.

So.... at the risk of slightly broadening this task, I'd like to see the tasks for inter-repo dependencies (step 1 and 2 above; and step 2 and 3 above) at least discussed simultaneously with the task for stacked dependencies (step 1 -> 3 -> 4 above), so that the basic workflow is preserved and we don't lose sight of the forest for the trees. Already it's a big impedance mismatch when some of the extensions in step 2 are not hosted in gerrit, and that causes pain and delay.

This seems related to the general 'Depends-On' feature in gerrit/jenkins, which I think is also substantially missing/needs-hacking in gitlab?

I agree "Depends-On" is an important feature for exactly the workflow you describe!

And I'd like to exclude it from this task since it's not actually part of the patch stacking features of Gerrit. It's feature of the glue between gerrit and jenkins: zuul. The gerrit zuul plugin is what's adding the UI in gerrit.

It's very important to our workflows. But let's decouple it from stacking patches within a repository.


Stacked merge request tool

tl;dr: Release-Engineering-Team have been contributing upstream to GerritLab and it seems to be a good tool for handling stacked patches. I'd be happy for folks to try it out.

GerritLab

The good

  • Less UI fiddling – it creates branches, branch names, and manages merge requests (and their dependency chains) for you
  • Re-targets MRs after requests merge – GitLab (unlike GitHub, seemingly) re-targets dependent merge requests after dependencies merge. That is, if you merge the first MR in a stack to main, the next MR in the stack will now target main (this is a GitLab feature, not a GerritLab feature)
  • Working with upstream – We've landed a bunch of changes upstream and can make more tweaks. The code is small and we can tweak it to fit our needs.
  • Adds UI to your merge request so that folks know it is a patch:

2023-09-29_gitlab-mr-chain.png (259×708 px, 22 KB)

Less good

  • It's slower than git push origin HEAD:refs/for/main. We've made it faster. It takes a second or two. It will likely never be as fast because what was done server side is done client side.
  • It's a tool you have to install. The focus of the project is to be the equivalent of git review for Gerrit. But, unlike Gerrit, there is no pure git workflow.

Investigation background/How we landed on GerritLab

Both GitHub and GitLab support stacked patches/diffs (as described by Michaela Greiler). But on GitHub they're unusable. On GitLab, they're merely difficult to use.

On GitLab, you can create a merge request which depends on another merge request.

And it has some nice features:

  • Like Gerrit, you can't merge patches out of order (You can click "Merge" on a patch in the middle of the stack, but that will merge the patch down the stack. This is different than Gerrit's behavior "Unable to merge")
  • Like Gerrit, if you change a merge request in the middle of the stack, GitLab notices a conflict (unlike GitHub) and lets you fix it in the browser. The fix cascades to all subsequent Merge Requests, no need to fix the whole stack (again, unlike GitHub).
  • Like Gerrit, once you merge the bottom of your patch stack, the next merge request up the stack automatically re-targets the next merge request down the stack (again, unlike GitHub :))

But creating merge-request stacks manually is hard:

  • You have to think of branch names for each patch in the stack and push each one up to GitLab
  • You have to generate a merge request for each patch in the stack while thinking really hard about those branch names

Because GitHub makes this so hard, a lot of the tooling we found was GitHub-specific:

Or they want to take over your entire git workflow:

Or are they're startups trying to solve this problem (graphite.dev which is also focused on GitHub :)).

GerritLab is focused on the GitLab workflow and it makes many of the headaches easier.

GerritLab is great. It's hard to describe how relieved I feel to have stacked patches available to me in GitLab.

I'd love to see us promote this more widely, especially in https://www.mediawiki.org/wiki/GitLab/Workflows#Making_a_GitLab_Merge_Request, perhaps as the default best practice for projects.

This seems related to the general 'Depends-On' feature in gerrit/jenkins, which I think is also substantially missing/needs-hacking in gitlab?

I agree "Depends-On" is an important feature for exactly the workflow you describe!

And I'd like to exclude it from this task since it's not actually part of the patch stacking features of Gerrit. It's feature of the glue between gerrit and jenkins: zuul. The gerrit zuul plugin is what's adding the UI in gerrit.

It's very important to our workflows. But let's decouple it from stacking patches within a repository.

I created T349872: Investigate and document "Depends-On" in GitLab for this.

One thing that gerritlab doesn't solve is the ability to +2 a patch without it merging into the next one in the stack. e.g. in gerrit, one can +2 a patch, but it is more or less "frozen" until the parent patch is also +2'ed. That means that if a reviewer isn't careful, they could accidentally merge patches in a stack, making the review of the base patch more cumbersome.

The only workaround I can think of is using "Draft" status in the stacked patches, which will prevent them from being merged.