Page MenuHomePhabricator

Define Cut-off date for abandoning old changesets in our code review tool
Closed, DeclinedPublic

Assigned To
None
Authored By
Qgil
Sep 25 2015, 9:20 AM
Referenced Files
None
Tokens
"Dislike" token, awarded by Ricordisamoa."Dislike" token, awarded by JanZerebecki."Dislike" token, awarded by jayvdb.

Description

We should define some sort of cut off for just abandoning commits (over some date, even more so if CR-1/CR-2 or V-1). They're just clutter. They still exist, and if someone wanted to work on them, they can be easily restored.

Probably with a nice summary, "thanks for the commit, you can restore this at a later date if you plan on working on it" etc

For example mw core has 29 commits untouched since 2013...

You could argue that untouched patches over a year old are candidates for abandoning etc... Core has around 100 commits that are a year old

This is a standard practice in some open source projects, and I think it would be a good idea for us as well. We can start off incredibly generous, only abandoning patches that havn't been touched in a year. But other projects (such as HHVM) will close open patch sets with three weeks of inactivity. It's not like the patches are lost, abandon just means its not going anywhere. Anyone who wants to can restore the patch set.

See past discussions: http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/60150/ , http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/65931/ , http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/76459/ , http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/76640/

Principles

Let's agree on the principles, as a required step to agree on the implementation.

  1. If a patch is ready for code review, it should be reviewed.
    1. When a patch has zero reviews, the burden is on the reviewers, no matter how old the patch.
      1. If the problem is that the repository has been unmaintained for a long time, then the patch submitter could be invited to maintain the repo.
        1. If the search for new maintainers is futile, then the repo should be marked as inactive and the pending patches marked as abandoned.
    2. When the patch has +1 or equivalent, the burden is on the reviewers, no matter how old the patch. These patches should get priority in the queue.
    3. When the patch has -1 or equivalent, the burden is on the authors.
      1. If the patch doesn't have new activity, after a prudential period (1 month?) a bot should post a reminder.
        1. If after the automatic reminder there is also inactivity, after a prudential period (2 months?), anybody is entitled to mark the patch as abandoned.
      2. When a patch had negative reviews in the comments not reflected with a -1, anybody is entitled to mark a -1 on behalf of these comments.
      3. When a patch has no reviews or +1 and is still waiting for review, it is not cool to force a -1 by throwing it to the Continuous Integration pipeline again, without specific negative feedback about the patch itself
        1. If the patch looks good but needs rebase, it is cool to leave the good feedback in the comments and force a rebase, even if that results in a -1.
  2. If a patch is not ready for code review, it should be marked with [WIP] in the subject.

The concepts of +1 / -1 are specific to Gerrit, but if these principles make sense, we surely can adapt or evolve them for Differential.

See Also:
T129068: Improve code contribution guidelines for patch authors
T78639: How to address the long tail of low priority tasks in active projects (abandoning a rotting changeset = declining a rotting task?)

Related Objects

StatusSubtypeAssignedTask
ResolvedQgil
ResolvedDicortazar
InvalidNone
InvalidNone
ResolvedQgil
DeclinedNone
OpenNone
ResolvedQgil
ResolvedAklapper
ResolvedAklapper
ResolvedAklapper
ResolvedDicortazar
ResolvedDicortazar
ResolvedAcs
ResolvedDicortazar
ResolvedDicortazar
ResolvedDicortazar
ResolvedDicortazar
ResolvedDicortazar
ResolvedDicortazar
InvalidDicortazar
ResolvedDicortazar
ResolvedDicortazar
ResolvedAklapper
ResolvedDicortazar
ResolvedDicortazar
DuplicateNone
ResolvedDicortazar
ResolvedBawolff
Resolved mmodell
ResolvedNone
Resolved mmodell
ResolvedLegoktm
Resolvedtstarling
Resolvedgreg
ResolvedAklapper
ResolvedAklapper
ResolvedAklapper
ResolvedAklapper
ResolvedNone
DeclinedNone

Event Timeline

Qgil raised the priority of this task from to Medium.
Qgil updated the task description. (Show Details)
Qgil added subscribers: Jay8g, mmodell, Tgr and 19 others.
Aklapper renamed this task from Cut off date for abandoning old commits to Define Cut-off date for abandoning old changesets in our code review tool.Sep 25 2015, 9:29 AM
Aklapper set Security to None.

Simply closing a review request that was never responded to since time x seems like the opposite of what we want. Or if it were it would be easier to not allow them to submit such patches in the first place.

Abandoning a patch that only ever saw a "-1 needs manual rebase" seems like a waste.

A patch that needs non-cosmetic changes that would take a knowledgeable person with merge rights more than 15 minutes (not only a rebase or style) to be acceptable, which is linked in a task, where the owner didn't have any actions on gerrit or phab since 1 year, seems like it can be abandoned as it can be found through Phabricator and restored.

However a patch whose goal is not acceptable and has a reasonable -2 with multiple people reasonably agreeing and no response from the owner seems also like something that can be abandoned.

Nothing new happened in the past years, hence see comments on the past threads.

Is there a specific goal this is supposed to achieve, or just a general sentiment that the room looks nicer when old stuff is swept under the rug? Gerrit allows filtering by age, so if someone does not like to see old patches, personal/team dashboards are easy to adapt.

Is there a specific goal this is supposed to achieve, or just a general sentiment that the room looks nicer when old stuff is swept under the rug? Gerrit allows filtering by age, so if someone does not like to see old patches, personal/team dashboards are easy to adapt.

Conversely, what does keeping them around (non abandoned) actually do?

Conversely, what does keeping them around (non abandoned) actually do?

Motivate people (even if just a little bit) to finish their stuff?

This seems similar to the periodically occurring discussions about desysopping inactive editors on project X. It probably does not make much of a difference either way, but it seems difficult to argue that that small difference would be positive; a significant fraction of the people who are actually affected seem annoyed by it.

'Abandoning changesets' in our current code review tool configuration means dumping a changeset into an unfilterable pile of crap.

I have lots of very old stale changesets which are not ready (other changes need to be done first) but I do intend to finish, and the only limiting factor is my time to complete them. I also have quite a few changesets which are abandoned because they will never be needed, for various reasons. I dont want to abandon my inactive changesets, because then it is harder to find them again. I guess I could abandon+star , so it is easy for me to find them.

However they are useful changes and other people should be able to find them and complete them. e.g. I get hit by a bus, I expect most of them would be polished and merged fairly easily, with only a small fraction being too incomplete, relying on knowledge in my head, and so would be abandoned.
In pywikibot, when someone abandons a useful changeset, we usually pick it up, polish & merge it for them quickly, otherwise it is 'lost'.

It would be more practical to abandon changes if we could add tags to the changesets to be abandoned and those tags could be easily used in searches. e.g. some way of annotating "useful but incomplete fix; needs rescuing".

...
A patch that needs non-cosmetic changes that would take a knowledgeable person with merge rights more than 15 minutes (not only a rebase or style) to be acceptable, which is linked in a task, where the owner didn't have any actions on gerrit or phab since 1 year, seems like it can be abandoned as it can be found through Phabricator and restored.

Perhaps it would be worth determining the percentage of changesets that do not have a linked Phabricator task.

I wonder whose decision is this, ultimately. Not Developer Relations. TechCom ?

From the few comments on this task, seems we want to keep changes open bit rotting until they get manually abandoned or worked again until they pass.

A solution is to babysit old patches and have them finished/abandoned, which is itself asking "how do we review the huge code review backlog". My stance is that we should automatically abandon changes that have nobody actively working on them after X few weeks.

From the few comments on this task, seems we want to keep changes open bit rotting until they get manually abandoned or worked again until they pass.

A solution is to babysit old patches and have them finished/abandoned, which is itself asking "how do we review the huge code review backlog". My stance is that we should automatically abandon changes that have nobody actively working on them after X few weeks.

+1000
I keep saying this. I firmly believe from what I've seen with how people use gerrit that code doesnt get reviewed because of stale patchsets.

Many projects auto abandon and there is nothing wrong with us doing so too. Patches can always be restored where needed (and if nothing else this motivates the owner to fix them up) If a patchset is open in my opinion that is a commitment to see that patch through to completion.

I would strongly urge that we trial this and measure impact on code review. Do more patches get reviewed after a mass purge compared to previously?

I am currently investigating generating stats to socialise why this is a problem for our project...

(I should add that there is a difference between a patchset being open 1000 days and rebased consistently every week and a patchset written 1000 days ago and not touched since.) The latter should be abandoned not the former.

No I strongly disagree. A patch that has no review should not be abandoned. Also we don't want to encourage clicking the rebase button or doing another unnecessary action. If you constantly click the rebase button on your patch you are wasting my attention bandwidth, which I could better use to review patches instead. See T113706#1677005 and T113706#1674902.

[...] which is itself asking "how do we review the huge code review backlog".

Exactly, train more reviewers you can trust. Unless your intention is to drive away contributors, that is the only way. AFAIK the backlog is growing which means we don't even manage to keep up with new patches, so the problem is not a backlog in itself.

If a patch needs to be abandoned it should be done by a human. Unless it is spam. We want to be respectful to our contributors.

Declining because there is no consensus for this.
See https://www.mediawiki.org/wiki/User:AKlapper_%28WMF%29/Code_Review for many good ideas instead.

Reopening as I think it was a premature close. @Jdlrobson just said he was working on more information.

As @jayvdb, I also have some old changesets whose ETA → ∞, but I would only abandon them if they are unlikely to be ever needed in the future. Most of them I think are uncontroversial, or at least should be in their final form, however some patches may be actively pursued by their authors while being objected to by senior developers.

Let's pretend for a moment that the backlog is a huge gold mine, not a huge pile of dirt...

Conversely, what does keeping them around (non abandoned) actually do?

Motivate people (even if just a little bit) to finish their stuff?

If your patchset got abandoned would that not motivate you/remind you even more if it was important?

'Abandoning changesets' in our current code review tool configuration means dumping a changeset into an unfilterable pile of crap.

Not true. https://gerrit.wikimedia.org/r/#/q/status:abandoned+project:mediawiki/core+owner:Jdlrobson,n,z
https://gerrit.wikimedia.org/r/#/q/is:draft,n,z
Also we have Phabricator.

However they are useful changes and other people should be able to find them and complete them. e.g. I get hit by a bus, I expect most of them would be polished and merged fairly easily, with only a small fraction being too incomplete, relying on knowledge in my head, and so would be abandoned.

Useful changes will have a phabricator task associated with them. They will thus be easy to find and completed. Even more so now they have not been cookie licked :) If a patchset is open it suggests the author is still working on it (even if untouched in years!)

In pywikibot, when someone abandons a useful changeset, we usually pick it up, polish & merge it for them quickly, otherwise it is 'lost'.

That sounds like a good thing that we should encourage. IF a patch has been sitting there for a year it seems to me that that's just as abandoned as a patch that someone hit the abandon button for.

It would be more practical to abandon changes if we could add tags to the changesets to be abandoned and those tags could be easily used in searches. e.g. some way of annotating "useful but incomplete fix; needs rescuing".

This is what Phabricator tasks are for. We should not be relying on a code review tool to find patches. That is not what is was designed for. It's designed for reviewing code. Phabricator on the other hand is designed for finding problems/solutions.

...
A patch that needs non-cosmetic changes that would take a knowledgeable person with merge rights more than 15 minutes (not only a rebase or style) to be acceptable, which is linked in a task, where the owner didn't have any actions on gerrit or phab since 1 year, seems like it can be abandoned as it can be found through Phabricator and restored.

Perhaps it would be worth determining the percentage of changesets that do not have a linked Phabricator task.

I'll look into this.

No I strongly disagree. A patch that has no review should not be abandoned.

Agreed. I don't see how this is a disagreement.

Also we don't want to encourage clicking the rebase button or doing another unnecessary action. If you constantly click the rebase button on your patch you are wasting my attention bandwidth, which I could better use to review patches instead. See T113706#1677005 and T113706#1674902.

I used this as an extreme example. I doubt this happens/is possible in practice, but if it does it suggests someone is failing to get the reviews they want and that we are failing them. IT would be great to use bandwidth to review the patch of this person who is keen to participate. Rebasing doesn't mean hitting a button. That never works in practice, it requires manually committing new versions.

If a patch needs to be abandoned it should be done by a human. Unless it is spam. We want to be respectful to our contributors.

I didn't suggest this would be done by computers. Such a policy could easily be enforced by humans.

Declining because there is no consensus for this.

I'm seeing 3 votes against (jayvdb, you and possibly tgr)
4 votes for hashar reedy EBernhardson and myself
That's a small sample and we're never going to get unanimous consensus.

I don't understand why we wouldn't try this. I'd happily trial it myself, manually abandon them all and take the blame for any fallout but at least we'd learn something (impact on number of patches merged/review speed/number of complaints). We could always restore them at the end after measuring the change if any... Before then I really need to get the current situation visualised. That's on me.

Let's pretend for a moment that the backlog is a huge gold mine, not a huge pile of dirt...

And let's pretend that phabricator is a storage space inside the gold mine where dirt/gold can be categorise and techniques for retrieval can be discussed. Right now we can't get into the gold mine to do digging as the door is blocked and it's not well organised! :)

Every patch must be dealt with carefully by a human. Just look at https://github.com/twbs-closer or https://github.com/twbs-rorschach...!!!

I'd happily trial it myself, manually abandon them all and take the blame for any fallout but at least we'd learn something (impact on number of patches merged/review speed/number of complaints).

Please don't use contributors as a test platform.

Let's pretend for a moment that the backlog is a huge gold mine, not a huge pile of dirt...

And let's pretend that phabricator is a storage space inside the gold mine where dirt/gold can be categorise and techniques for retrieval can be discussed. Right now we can't get into the gold mine to do digging as the door is blocked and it's not well organised! :)

Yes, newbies should be guided through Maniphest. "We'll take you to the point where you can finish the patch yourself" over "Hand it to us, we'll take care of it".

The basic principles, imho:

  1. If a patch is ready for code review, it should be reviewed.
    1. When a patch has zero reviews, the burden is on the reviewers, no matter how old the patch.
      1. If the problem is that the repository has been unmaintained for a long time, then the patch submitter could be invited to maintain the repo.
        1. If the search for new maintainers is futile, then the repo should be marked as inactive and the pending patches marked as abandoned.
    2. When the patch has +1 or equivalent, the burden is on the reviewers, no matter how old the patch. These patches should get priority in the queue.
    3. When the patch has -1 or equivalent, the burden is on the authors.
      1. If the patch doesn't have new activity, after a prudential period (1 month?) a bot should post a reminder.
        1. If after the automatic reminder there is also inactivity, after a prudential period (2 months?), anybody is entitled to mark the patch as abandoned.
      2. When a patch had negative reviews in the comments not reflected with a -1, anybody is entitled to mark a -1 on behalf of these comments.
      3. When a patch has no reviews or +1 and is still waiting for review, it is not cool to force a -1 by throwing it to the Continuous Integration pipeline again, without specific negative feedback about the patch itself
        1. If the patch looks good but needs rebase, it is cool to leave the good feedback in the comments and force a rebase, even if that results in a -1.
  2. If a patch is not ready for code review, it should be marked with [WIP] in the subject.

I'm not sure how +1 / -1 would be interpreted in Differential (T138), but if you think these principles make sense, we surely can adapt or evolve them to fit the new code review tool.

When the patch has -1 or equivalent, the burden is on the authors.

The problem with that is that Gerrit gives -1 to patches for largely irrelevant technical reasons such as inability to merge without a rebase. Leaving a patch to rot for a year and then demanding that the owner rebase it without guaranteeing that it won't rot for another year afterwards seems unhelpful.

If your patchset got abandoned would that not motivate you/remind you even more if it was important?

If I were a newish contributor, it would probably annoy me and make me look for a community that's more open to contributions. E.g. some orgs autoclose bug reports after a given time, and if that happens without anyone ever looking at it, I do not find that motivational at all.

Useful changes will have a phabricator task associated with them. They will thus be easy to find and completed.

Not really, because 1) phab tasks are not very easy to find in the first place, 2) the way patchsets are associated to tasks is rather crappy. 2) will probably improve when we switch to Differential but 1) won't.

Also, after seeing how the Bugzilla migration threw obsoleted patches (the old equivalent of an abandoned changeset) under the bus, I don't trust that not happening again.

Let's pretend for a moment that the backlog is a huge gold mine, not a huge pile of dirt...

And let's pretend that phabricator is a storage space inside the gold mine where dirt/gold can be categorise and techniques for retrieval can be discussed. Right now we can't get into the gold mine to do digging as the door is blocked and it's not well organised! :)

Discarding the bottom layer of the storage space without looking at it is not what I would call organizing, but framing aside, I don't see what it would accomplish. We have old changesets because our code review throughput is smaller than the influx of new changes; *that* is the real problem to solve (unless we think the patches which did not get merged for a long time are probably not worth a reviewer's time, in which case throwing those away seems reasonable; but no one seems to be arguing for that). Abandoning old changesets does not fix or improve that problem, it just makes it less visible.

Regarding drafts, https://gerrit.wikimedia.org/r/#/q/is:draft,n,z has many merged changes, for one listed that isnt merged e.g. https://gerrit.wikimedia.org/r/#/c/282308/ . I cant see any mention that it is a draft on the gerrit changeset page.

I would be very happy for stale/incomplete patches to being a draft rather than forcibly abandoning them for the author.

Perhaps the drafts concept is better implemented in latter versions of gerrit?

I would be very happy for stale/incomplete patches to being a draft rather than forcibly abandoning them for the author.

+1, this suggestion is definitely worth exploring.

Is "draft" and "[WIP]" the same concept?

I would be very happy for stale/incomplete patches to being a draft rather than forcibly abandoning them for the author.

Drafts are worse than abandoning since they are only visible to the author and reviewers.

I would be very happy for stale/incomplete patches to being a draft rather than forcibly abandoning them for the author.

Drafts are worse than abandoning since they are only visible to the author and reviewers.

Even after the change has become public?

I tried turning a test changeset into draft but I found no way to do that. I could upload a draft patchset (now PS1 is publicly visible, PS2 is only visible to me) but I could not turn PS1 into a draft after it has been published. If I push an existing commit without changes, I just get an error that gerrit rejected it due to lack of changes.

Since this is not an urgent / short term change, it is probably better to think of a process in the context of Differential. Maybe it is easier to play with statuses or tags in Differential? This would allow reviewers to define their own default queries for their list of patches to be reviewed. Maybe we could configure a different default query as well.

I agree with @Qgil principles at T113706#2189328. This sounds like a very reasonable workflow and very much resembles the imho natural way of a healthy and well-maintained open source project. I see the same policy in place on other projects such as: jQuery, ESLint, Bootstrap, and many others. I believe adopting this will be a very important step toward improving our strength, and attracting and retaining contributors.

However, there is a fundamental difference between our code review system and GitHub: Gerrit has no (usable) concept of a personal working space. More on that later.

For all the aforementioned projects, the typical pattern is: forking, branching, submitting. The review workflow applies to the submitted pull requests only. My version of this workflow:

Project maintainers are responsible in general (not patch authors). But, if a patch has issues, maintainers may close the request after a reasonable time window of inactivity.

On GitHub, such workflow hardly affects the contributor's workflow. They still have their own GitHub account view, separate from the pull request. They don't have to re-open or restore anything. They still have an online copy of all their branches and they can continue to push commits to those branches at any time. When they do so, GitHub automatically provides the option to update or (re)create a pull request for it. Very similar to the first time around.

Various people asked here what we gain from closing unresolved patches. The gain is that it gets us into a more healthy ecosystem where people can see what we're doing and how we're doing. And where maintainers can more realistically start taking on their ownership role and proactively review contributed patches.

Part of a project maintainer's responsibility should be a sense of ownership. While GitHub's interface provides a much better sense of isolation for each project, the idea is the same. As maintainer of a project in Gerrit you should be able to easily view open patches you can start reviewing. The inability for this list to reach 0 and being virtually endless list causes negative cognitive overhead. It's a waste of everyone's time. (creating a custom age-filter is too complicated, unreachable from the user interface, and inaccurate as many old patches are actionable and don't match the closing criteria mentioned by @Qgil)

Having a more realistic queue of unreviewed patches also provides a better picture to the outside. Various static analysis tools for GitHub uses the number of open pull requests and their age as an indicator to how well the project is maintained. It's much more welcoming as a new contributor knowing that there isn't a 2 year old abandoned patch still marked "open" with no clear future. It's messy and unattractive.

Some believe this policy would be controversial and cause negative reaction from the contributor's perspective. I agree, but only because of that fundamental difference. On Gerrit, the pool of "Open changes" has two consumers instead of one: Maintainers and patch authors. An open Gerrit change is both the web-based manifestation of the contributor's branch, and the pull request for it. Closing that change set, means the author won't see it on their own main dashboard anymore (except under a heading negative-section "Abandoned") and they can't continue work on it unless they dig it up and re-open it.

Since converting stale patches to 'draft' is technically not possible, and 'abandoned' is very negative and puts unloved but potentially valuable patches in the same pile as purposely undeniably unneeded 'abandoned' patches, rather than C.a.i "[after inactivity] anybody is entitled to mark the patch as abandoned", could we say "anybody is entitled to add [Incomplete] or [WIP] to the changeset subject."

Just out of interest... If the label of abandon patchset was archive patchset and the status was archived would that change how people think of things?

IMO there is no point putting much effort into figuring out a patch abandoning/archival workflow for Gerrit when we are in the process of migrating away from it. Tagging is a central concept in Phabricator so I'm sure it will be possible to label old/inactive changesets as archived (or whatever) in Differential.

Sure. I was just curious if this was a terminology problem since we had similar problems with the language used for -1s many years ago.

could we say "anybody is entitled to add [Incomplete] or [WIP] to the changeset subject."

AFAICT, this is already being done by users who use korma.wmflabs.org, which filters out WIP-titled changesets.

Qgil added a subscriber: hashar.

Thank you @Krinkle for your feedback on the principles. I have added them to the description for easier editing.

Removing Team Practices Group for now. If you need something specific from the TPG, we suggest you consider using a template along the lines of the example requests here:

https://www.mediawiki.org/wiki/Team_Practices_Group/How_To_Engage_With_TPG

Would this task be a good topic for the Wikimedia-Developer-Summit (2017) ? If so, the deadline to submit new proposals is next Monday, October 31: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/Call_for_participation

No - this task does not describe a problem to solve but only one potential theoretical solution to a problem way better defined and discussed in other tasks. (That's also a reason why I've personally neglected this task.)

In that case, should we simply decline it?

I'm declining this because I don't see a convincing need for having a rule. People can but do not have to abandon old changesets.
Anyone interested could work on coming up with guidelines and recommendations that offer criteria to help a reviewer when to potentially abandon a changeset, and reopen this task and assign it to themselves.

We have a similar rule for our projects (including MobileFrontend) although I can't find the link anywhere right now and it is not always enforced strictly.