branch.py: Convert deleted wmf branches to tags
Closed, Resolved (Public)

Description

In T244368, we added deletion of old wmf branches in mediawiki/tools/release/delete-wmf-branches.

This has since been superseded by @mmodell's e641ec26 - "Add the very dangerous --delete argument to mass delete branches" in branch.py.

That feature should additionally do the equivalent of git merge-base to find a branch point and tag releases (or at least a close approximation) before deleting the branch, so that users and tooling can reference it without the costs of keeping the wmf/* branches around indefinitely.

Pairing with Tyler today yielded a gitiles URL:

https://gerrit.wikimedia.org/g/mediawiki/core/+log/master%5E%5E..wmf/1.35.0-wmf.30?format=JSON&no-merges&reverse&n=1

Finding the parent of that commit should be equivalent to merge-base.
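A minimal sketch of that lookup, for illustration only. It assumes the Gitiles JSON response starts with the usual `)]}'` guard line and carries a `log` list whose entries have a `parents` field; the function names are hypothetical and are not taken from branch.py.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

GITILES_BASE = "https://gerrit.wikimedia.org/g"  # host from the URL above
XSSI_PREFIX = ")]}'"  # assumption: Gitiles prepends this guard to JSON


def build_log_url(repo, branch, base="master^^"):
    """Build a +log URL of the same shape as the one quoted above."""
    revision_range = quote(f"{base}..{branch}")  # '^' is encoded as %5E
    return (f"{GITILES_BASE}/{repo}/+log/{revision_range}"
            "?format=JSON&no-merges&reverse&n=1")


def parse_gitiles_json(text):
    """Strip the guard prefix, if present, and decode the JSON body."""
    if text.startswith(XSSI_PREFIX):
        text = text[len(XSSI_PREFIX):]
    return json.loads(text)


def approximate_branch_point(log_response):
    """Return the first parent of the first listed commit, or None."""
    commits = log_response.get("log", [])
    if not commits or not commits[0].get("parents"):
        return None
    return commits[0]["parents"][0]


def fetch_branch_point(repo, branch):
    """Network helper: fetch the log and return the approximate branch point."""
    with urlopen(build_log_url(repo, branch)) as response:
        body = response.read().decode("utf-8")
    return approximate_branch_point(parse_gitiles_json(body))
```

Calling `fetch_branch_point("mediawiki/core", "wmf/1.35.0-wmf.30")` would request the URL above. Whether the result matches `git merge-base` exactly depends on how the branch was cut, so treat it as the close approximation described in the task.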

Event Timeline

Change 588467 had a related patch set uploaded (by Brennen Bearnes; owner: Brennen Bearnes):
[mediawiki/tools/release@master] (DNM) delete-wmf-branches: tag at branchpoints

https://gerrit.wikimedia.org/r/588467

Change 588467 abandoned by Brennen Bearnes:
(DNM) delete-wmf-branches: tag at branchpoints

Reason:
Going to use Mukunda's changes to branch.py instead.

https://gerrit.wikimedia.org/r/588467

brennen renamed this task from "delete-wmf-branches: Convert deleted wmf branches to tags" to "branch.py: Convert deleted wmf branches to tags". May 1 2020, 6:21 PM
brennen updated the task description.
brennen moved this task from Needs/Waiting Review to Next on the User-brennen board.
brennen added a subscriber: mmodell.
thcipriani added a subscriber: thcipriani.

Maybe tag the tip of the branch rather than the branch's merge-base.

What value do we hope to get from having these tags, indefinitely? I suppose it would offer to know that a certain patch was cherry-picked in a given week some years ago, but how could that help us? And how would this information actually be accessed? The day-to-day tooling would not look at it afaik, and that same tooling appears to already have this information. For example, Gerrit has all changesets and backports we made; these don't disappear when the target branch is removed. And Phab and SAL also record most if not all backports in a way that is contextual and already integrated exactly where one would want it, I think. We also have changelogs published on mediawiki.org for the non-cherry-pick changes between branches.

There are some places that would surface these tags, such as git tag, and Gitiles, and the aforementioned "included in" feature in Gerrit; but from what I can tell, that would only add redundant noise, and no new signal.

I don't think the noise would be problematic or significantly affect productivity, but it does add up if we do other things like this. And of course there's a cost in implementing, maintaining, and running this. So I'm mainly just curious what value we hope to get from it, so that we have something to refer to when e.g. the next person on the team proposes to remove it N months/years from now, instead of flip-flopping between removing it because it's unused and adding it back because there's no reason not to. :)

What value do we hope to get from having these tags, indefinitely? I suppose it would offer to know that a certain patch was cherry-picked in a given week some years ago, but how could that help us?

The utility that comes to mind for me is being able to ask questions about things like:

  • Cycle time.
  • Size / weight of deployed versions.
  • Correlations between reported bugs and classes / characteristics of deployed changes.

...from machine-readable data in the repository itself.

I don't feel super strongly about this or anything, and I'm not a proponent of metrics for the sake of metrics, but I guess it's a potentially useful signal we give up when deleting the branches.

Change 683934 had a related patch set uploaded (by Ahmon Dancy; author: Ahmon Dancy):

[mediawiki/tools/release@master] branch.py: Convert deleted wmf branches to tags

https://gerrit.wikimedia.org/r/683934

Change 683934 merged by jenkins-bot:

[mediawiki/tools/release@master] branch.py: Convert deleted wmf branches to tags

https://gerrit.wikimedia.org/r/683934

dancy added a subscriber: dancy.

What value do we hope to get from having these tags, indefinitely? I suppose it would offer to know that a certain patch was cherry-picked in a given week some years ago, but how could that help us?

The utility that comes to mind for me is being able to ask questions about things like:

  • Cycle time.
  • Size / weight of deployed versions.
  • Correlations between reported bugs and classes / characteristics of deployed changes.

...from machine-readable data in the repository itself.

These are theoretical future use cases, which may not even need to go back further than we keep branches for. Gerrit's API is machine-readable as well and afaik has all this information; this could be made easier if we hashtag the branch-create patches, or we could simply record this data via Scap by appending a machine-readable entry to a file in an internal releng/ or mediawiki/tools/ Git repo for us to use.

I don't think most major software projects manage their internal deployments with indefinite tags on their public canonical repo. The following is an impression of what I think this would end up doing. I would hope at the very least we can agree up front to an expiry for these, e.g. 12 months or something.

[Screenshot 2021-04-30 at 18.51.29.png]

[Screenshot 2021-04-30 at 18.54.43.png]

Anyway, like I said, I don't think there is any immediate damage. I'll revisit this in a year in a new task.

What value do we hope to get from having these tags, indefinitely? I suppose it would offer to know that a certain patch was cherry-picked in a given week some years ago, but how could that help us?

The utility that comes to mind for me is being able to ask questions about things like:

  • Cycle time.
  • Size / weight of deployed versions.
  • Correlations between reported bugs and classes / characteristics of deployed changes.

...from machine-readable data in the repository itself.

These are theoretical future use cases, which may not even need to go back further than we keep branches for.

Not entirely theoretical; there are plans in our annual planning documents to start exposing this information more widely (and consistently). We're meeting with Research on Monday to discuss some aspects of this. Until we know more I'd like to not lose any data that may be helpful.