Prevent Gerrit archived repositories from being replicated to GitHub
Open, Needs Triage, Public

Description

Under Projects-Cleanup, we archive a repository by marking it Read-Only, but that does not prevent it from being replicated to GitHub. Once a repository is archived in Gerrit, I don't think there is much point in replicating it to GitHub.

The replication plugin only replicates repositories that grant the READ permission to the special authentication group mediawiki-replication:

hieradata/common/profile/gerrit.yaml
profile::gerrit::replication:
    github:
        url: 'git@github.com:wikimedia/${name}'
        authGroup: 'mediawiki-replication'

The permission is granted from All-Projects.git:

project.config
[access "refs/*"]
    read = group mediawiki-replication

Thus, tentatively, we can create a new permission-only repository, All-Archived-Project, which takes the READ permission away from the mediawiki-replication group by means of a BLOCK access rule. Then, when we archive projects, we would reparent them to All-Archived-Project and they would no longer be replicated to GitHub.
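
As a rough sketch of what that proposal could look like in Gerrit's project.config syntax (the section layout mirrors the All-Projects snippet above; nothing here is copied from an actual repository, and the group would also need an entry in the groups file of the permission repository):

```ini
# project.config of the proposed All-Archived-Project (sketch, not the deployed config)
[access "refs/*"]
    # BLOCK rules are inherited by child projects and cannot be overridden by them
    read = block group mediawiki-replication

# project.config of a repository being archived (sketch)
[access]
    inheritFrom = All-Archived-Project
```

With READ blocked for the authGroup, the replication plugin should treat the repository as not visible and skip it.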

We might also be able to block changes to refs/meta/config to anyone but Administrators and Gerrit managers to prevent owners from restoring a repository. But that is another topic of its own.

Event Timeline

I created All-Archived-Projects just now (named to match All-Projects).

Owned by Gerrit Managers (so they can un-archive/reparent if needed).

The project's permissions BLOCK read on refs/* by mediawiki-replication (as you suggested).

I also made changes to make archiving easier (I hope).


Gerrit Managers are allowed to push to any project under All-Archived-Projects for the refs/heads/ARCHIVED and refs/heads/MOVED_TO_GITLAB branches.

This should make archiving easier.
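
The actual project.config of All-Archived-Projects is not shown in this task. A plausible sketch of the ownership and push rules described above follows; it assumes the group is literally named "Gerrit Managers", and the create permission is an assumption as well (Gerrit normally needs it before a branch that does not exist yet can be pushed):

```ini
# All-Archived-Projects: project.config (hypothetical reconstruction)
[access "refs/*"]
    owner = group Gerrit Managers

[access "refs/heads/ARCHIVED"]
    create = group Gerrit Managers
    push = group Gerrit Managers

[access "refs/heads/MOVED_TO_GITLAB"]
    create = group Gerrit Managers
    push = group Gerrit Managers
```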

Archiving before this change

  • Delete everything in the repo, leaving only a README.md
  • Requires:
    • cloning the repo
    • deleting everything in the worktree
    • pushing this new commit to HEAD
  • Impact to users:
    • git pull deletes all files in their worktree.
    • git clone gives them a repo with only a README.md.

Archiving after this change

  • Create a local orphan branch with a README
  • Push this branch to either refs/heads/ARCHIVED or refs/heads/MOVED_TO_GITLAB (as appropriate)
  • Set the repo's HEAD to the new branch (via the gerrit api)
  • Impact to users:
    • git pull fetches the new branch (e.g. ARCHIVED), but the worktree remains unchanged.
    • git clone gives them a repo with only a README.md
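
The first two steps above can be sketched in Python as follows. This is only an illustration under stated assumptions. Rather than running git checkout --orphan in a clone, it commits the README in a fresh, empty repository, so the pushed commit has no parent and the existing history never needs to be fetched. The README wording, the commit message and the function names are hypothetical, and git is assumed to be installed with user.name and user.email configured.

```python
from __future__ import annotations

import subprocess
import tempfile
from pathlib import Path

# Hypothetical wording; the real README text is up to whoever archives the repo.
README_TEXT = "This repository has been archived.\n"


def orphan_branch_commands(remote_url: str, branch: str = "ARCHIVED") -> list[list[str]]:
    """Return the git commands that publish a parentless, README-only commit."""
    return [
        ["git", "init", "--quiet"],
        ["git", "add", "README.md"],
        ["git", "commit", "--quiet", "-m", f"Add {branch} notice"],
        # HEAD is a root commit, so the new branch shares no history
        # with the branches that already exist on the remote.
        ["git", "push", remote_url, f"HEAD:refs/heads/{branch}"],
    ]


def push_orphan_branch(remote_url: str, branch: str = "ARCHIVED") -> None:
    """Create the README in a scratch directory and run the commands above."""
    with tempfile.TemporaryDirectory() as workdir:
        (Path(workdir) / "README.md").write_text(README_TEXT, encoding="utf-8")
        for argv in orphan_branch_commands(remote_url, branch):
            subprocess.run(argv, cwd=workdir, check=True)
```

Pointing the repository's HEAD at the new branch is a separate step on the Gerrit side and is not covered by this sketch.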

Benefits

  • Simple to script without having to clone all repos: reparent repo, create orphan branch, push orphan branch, set new upstream HEAD, archive.
  • Any automated users that pull the repo won't have its files wiped.
  • Should be just as clear (or clearer) for real-life users (I was confused when I ran git pull followed by git grep without checking the contents of what I pulled down).
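
To illustrate the order of operations in the first bullet, here is a minimal sketch that only builds the Gerrit REST requests without sending them. The comment above says only that HEAD is set via the Gerrit API; using REST for the reparent and read-only steps as well is an assumption of this sketch, as are the function name, the commit message and the RestCall type. The three endpoints (set parent, set HEAD, set config) are taken from Gerrit's project REST API.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class RestCall:
    """One authenticated Gerrit REST request (hypothetical helper type)."""
    method: str
    path: str
    body: dict


def archive_rest_calls(
    project: str,
    branch: str = "ARCHIVED",
    parent: str = "All-Archived-Projects",
) -> list[RestCall]:
    """Return the REST calls for archiving, in the order they must run."""
    # Slashes in project names must be URL-encoded in Gerrit REST paths.
    base = f"/a/projects/{quote(project, safe='')}"
    return [
        # 1. Reparent first, so the branch-specific push rules apply.
        RestCall("PUT", f"{base}/parent",
                 {"parent": parent, "commit_message": f"Reparent to {parent}"}),
        # (The orphan branch is pushed with git between steps 1 and 2.)
        # 2. Point HEAD at the new branch so fresh clones see only the README.
        RestCall("PUT", f"{base}/HEAD", {"ref": f"refs/heads/{branch}"}),
        # 3. Mark read-only last; doing it earlier would reject the push.
        RestCall("PUT", f"{base}/config", {"state": "READ_ONLY"}),
    ]
```

Keeping the calls as plain data makes the plan easy to review or dry-run before anything is sent to the server.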

Mentioned in SAL (#wikimedia-releng) [2024-06-13T11:31:50Z] <hashar> gerrit: reparent mediawiki/extensions/WikibaseSchema from mediawiki/extensions to All-Archived-Projects (T351543) to prevent GitHub replication # T367396

Mentioned in SAL (#wikimedia-releng) [2024-06-13T11:34:09Z] <hashar> gerrit: reparent mediawiki/tools/cli from mediawiki/tools to All-Archived-Projects (T351543) to prevent GitHub replication # T333029

I have reparented two repositories to All-Archived-Projects, since both of them had been renamed on GitHub and that caused a clash.

That should fix it. Then I guess we'd want to reparent all the existing archived repositories, update the form used for Projects-Cleanup and update pending tasks.

I have reparented mediawiki/extensions/WikibaseSchema to All-Archived-Projects for T367396 and I have verified it is no longer replicated to GitHub as a result 🏆 🎉 .

I have changed the form linked from Projects-Cleanup to mention that the repository has to be reparented (and, while at it, that existing access lists should be removed): https://phabricator.wikimedia.org/transactions/detail/PHID-XACT-FORM-rkfn6wt34jpsd3c/

Benefits

  • Simple to script without having to clone all repos: reparent repo, create orphan branch, push orphan branch, set new upstream HEAD, archive.
  • Any automated users that pull the repo won't have its files wiped.
  • Should be just as clear (or clearer) for real-life users (I was confused when I ran git pull followed by git grep without checking the contents of what I pulled down).

Con: I now can't help out with archive requests. :-( Oh well. Hopefully someone else can pick up the slack.

Rephrasing James' point:

Con: Archiving can only be performed by Gerrit managers/admins rather than owners of the specific repository being archived.

This seems wrong to me as a general rule (even ignoring what permissions James happens to have) - I can't think of any reason why declaring that a repository is no longer active shouldn't be something that people are allowed to do themselves.

Normally the GitHub mirror for an archived repository is deleted outright, so it doesn't matter whether the replication process can see it. And archiving has always been underloved, with 80 tasks languishing in Projects-Cleanup (and even more before I started taking the initiative to clear it a few months ago), so reducing the number of people able to do it makes things worse rather than better.

Other idea: we could have some bot go around and move self-service archived repos into All-Archived-Projects ¯\_(ツ)_/¯. Best of both worlds: people continue to archive, Gerrit doesn't waste time replicating.