git-fat needs to be ported to Python 3
Open, High, Public

Description

We run a custom git-fat package, which is in standard packages and deployed on every host. It's written in Python 2, which is EOL. Python 3 support isn't complete upstream (https://github.com/jedbrown/git-fat/issues/92), so we could also collaborate with upstream if we want to continue to use git-fat.

Event Timeline

git-fat is the only package requiring Python 2 in a base bullseye setup at this point.

Is there a way to migrate to git-lfs instead?

I'm not familiar in detail with the current use cases of git-fat, but moving to a different, supported tool is probably a better path forward than porting git-fat ourselves. Both git-lfs and git-annex seem like viable alternatives to explore (both are already packaged in Debian).

I can't say for sure, especially since it's part of the base packages and so could be used anywhere, but the only explicit usage is archiva, and I hope we can find a way to just avoid using that. git-lfs seems to be the industry standard these days.

thcipriani added subscribers: dancy, hashar.

In our team meeting we talked about the possibility of migrating git-fat (600 lines of python2 → python3) vs. making the needed changes in scap and archiva to support git-lfs.

Tagging in @hashar and @dancy for their thoughts on this task.

@thcipriani Based on reading about git-lfs and git-fat (including outstanding issues on GitHub), I'm in favor of migrating to git-lfs and updating scap and archiva as needed. I can help on the scap side. I haven't touched archiva yet.

T214229 - scap3 + git-fat results in git status with permissions errors

T202100 - Intermittent git-fat failure during deploy

T147856 - Scap deploy failed to sync git-fat artifacts

T155856 - Package + deploy new version of git-fat

I can help on the scap side. I haven't touched archiva yet.

There is support in scap3 for git-lfs, but it's not used (as far as I'm aware) or well-tested. It *might* already work.

I honestly hadn't touched archiva either. There's a shell script (originally written by @Ottomata judging from git-blame) that moves java jars to the place git-fat expects to find them. Maybe we can just ditch that script and deploy directly from Gerrit (given we have the git-lfs extension for gerrit installed and gitlab has git-lfs support as well).

The last time we talked about git-lfs in detail that I can recall is T235013: Use `git lfs` for large binary files of Design Style Guide

deploy directly from Gerrit

...say more :)

The jar binaries are built by maven-release-plugin in a jenkins job and then uploaded to Archiva using the Archiva API. They are then synced into a git fat repo. Deploy repos then git fat add them, and scap can rsync them (via git fat) to their target hosts on deploy.

archiva-gitfat-link just scans the archiva repository directory for artifact files, and then makes symlinks to them in a git-fat folder named by their shasum, as git fat expects. I'm not familiar with how git-lfs works, but perhaps it can be made to work the same way? Is it an rsync remote?
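For readers who haven't seen the script, here is a rough Python sketch of the linking step described above. It is not the actual archiva-gitfat-link code: the function names, the suffix filter, and the use of SHA-1 as the "shasum" are assumptions made for illustration.

```python
import hashlib
import os


def sha1_of(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file's contents, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_artifacts(archiva_root, fat_dir, suffixes=(".jar", ".war")):
    """Symlink each artifact under archiva_root into fat_dir, named by digest.

    Hypothetical helper: the suffix list and the directory layout are
    assumptions, not taken from the real script.
    """
    os.makedirs(fat_dir, exist_ok=True)
    linked = {}
    for dirpath, _dirs, filenames in os.walk(archiva_root):
        for name in filenames:
            if not name.endswith(suffixes):
                continue
            source = os.path.abspath(os.path.join(dirpath, name))
            target = os.path.join(fat_dir, sha1_of(source))
            # Skip digests that are already linked so reruns are harmless.
            if not os.path.lexists(target):
                os.symlink(source, target)
            linked[source] = target
    return linked
```

The point is that the rsync remote is just a flat directory of content-addressed names, which is why a symlink farm is enough on the git-fat side.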

I think the only transfer adapter that's Officially® supported is the http basic transfer. Our gerrit has the lfs plugin installed, so that implements the server side of git-lfs.

So rather than build jar files and upload to archiva, we'd build jar files and add .jar to .gitattributes to be managed via git-lfs, then those jars would get stored on the gerrit host. On deployment (or fetch), each target would fetch the jar via a GET request to gerrit (is my rough mental model).
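To make that mental model a bit more concrete, here is a small Python sketch of what git would actually store for a tracked jar: a short text pointer in place of the binary, with the real content held by the LFS server and looked up by its SHA-256. The pointer layout follows the published git-lfs pointer spec as I understand it; the function name is made up for illustration, and the `.gitattributes` line is the usual `git lfs track "*.jar"` result.

```python
import hashlib

# Attribute line that routes *.jar files through the LFS filter.
GITATTRIBUTES_LINE = "*.jar filter=lfs diff=lfs merge=lfs -text\n"


def lfs_pointer(data: bytes) -> str:
    """Build the pointer text committed in place of the binary content.

    Hypothetical helper for illustration; git-lfs itself generates this
    through its clean filter.
    """
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )
```

So the repo only grows by a few lines per jar, and each target would resolve the `oid` against gerrit over HTTP at checkout time.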

Unknowns:

  • Changes to maven-release-plugin CI job—Maven supports uploading to archiva, but probably not to lfs (plus we probably don't want repo push creds in CI?)
  • Gerrit has a lot of disk space, but how much disk space do we use in archiva?
  • How many hosts are deployed in parallel and what kind of load will that put on gerrit?
  • Are targets allowed to make outbound connections to gerrit?
  • Unknowns around protocol/network traffic changes. Not expecting issues, really, but it's a change.

Mentioned in SAL (#wikimedia-operations) [2022-08-02T20:38:01Z] <mutante> re-imaging gerrit2002 with buster - because it's on bullseye, needs git-fat and that has not been ported to python3 yet which blocks upgrading gerrit machines otherwise T313250 T243027 T279509