We run a custom git-fat package, which is part of the standard packages and deployed on every host. It's written in Python 2, which is EOL. Python 3 support isn't complete upstream (https://github.com/jedbrown/git-fat/issues/92), so we could also collaborate with upstream if we want to keep using git-fat.
Description
Related Objects
- Mentioned In
  - T313250: Bring up gerrit2002
  - T243027: replacement for gerrit2001, decom gerrit2001
- Mentioned Here
  - T243027: replacement for gerrit2001, decom gerrit2001
  - T313250: Bring up gerrit2002
  - T235013: Use `git lfs` for large binary files of Design Style Guide
  - T147856: Scap deploy failed to sync git-fat artifacts
  - T155856: Package + deploy new version of git-fat
  - T202100: Intermittent git-fat failure during deploy
  - T214229: scap3 + git-fat results in git status with permissions errors
Event Timeline
git-fat is the only package requiring Python 2 in a base bullseye setup at this point.
I'm not familiar in detail with the current use cases of git-fat, but moving to a different, supported tool is probably a better path forward than porting git-fat ourselves. Both git-lfs and git-annex seem like viable alternatives to explore (both are already packaged in Debian).
I can't say for sure, especially since git-fat is part of the base packages and so could be used anywhere, but the only explicit usage I know of is archiva, and I hope we can find a way to avoid needing it there. git-lfs seems to be the industry standard these days.
@thcipriani Based on reading about git-lfs and git-fat (including outstanding issues on GitHub), I'm in favor of migrating to git-lfs and updating scap and archiva as needed. I can help on the scap side. I haven't touched archiva yet.
There is support in scap3 for git-lfs, but it's not used (as far as I'm aware) or well-tested. It *might* already work.
I honestly hadn't touched archiva either. There's a shell script (originally written by @Ottomata judging from git-blame) that moves java jars to the place git-fat expects to find them. Maybe we can just ditch that script and deploy directly from Gerrit (given we have the git-lfs extension for gerrit installed and gitlab has git-lfs support as well).
The last time I can recall us talking about git-lfs in detail was in T235013: Use `git lfs` for large binary files of Design Style Guide.
> deploy directly from Gerrit
...say more :)
The jar binaries are built by maven-release-plugin in a Jenkins job and then uploaded to Archiva using the Archiva API. They are then synced into a git-fat repo. Deploy repos then git fat add them, and scap can rsync them (via git-fat) to their target hosts on deploy.
archiva-gitfat-link just scans the archiva repository directory for artifact files, and then makes symlinks to them in a git-fat folder named by their shasum, as git fat expects. I'm not familiar with how git-lfs works, but perhaps it can be made to work the same way? Is it an rsync remote?
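To make the linking step concrete, here is a minimal sketch of the behaviour described above. It is not the actual archiva-gitfat-link script: the function names, the `.jar` suffix filter, the directory arguments, and the use of SHA-1 as the "shasum" are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def sha1_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large jars are not read into memory at once."""
    digest = hashlib.sha1()  # assumption: the store is keyed by SHA-1
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_artifacts(archiva_dir, fat_dir, suffixes=(".jar",)):
    """Symlink every artifact under archiva_dir into fat_dir, named by checksum.

    Both directories and the suffix filter are hypothetical parameters.
    Returns the list of symlinks created on this run.
    """
    archiva_dir = Path(archiva_dir)
    fat_dir = Path(fat_dir)
    fat_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for artifact in sorted(archiva_dir.rglob("*")):
        if not artifact.is_file() or artifact.suffix not in suffixes:
            continue
        link = fat_dir / sha1_of(artifact)
        # Skip artifacts that were already linked on a previous run.
        if link.is_symlink() or link.exists():
            continue
        link.symlink_to(artifact.resolve())
        created.append(link)
    return created
```

Calling `link_artifacts` twice on the same tree creates links only the first time, which is roughly what a periodic sync job would want.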
I think the only transfer adapter that's Officially® supported is the http basic transfer. Our gerrit has the lfs plugin installed, so that implements the server side of git-lfs.
So rather than build jar files and upload them to archiva, we'd build jar files and add .jar to .gitattributes so they are managed via git-lfs; those jars would then be stored on the gerrit host. On deployment (or fetch), each target would fetch the jar via a GET request to gerrit. That's my rough mental model, at least.
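As a rough illustration of that mental model, the sketch below shows the three pieces a client deals with: the `.gitattributes` line that `git lfs track "*.jar"` writes, the small pointer file that is committed in place of the jar, and the JSON body a client would POST to the server's `objects/batch` endpoint to ask for download URLs over the basic transfer adapter. The helper names are made up for this example, and nothing here talks to Gerrit or any real server.

```python
import hashlib

# Line written to .gitattributes by `git lfs track "*.jar"`.
GITATTRIBUTES_LINE = "*.jar filter=lfs diff=lfs merge=lfs -text"

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def lfs_oid(data: bytes) -> str:
    """git-lfs identifies objects by the SHA-256 of their content."""
    return hashlib.sha256(data).hexdigest()


def lfs_pointer(data: bytes) -> str:
    """Build the pointer file text that git stores instead of the binary."""
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{lfs_oid(data)}\n"
        f"size {len(data)}\n"
    )


def batch_download_body(data: bytes) -> dict:
    """Build the batch API request body asking where to download one object.

    (Hypothetical helper name; the field names follow the git-lfs batch API.)
    """
    return {
        "operation": "download",
        "transfers": ["basic"],
        "objects": [{"oid": lfs_oid(data), "size": len(data)}],
    }
```

The server answers the batch request with a per-object download link, and the client then does a plain HTTP GET on that link, which is the part that would land on gerrit for every target.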
Unknowns:
- Changes to the maven-release-plugin CI job: Maven supports uploading to archiva, but probably not to lfs (plus we probably don't want repo push creds in CI?)
- Gerrit has a lot of disk space, but how much disk space do we use in archiva?
- How many hosts are deployed in parallel and what kind of load will that put on gerrit?
- Are targets allowed to make outbound connections to gerrit?
- Unknowns around protocol/network traffic changes. Not expecting issues, really, but it's a change.
Can .jar entries in .gitattributes be managed by git-lfs so that they download from the Archiva API directly? E.g. this URL: http://archiva.wikimedia.org/repository/releases/org/wikimedia/analytics/refinery/job/refinery-job/0.1.26/refinery-job-0.1.26-shaded.jar (copied from https://archiva.wikimedia.org/#artifact-details-download-content/org.wikimedia.analytics.refinery.job/refinery-job/0.1.26)
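For reference, the download URL above follows the standard Maven repository layout, so it can be derived from the artifact coordinates alone. A small sketch, where the function name and the choice of base URL as a parameter are assumptions for illustration:

```python
def maven_artifact_url(base, group_id, artifact_id, version,
                       classifier=None, packaging="jar"):
    """Build a download URL using the standard Maven repository layout."""
    group_path = group_id.replace(".", "/")
    filename = f"{artifact_id}-{version}"
    if classifier:
        filename += f"-{classifier}"
    filename += f".{packaging}"
    return f"{base.rstrip('/')}/{group_path}/{artifact_id}/{version}/{filename}"


# Coordinates taken from the refinery-job example URL in this comment.
url = maven_artifact_url(
    "http://archiva.wikimedia.org/repository/releases",
    "org.wikimedia.analytics.refinery.job",
    "refinery-job",
    "0.1.26",
    classifier="shaded",
)
```

Whether git-lfs can be pointed at such URLs is a separate question: it addresses objects by content hash rather than by Maven coordinates, so some mapping between the two would still be needed.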
Mentioned in SAL (#wikimedia-operations) [2022-08-02T20:38:01Z] <mutante> re-imaging gerrit2002 with buster - because it's on bullseye, needs git-fat and that has not been ported to python3 yet which blocks upgrading gerrit machines otherwise T313250 T243027 T279509