
[Epic] ORES should use a git large file plugin for storing serialized binaries
Closed, ResolvedPublic

Description

It looks as though ORES is currently deploying large binary blobs from git via scap3. Unfortunately, git does not scale when large binary objects are checked in--they do not diff well, so git can only pack them so tightly. Performance quickly becomes an issue.

Scap has the ability to move large binaries about by using git-fat to fetch them over rsync from some source. We should figure out where to fetch these files from so we can have a much smaller (and usable) repository.

Docs on setting up git-fat (don't worry, it's not really archiva-specific).
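
To make the idea above concrete, here is a minimal Python sketch of the general technique a large-file plugin relies on: the repository tracks only a small stub, while the real bytes live in a separate store and are restored ("hydrated") on checkout. This is an illustration under stated assumptions, not git-fat's actual implementation: the names `clean`, `smudge`, and `STUB_PREFIX`, the stub format, and the local directory standing in for the rsync source are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical stub marker for this sketch; not git-fat's real on-disk format.
STUB_PREFIX = b"#$# fat-sketch "


def clean(data: bytes, store: Path) -> bytes:
    """Replace large content with a small stub and keep the real bytes in `store`."""
    digest = hashlib.sha1(data).hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    (store / digest).write_bytes(data)
    # Only this short line would be committed to the repository.
    return STUB_PREFIX + f"{digest} {len(data)}\n".encode()


def smudge(stub: bytes, store: Path) -> bytes:
    """Restore the original bytes for a stub if the store has them."""
    if not stub.startswith(STUB_PREFIX):
        return stub  # ordinary file content: pass through unchanged
    digest = stub[len(STUB_PREFIX):].split()[0].decode()
    blob = store / digest
    # If the blob has not been fetched yet, the stub is left in place.
    return blob.read_bytes() if blob.exists() else stub


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp) / "fat-store"  # stands in for the remote rsync source
        model = b"\x00" * (10 * 1024 * 1024)  # a 10MB placeholder binary
        stub = clean(model, store)
        print(len(model), "bytes replaced by a stub of", len(stub), "bytes")
        assert smudge(stub, store) == model
```

In this sketch the store is a local directory; in the setup described here, the blobs would instead be fetched over rsync from whatever source is chosen, which is why fetch access to that source matters for every environment that deploys the repository.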

Event Timeline

demon created this task. Jul 25 2017, 5:02 PM
Restricted Application added a project: Scoring-platform-team. Jul 25 2017, 5:02 PM
Restricted Application added a subscriber: Aklapper.
demon updated the task description. Jul 25 2017, 5:04 PM

Striker could use this too. It has the same sort of wheel blob repo as ORES.

awight added a subscriber: awight. Jul 25 2017, 5:41 PM

I don't want to hijack this task, but for the record our binary problem is much more severe in the editquality repo, where a few dozen 10MB models change rapidly, swelling our git repo to >1.5GB and bringing tears to our eyes during both development and deployment. Also noting that the blocker to this task as stated (also the broader task) is that we deploy these repos to both labs and production, and according to oral tradition git-fat doesn't support labs deployments yet.

greg added a subscriber: greg. Jul 25 2017, 5:43 PM

Yeah, let's do it for all the repos with binaries :) And we'll figure out something re labs + prod re git-fat access.

Halfak added a subscriber: Halfak. Edited Jul 25 2017, 5:44 PM

Will it work with our GitHub repos? We do development mainly against them and mirror to diffusion/gerrit.

demon added a comment. Jul 26 2017, 5:23 AM

For checking in, yes, without issue (that's all client side). However, fetch/pull requires rsync access to fatten the blobs (the internal term is hydrate).

This complicates things a tad--RelEng doesn't officially support deploying from Github--but it's not undoable.

> but it's not undoable.

What?

We do GitHub-based deploys in our Cloud VPS cluster (not beta) using fabric. Part of that fabric script performs a git pull from our GitHub repos.

AFAICT, there's no good option for doing scap-based deploys in Cloud VPS (outside of beta). Is that still true?

> AFAICT, there's no good option for doing scap-based deploys in Cloud VPS (outside of beta). Is that still true?

It is certainly possible, but you need to run a deploy server in your Cloud VPS project. There is no good way for Cloud Services to provide a shared scap deployment server for all/multiple Cloud VPS projects. If you are interested in running your own deploy server I have some basic instructions at https://wikitech.wikimedia.org/wiki/User:BryanDavis/Scap3_in_a_Labs_project that I wrote up when I built the service out for https://tools.wmflabs.org/openstack-browser/project/striker.

Gotcha. Maybe we could stick that in our ores-staging project.

@demon, if that makes things easier for you, we can block this on getting scap set up for VPS deployments. If we do, we would presumably be doing things the same way we would in beta-labs. Git-fat must work there, right?

Halfak added a subscriber: Paladox. Jul 26 2017, 3:58 PM

I talked to @Paladox -- who offered that we could make use of phab-tin. Would it be better to set up our own deployment server within ores or ores-staging?

@demon We're fine with deploying from WMF production repos, I'm sure we can figure something out to push mirrored code or just make these the masters for deployment. In other words, we're not trying to deploy directly from GitHub.

I think the question is, can WMF host the git-fat server in a way that we can pull from it for both production and labs deployment? That should solve 90% of our woes.

(Just doing some project management, don't worry about the "watching" bit, we'll just create a (sub)task for any bits of this that we need to do.)

demon added a comment. Jul 26 2017, 4:49 PM

> but it's not undoable.
>
> What?

Bad choice of words. I meant it's not impossible. Sorry for the confusion.

> @demon We're fine with deploying from WMF production repos, I'm sure we can figure something out to push mirrored code or just make these the masters for deployment. In other words, we're not trying to deploy directly from GitHub.
> I think the question is, can WMF host the git-fat server in a way that we can pull from it for both production and labs deployment? That should solve 90% of our woes.

Yes. That's the issue we need to tackle--and exactly what I meant by complicates things. We need to simplify the usage of git-fat so things outside of production can make use of it.

Halfak renamed this task from ORES should use git-fat for wheel deployments to ORES should use git-fat for binaries. Jul 27 2017, 2:52 PM
Halfak changed the task status from Open to Stalled.
Halfak raised the priority of this task from Normal to High.
Halfak changed the status of subtask T171758: Support git-lfs files in gerrit from Open to Stalled.
Halfak moved this task from Untriaged to Maintenance/cleanup on the Scoring-platform-team board.
awight renamed this task from ORES should use git-fat for binaries to ORES should use a git large file plugin for storing serialized binaries. Sep 9 2017, 12:54 AM
Paladox changed the task status from Stalled to Open. Nov 29 2017, 8:49 PM
awight renamed this task from ORES should use a git large file plugin for storing serialized binaries to [Epic] ORES should use a git large file plugin for storing serialized binaries. Oct 10 2018, 11:24 PM
awight moved this task from Active to Epic on the Scoring-platform-team (Current) board.
awight added a project: Epic.
demon removed a subscriber: demon. Feb 19 2019, 10:33 AM
awight removed a subscriber: awight. Mar 21 2019, 4:01 PM
Halfak closed this task as Resolved. Apr 10 2019, 5:39 PM
Halfak claimed this task.

Seems like this is done.