
Split editquality repo to two repos, one with full history, one shallow
Closed, Declined · Public

Description

Right now the editquality repo is very heavy, and that makes deployments take a very long time. The cause is the large number of binary files we store there: git does a poor job of storing things it can't delta-compress. One option would be to remove the git history, but we don't want to lose the history of our models, so I suggest we have two repos: one for R&D (!) with the full history, and another for prod with ten commits at most.

Event Timeline

Restricted Application added a project: artificial-intelligence. · Jul 18 2017, 6:34 PM
Restricted Application added a subscriber: Aklapper.
awight added a subscriber: awight. · Jul 19 2017, 1:02 AM

Thinking about whether these repos should share a common ancestor, I realized I don't understand the proposal. How will we update the shallow repo when changing model files? Do we have to perform git history surgery each time?

Restricted Application added a project: User-Ladsgroup. · Jul 19 2017, 7:07 AM

Yeah, I think we should clean up the history in the new repo every time we change the model files. That can be done automatically, though.
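One way such an automated step might look is sketched below. It does not rewrite history; it simply re-creates a history-limited copy with `git clone --depth`, which is a swapped-in technique rather than anything the thread settles on. The function names, the paths, and the default depth of 10 (taken from the "ten commits at the most" in the description) are illustrative assumptions.

```python
import subprocess
from pathlib import Path


def shallow_clone_command(source: Path, dest: Path, depth: int = 10) -> list[str]:
    """Build the `git clone` invocation for a history-limited copy (hypothetical helper)."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # A file:// URL is used on purpose: git ignores --depth when cloning
    # from a plain local path.
    return [
        "git", "clone",
        "--depth", str(depth),
        source.resolve().as_uri(),
        str(dest),
    ]


def refresh_shallow_copy(source: Path, dest: Path, depth: int = 10) -> None:
    """Create a fresh shallow copy of `source` at `dest` (hypothetical helper).

    Raises FileExistsError if `dest` already exists, and
    subprocess.CalledProcessError if git fails.
    """
    if dest.exists():
        raise FileExistsError(dest)
    subprocess.run(shallow_clone_command(source, dest, depth), check=True)
```

This yields a shallow working copy rather than a separately maintained repo, so whether it would satisfy the prod deployment tooling is an open question here.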

Halfak added a subscriber: Halfak. · Jul 24 2017, 3:51 PM

We really want something like git-lfs that tracks the history but never forces you to download it.

Maybe we should instead focus on making git-lfs work in prod. If we have that, then we are done.

Alternatively, we could set up git-lfs outside of prod and add a secondary step to our prod deploys that allows us to copy stuff from our trusted git-lfs repo to a place that prod can grab it from.
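A minimal sketch of what that secondary copy step could look like, assuming the large files have already been fetched into a local checkout outside prod. The `*.model` pattern, the directory layout, and the function names are hypothetical; the checksum manifest is an added safeguard, not something proposed in the thread.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def stage_models(checkout: Path, staging: Path, pattern: str = "*.model") -> dict[str, str]:
    """Copy matching files from `checkout` into `staging`, preserving relative paths.

    Returns a manifest mapping each relative path to its SHA-256 digest so the
    consumer can verify what it pulls. Raises OSError if a copy does not match
    its source.
    """
    manifest: dict[str, str] = {}
    for src in sorted(checkout.rglob(pattern)):
        if not src.is_file():
            continue
        rel = src.relative_to(checkout)
        dst = staging / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        src_digest = sha256_of(src)
        if sha256_of(dst) != src_digest:
            raise OSError(f"checksum mismatch after copying {rel}")
        manifest[rel.as_posix()] = src_digest
    return manifest
```

The staging directory would then be whatever location prod is allowed to read from; how it gets exposed there is outside the scope of this sketch.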

Production can't talk to the outside world to download things (unless we use carbon as a proxy, which is a very bad idea).

"add a secondary step to our prod deploys that allows us to copy stuff from our trusted git-lfs repo to a place that prod can grab it."

:P

Noting that git-fat is already in production, but it's still not a ready-made fit for our use case. We need to be able to pull the model files for both production and labs deploys, and there's [something] wrong with the rsync server that prevents us from using it for labs.

The team decided to move forward with git-fat (or git-lfs) and not pursue this approach.

Ladsgroup closed this task as Declined. · Jul 29 2017, 7:46 PM