Plan migration of ORES repos to git-lfs
Closed, Resolved, Public

Description

Reading https://github.com/git-lfs/git-lfs/wiki/Tutorial#migrating-existing-repository-data-to-lfs, our options are either to rewrite git history in our repos, or create new repos. We need to do one or the other to avoid having the gigantic history in our repos.

Who would be in a good position to give us advice about how to proceed? The masters for our huge repos are,

On another topic, we need to make sure that scap will rehydrate the repos.

Event Timeline

I'm guessing we want to do something like,

  1. Copy repos to a read-only location.
  2. Set LFS flags and metadata on repo (unknown)
  3. git lfs track '*.whl' '*.model'
  4. git add .gitattributes
  5. git lfs migrate import --include="*.whl" --include="*.model" --include-ref=refs/heads/master
  6. git push (--force?)
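
If the rewrite works, a quick sanity check might look something like this (a sketch based on the git-lfs docs; run inside the migrated repo):

  git lfs ls-files                # the migrated *.whl and *.model files should be listed
  git log --stat -- '*.model'     # history should now record tiny pointer files instead of blobs
  git count-objects -vH           # after repacking, the repo should be dramatically smaller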

No idea what will happen when the rewritten repos are mirrored over to phabricator git.

They'll mirror just fine since Phabricator just observes upstream.

Change 394179 had a related patch set (by Paladox) published:
[All-Projects@refs/meta/config] Enable git-lfs for research/ores/wheels

https://gerrit.wikimedia.org/r/394179

Change 394179 merged by Chad:
[All-Projects@refs/meta/config] Enable git-lfs for research/ores/wheels

https://gerrit.wikimedia.org/r/394179

Trying to start a gerrit review for wheels. Got this:

Do you really want to submit the above commits?
Type 'yes' to confirm, other to cancel: yes
remote: Processing changes: refs: 1, done            
To ssh://halfak@gerrit.wikimedia.org:29418/research/ores/wheels
 ! [remote rejected] HEAD -> refs/publish/master/git-lfs-migration (no common ancestry)
error: failed to push some refs to 'ssh://halfak@gerrit.wikimedia.org:29418/research/ores/wheels'

Looks like we might need a manual force push.
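
If it comes to that, a minimal sketch of the manual force push, assuming an account with force-push rights on the branch (remote taken from the error above):

  git push --force ssh://halfak@gerrit.wikimedia.org:29418/research/ores/wheels HEAD:refs/heads/master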

Putting repo backups here: https://analytics.wikimedia.org/datasets/archive/public-datasets/all/ores/

The editquality and draftquality backups are there. I'm still working on wikiclass compression.

https://github.com/wiki-ai/editquality is fully updated.

Well... I've had my GitHub account locked, so I'm experimenting with GitLab instead.

I've completed the upload of LFS'd content for
https://gitlab.com/wiki-ai/articlequality (was "wikiclass")
https://gitlab.com/wiki-ai/draftquality
https://gitlab.com/wiki-ai/editquality

You can find all the old repos compressed here:
https://analytics.wikimedia.org/datasets/archive/public-datasets/all/ores/

I'm working on updating https://phabricator.wikimedia.org/source/editquality to pull from gitlab and I'm getting

Error updating working copy: Command failed with error #128!
COMMAND
git ls-remote '********'

STDOUT
(empty)

STDERR
fatal: unable to access 'https://gitlab.com/wiki-ai/editquality/': Failed to connect to gitlab.com port 443: Connection timed out

@mmodell any thoughts?

Did we not get this done yet? I don't see any repos of real size on disk.

Re: the gitlab.com timeout above: the fetch needs to be proxied, and gitlab.com isn't on the whitelist for that. Right now we only have GitHub mentioned. But I'm not entirely sure why we're using outside repos for this, since we already enabled LFS in Gerrit.

@demon, right, I'm still not able to push the wheels LFS migration. Can you help us get gitlab proxied?

That's one thing, but I'm still not sure why we can't use Gerrit here and mirror externally. I took the time to set up LFS explicitly for this. And if GitHub has blocked your account, I'm afraid you're tempting the devil with GitLab too.

@demon, it seems this is a different conversation. We do want to use LFS internally on gerrit for our wheels repository. I've read through gitlab's docs and policies and they are much better in this regard. Please consider continuing the conversation about gitlab in T181835: Add gitlab to proxies/whitelist for mirroring to phabricator.

I've de-converted all of our github repos so that we can continue work while we wait for T180628: Install git-lfs client (at least on scap targets & masters).

We're currently thinking that we want to normalize our repo locations in gerrit, and introduce git-lfs in the new locations. I'm purposefully not including a mapping between old and new repos, because we don't want to automate the process. We're ignoring Phabricator for now, because it isn't set up for git-lfs.

  • scoring/ores/assets
  • scoring/ores/articlequality
  • scoring/ores/deploy
  • scoring/ores/draftquality
  • scoring/ores/drafttopic
  • scoring/ores/editquality

Mentioned in SAL (#wikimedia-operations) [2018-04-30T17:39:06Z] <awight@tin> Started deploy [ores/deploy@8c586ab]: Canary-only test deployment for ORES + git-lfs, T181678

Mentioned in SAL (#wikimedia-operations) [2018-04-30T17:41:05Z] <awight@tin> Finished deploy [ores/deploy@8c586ab]: Canary-only test deployment for ORES + git-lfs, T181678 (duration: 01m 59s)

Mentioned in SAL (#wikimedia-operations) [2018-04-30T18:04:58Z] <awight@tin> Started deploy [ores/deploy@46824bb]: Canary-only test deployment for ORES + git-lfs, T181678 (take 2)

Mentioned in SAL (#wikimedia-operations) [2018-04-30T18:06:55Z] <awight@tin> Finished deploy [ores/deploy@46824bb]: Canary-only test deployment for ORES + git-lfs, T181678 (take 2) (duration: 01m 58s)

Pilot deployment to the canary server failed silently, with no errors in the log:

awight@tin:/srv/deployment/ores/deploy$ scap deploy-log -f scap/log/scap-sync-2018-04-26-0001-2-g46824bb.log
18:04:57 [tin] Started deploy [ores/deploy@46824bb]
18:04:57 [tin] Deploying Rev: HEAD = 46824bb18f206674272ebcf250532d0debab2d9e
18:04:57 [tin] Started deploy [ores/deploy@46824bb]: Canary-only test deployment for ORES + git-lfs, T181678 (take 2)
18:04:57 [tin]
== CANARY ==
:* ores1001.eqiad.wmnet
18:04:58 [ores1001.eqiad.wmnet] Fetch from: http://tin.eqiad.wmnet/ores/deploy/.git
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote rm origin'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote rm origin', pid 30015>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote rm origin', pid 30015>: process completed
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote add origin http://tin.eqiad.wmnet/ores/deploy/.git'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote add origin http://tin.eqiad.wmnet/ores/deploy/.git', pid 30019>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote add origin http://tin.eqiad.wmnet/ores/deploy/.git', pid 30019>: process completed
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git fetch --tags --jobs 38'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git fetch --tags --jobs 38', pid 30023>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git fetch --tags --jobs 38', pid 30023>: process completed
18:04:58 [ores1001.eqiad.wmnet] Update submodules
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38', pid 30085>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38', pid 30085>: process completed
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git clone --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/revs/46824bb18f206674272ebcf250532d0debab2d9e'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git clone --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/revs/46824bb18f206674272ebcf250532d0debab2d9e', pid 30182>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git clone --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/revs/46824bb18f206674272ebcf250532d0debab2d9e', pid 30182>: process completed
18:04:58 [ores1001.eqiad.wmnet] Checkout rev: 46824bb18f206674272ebcf250532d0debab2d9e
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git checkout --force --quiet 46824bb18f206674272ebcf250532d0debab2d9e'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git checkout --force --quiet 46824bb18f206674272ebcf250532d0debab2d9e', pid 30195>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git checkout --force --quiet 46824bb18f206674272ebcf250532d0debab2d9e', pid 30195>: process completed
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache', pid 30199>: process started
18:06:21 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache', pid 30199>: process completed
18:06:21 [ores1001.eqiad.wmnet] Pulling large objects [using git-lfs]
18:06:21 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git lfs pull'>: starting process
18:06:21 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git lfs pull', pid 30828>: process started
18:06:21 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git lfs pull', pid 30828>: process completed
18:06:21 [ores1001.eqiad.wmnet] Executing check 'fetch_checks'
18:06:41 [ores1001.eqiad.wmnet] config_deploy is not enabled in scap.cfg, skipping.
18:06:42 [ores1001.eqiad.wmnet] Executing check 'promote_checks'
18:06:46 [ores1001.eqiad.wmnet] Restarting service 'uwsgi-ores'
18:06:54 [tin]
== CANARY ==
:* ores1001.eqiad.wmnet
18:06:55 [ores1001.eqiad.wmnet] Removing old revision /srv/deployment/ores/deploy-cache/revs/543901a78df32c4a9f269334919909b0479a223f
18:06:55 [tin] Finished deploy [ores/deploy@46824bb]: Canary-only test deployment for ORES + git-lfs, T181678 (take 2) (duration: 01m 58s)
18:06:55 [tin] Finished deploy [ores/deploy@46824bb] (duration: 01m 58s)

Looks good, but the large file was never checked out:

awight@ores1001:/srv/deployment/ores/deploy$ ls -la submodules/assets/
total 16
drwxr-xr-x 2 deploy-service deploy-service 4096 Apr 30 18:06 .
drwxr-xr-x 8 deploy-service deploy-service 4096 Apr 30 18:04 ..
-rw-r--r-- 1 deploy-service deploy-service   45 Apr 30 18:04 .git
-rw-r--r-- 1 deploy-service deploy-service  102 Apr 30 18:06 .gitreview
awight@ores1001:/srv/deployment/ores/deploy$ git submodule status submodules/assets
 5e17e191f08d71b0caca8b7d826ce531763ef7bf submodules/assets (heads/master)

Don't know if I can dig much further:

awight@ores1001:/srv/deployment/ores/deploy/submodules/assets$ git lfs ls-files
ERROR: init LocalStorage: mkdir /srv/deployment/ores/deploy-cache/revs/46824bb18f206674272ebcf250532d0debab2d9e/.git/modules/submodules/assets/lfs: permission denied

Maybe we need git lfs pull --recursive? No clue why this would have worked on beta without the --recursive, however.
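
For what it's worth, git lfs pull doesn't seem to take a --recursive flag itself; a sketch of a submodule-aware variant would be something like:

  git submodule foreach --recursive 'git lfs pull'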

Apparently I'm bad at git, and I failed to commit the right submodule pointers... trying again.

Mentioned in SAL (#wikimedia-operations) [2018-04-30T19:08:30Z] <awight@tin> Started deploy [ores/deploy@25579e7]: Trial LFS deployment to ORES canary; T181678

Mentioned in SAL (#wikimedia-operations) [2018-04-30T19:10:36Z] <awight@tin> Finished deploy [ores/deploy@25579e7]: Trial LFS deployment to ORES canary; T181678 (duration: 02m 06s)

Mentioned in SAL (#wikimedia-operations) [2018-04-30T20:06:59Z] <awight@tin> Started deploy [ores/deploy@4601497]: Trial LFS deployment to ORES canary; T181678 (take 2)

Mentioned in SAL (#wikimedia-operations) [2018-04-30T20:09:09Z] <awight@tin> Finished deploy [ores/deploy@4601497]: Trial LFS deployment to ORES canary; T181678 (take 2) (duration: 02m 10s)

Gave it another try, with commit rORESDEPLOY4601497c4f43, and got strange results. The LFS data should have been downloaded during "git submodule update --init --recursive", AFAICT, but was not. Then "git lfs pull" should have gotten us the data in case the submodule update didn't work, but that didn't happen either. All commands executed successfully but we were left with only the pointer file and no large data in .git/modules nor checked out in submodules/assets.

@awight: git lfs install needs to be executed on each target, and that isn't happening currently. I can add a hook to scap to do that, but maybe it would be better to do it via puppet? I'm not sure what's best, since scap is currently stateless with regard to lfs. Adding a call to git lfs install every time scap runs would be sorta wasteful but safe, I suppose.

@mmodell I'm reading some strange stuff here, https://github.com/git-lfs/git-lfs/wiki/Installation
Apparently, git lfs install enables LFS globally for a user, which is harmless on dedicated ORES boxes, but unnecessary. We can run git lfs install --local to enable LFS on just the target repo, and can do that immediately after git clone, which should solve the state/stateless question, I think.
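
To illustrate the difference, a minimal sketch (the repo path is just an example):

  # global: writes the LFS smudge/clean filter config to ~/.gitconfig, once per user
  git lfs install

  # local: writes the same filter config only into this repo's .git/config
  cd /srv/deployment/ores/deploy
  git lfs install --local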

Also take a look at --skip-smudge and git lfs pull; the wiki suggests that we don't need to run git lfs pull if we rely on the default "smudge" config.
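
As I read the docs, the two modes play out roughly like this (a sketch; <rev> is a placeholder):

  # default smudge filter: checkout itself downloads and hydrates LFS files
  git lfs install --local
  git checkout <rev>

  # --skip-smudge: checkout leaves pointer files; an explicit pull is required
  git lfs install --local --skip-smudge
  git checkout <rev>
  git lfs pull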

@awight: I don't think git lfs install --local will take care of the submodules. I suppose I could do git submodule foreach 'git lfs install --local' though

I'm happy with either solution: a redundant git lfs install or the submodule foreach. It would be very surprising if the global method had any impact on non-LFS repos.

Closing; our plan is simple:

  • Get deployment working with a single LFS file, in the submodules/assets directory.
  • Once that works, feel free to add LFS anywhere that makes sense. Make a new task for any specific, non-trivial work.
awight claimed this task.