Plan migration of ORES repos to git-lfs
Closed, Resolved, Public

Description

Reading https://github.com/git-lfs/git-lfs/wiki/Tutorial#migrating-existing-repository-data-to-lfs, our options are either to rewrite git history in our repos, or create new repos. We need to do one or the other to avoid having the gigantic history in our repos.

Who would be in a good position to give us advice about how to proceed? The masters for our huge repos are,

On another topic, we need to make sure that scap will rehydrate the repos.

Event Timeline

I'm guessing we want to do something like,

  1. Copy repos to a read-only location.
  2. Set LFS flags and metadata on repo (unknown)
  3. git lfs track '*.whl' '*.model'
  4. git add .gitattributes
  5. git lfs migrate import --include="*.whl" --include="*.model" --include-ref=refs/heads/master
  6. git push (--force?)
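
If the rewrite works, a quick sanity check might look something like this (a sketch based on the git-lfs docs; run inside the migrated repo):

  git lfs ls-files                # the migrated *.whl and *.model files should be listed
  git log --stat -- '*.model'     # history should now record tiny pointer files instead of blobs
  git count-objects -vH           # after repacking, the repo should be dramatically smaller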

No idea what will happen when the rewritten repos are mirrored over to phabricator git.

They'll mirror just fine since Phabricator just observes upstream.

Change 394179 had a related patch set (by Paladox) published:
[All-Projects@refs/meta/config] Enable git-lfs for research/ores/wheels

https://gerrit.wikimedia.org/r/394179

Change 394179 merged by Chad:
[All-Projects@refs/meta/config] Enable git-lfs for research/ores/wheels

https://gerrit.wikimedia.org/r/394179

Trying to start a gerrit review for wheels. Got this:

Do you really want to submit the above commits?
Type 'yes' to confirm, other to cancel: yes
remote: Processing changes: refs: 1, done            
To ssh://halfak@gerrit.wikimedia.org:29418/research/ores/wheels
 ! [remote rejected] HEAD -> refs/publish/master/git-lfs-migration (no common ancestry)
error: failed to push some refs to 'ssh://halfak@gerrit.wikimedia.org:29418/research/ores/wheels'

Looks like we might need a manual force push.
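
If it comes to that, a minimal sketch of the manual force push, assuming an account with force-push rights on the branch (remote taken from the error above):

  git push --force ssh://halfak@gerrit.wikimedia.org:29418/research/ores/wheels HEAD:refs/heads/master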

Putting repo backups here: https://analytics.wikimedia.org/datasets/archive/public-datasets/all/ores/

The editquality and draftquality backups are there. I'm still working on wikiclass compression.

https://github.com/wiki-ai/editquality is fully updated.

Well... I've had my GitHub account locked, so I'm experimenting with GitLab instead.

I've completed the upload of LFS'd content for
https://gitlab.com/wiki-ai/articlequality (was "wikiclass")
https://gitlab.com/wiki-ai/draftquality
https://gitlab.com/wiki-ai/editquality

You can find all the old repos compressed here:
https://analytics.wikimedia.org/datasets/archive/public-datasets/all/ores/

I'm working on updating https://phabricator.wikimedia.org/source/editquality to pull from gitlab and I'm getting

Error updating working copy: Command failed with error #128!
COMMAND
git ls-remote '********'

STDOUT
(empty)

STDERR
fatal: unable to access 'https://gitlab.com/wiki-ai/editquality/': Failed to connect to gitlab.com port 443: Connection timed out

@mmodell any thoughts?

Did we not get this done yet? I don't see any repos of real size on disk.

Re: the gitlab.com timeout above: the fetch needs to be proxied, and gitlab.com isn't on the whitelist for that. Right now we only have GitHub mentioned. But I'm not entirely sure why we're using outside repos for this, since we already enabled LFS in Gerrit.

@demon, right, I'm still not able to push the wheels LFS migration. Can you help us get gitlab proxied?

That's one thing, but I'm still not sure why we can't use Gerrit here and mirror externally. I took the time to set up LFS explicitly for this. And if GitHub has blocked your account, I'm afraid you're tempting the devil with GitLab too.

@demon, it seems this is a different conversation. We do want to use LFS internally on gerrit for our wheels repository. I've read through gitlab's docs and policies and they are much better in this regard. Please consider continuing the conversation about gitlab in T181835: Add gitlab to proxies/whitelist for mirroring to phabricator.

I've de-converted all of our github repos so that we can continue work while we wait for T180628: Install git-lfs client (at least on scap targets & masters).

We're currently thinking that we want to normalize our repo locations in gerrit, and introduce git-lfs in the new locations. I'm purposefully not including a mapping between old and new repos, because we don't want to automate the process. We're ignoring Phabricator for now, because it isn't set up for git-lfs.

  • scoring/ores/assets
  • scoring/ores/articlequality
  • scoring/ores/deploy
  • scoring/ores/draftquality
  • scoring/ores/drafttopic
  • scoring/ores/editquality

Mentioned in SAL (#wikimedia-operations) [2018-04-30T17:39:06Z] <awight@tin> Started deploy [ores/deploy@8c586ab]: Canary-only test deployment for ORES + git-lfs, T181678

Mentioned in SAL (#wikimedia-operations) [2018-04-30T17:41:05Z] <awight@tin> Finished deploy [ores/deploy@8c586ab]: Canary-only test deployment for ORES + git-lfs, T181678 (duration: 01m 59s)

Mentioned in SAL (#wikimedia-operations) [2018-04-30T18:04:58Z] <awight@tin> Started deploy [ores/deploy@46824bb]: Canary-only test deployment for ORES + git-lfs, T181678 (take 2)

Mentioned in SAL (#wikimedia-operations) [2018-04-30T18:06:55Z] <awight@tin> Finished deploy [ores/deploy@46824bb]: Canary-only test deployment for ORES + git-lfs, T181678 (take 2) (duration: 01m 58s)

Pilot deployment to the canary server failed silently, with no errors in the log:

awight@tin:/srv/deployment/ores/deploy$ scap deploy-log -f scap/log/scap-sync-2018-04-26-0001-2-g46824bb.log
18:04:57 [tin] Started deploy [ores/deploy@46824bb]
18:04:57 [tin] Deploying Rev: HEAD = 46824bb18f206674272ebcf250532d0debab2d9e
18:04:57 [tin] Started deploy [ores/deploy@46824bb]: Canary-only test deployment for ORES + git-lfs, T181678 (take 2)
18:04:57 [tin]
== CANARY ==
:* ores1001.eqiad.wmnet
18:04:58 [ores1001.eqiad.wmnet] Fetch from: http://tin.eqiad.wmnet/ores/deploy/.git
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote rm origin'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote rm origin', pid 30015>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote rm origin', pid 30015>: process completed
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote add origin http://tin.eqiad.wmnet/ores/deploy/.git'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote add origin http://tin.eqiad.wmnet/ores/deploy/.git', pid 30019>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git remote add origin http://tin.eqiad.wmnet/ores/deploy/.git', pid 30019>: process completed
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git fetch --tags --jobs 38'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git fetch --tags --jobs 38', pid 30023>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git fetch --tags --jobs 38', pid 30023>: process completed
18:04:58 [ores1001.eqiad.wmnet] Update submodules
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38', pid 30085>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38', pid 30085>: process completed
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git clone --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/revs/46824bb18f206674272ebcf250532d0debab2d9e'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git clone --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/revs/46824bb18f206674272ebcf250532d0debab2d9e', pid 30182>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git clone --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/cache /srv/deployment/ores/deploy-cache/revs/46824bb18f206674272ebcf250532d0debab2d9e', pid 30182>: process completed
18:04:58 [ores1001.eqiad.wmnet] Checkout rev: 46824bb18f206674272ebcf250532d0debab2d9e
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git checkout --force --quiet 46824bb18f206674272ebcf250532d0debab2d9e'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git checkout --force --quiet 46824bb18f206674272ebcf250532d0debab2d9e', pid 30195>: process started
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git checkout --force --quiet 46824bb18f206674272ebcf250532d0debab2d9e', pid 30195>: process completed
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache'>: starting process
18:04:58 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache', pid 30199>: process started
18:06:21 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git submodule update --init --recursive --jobs 38 --reference /srv/deployment/ores/deploy-cache/cache', pid 30199>: process completed
18:06:21 [ores1001.eqiad.wmnet] Pulling large objects [using git-lfs]
18:06:21 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git lfs pull'>: starting process
18:06:21 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git lfs pull', pid 30828>: process started
18:06:21 [ores1001.eqiad.wmnet] <Command u'/usr/bin/git lfs pull', pid 30828>: process completed
18:06:21 [ores1001.eqiad.wmnet] Executing check 'fetch_checks'
18:06:41 [ores1001.eqiad.wmnet] config_deploy is not enabled in scap.cfg, skipping.
18:06:42 [ores1001.eqiad.wmnet] Executing check 'promote_checks'
18:06:46 [ores1001.eqiad.wmnet] Restarting service 'uwsgi-ores'
18:06:54 [tin]
== CANARY ==
:* ores1001.eqiad.wmnet
18:06:55 [ores1001.eqiad.wmnet] Removing old revision /srv/deployment/ores/deploy-cache/revs/543901a78df32c4a9f269334919909b0479a223f
18:06:55 [tin] Finished deploy [ores/deploy@46824bb]: Canary-only test deployment for ORES + git-lfs, T181678 (take 2) (duration: 01m 58s)
18:06:55 [tin] Finished deploy [ores/deploy@46824bb] (duration: 01m 58s)

Looks good, but the large file was never checked out:

awight@ores1001:/srv/deployment/ores/deploy$ ls -la submodules/assets/
total 16
drwxr-xr-x 2 deploy-service deploy-service 4096 Apr 30 18:06 .
drwxr-xr-x 8 deploy-service deploy-service 4096 Apr 30 18:04 ..
-rw-r--r-- 1 deploy-service deploy-service   45 Apr 30 18:04 .git
-rw-r--r-- 1 deploy-service deploy-service  102 Apr 30 18:06 .gitreview
awight@ores1001:/srv/deployment/ores/deploy$ git submodule status submodules/assets
 5e17e191f08d71b0caca8b7d826ce531763ef7bf submodules/assets (heads/master)

Don't know if I can dig much further:

awight@ores1001:/srv/deployment/ores/deploy/submodules/assets$ git lfs ls-files
ERROR: init LocalStorage: mkdir /srv/deployment/ores/deploy-cache/revs/46824bb18f206674272ebcf250532d0debab2d9e/.git/modules/submodules/assets/lfs: permission denied

Maybe we need git lfs pull --recursive? No clue why this would have worked on beta without the --recursive, however.
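
For what it's worth, git lfs pull doesn't seem to take a --recursive flag itself; a sketch of a submodule-aware variant would be something like:

  git submodule foreach --recursive 'git lfs pull'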

Apparently I'm bad at git, and I failed to commit the right submodule pointers... trying again.

Mentioned in SAL (#wikimedia-operations) [2018-04-30T19:08:30Z] <awight@tin> Started deploy [ores/deploy@25579e7]: Trial LFS deployment to ORES canary; T181678

Mentioned in SAL (#wikimedia-operations) [2018-04-30T19:10:36Z] <awight@tin> Finished deploy [ores/deploy@25579e7]: Trial LFS deployment to ORES canary; T181678 (duration: 02m 06s)

Mentioned in SAL (#wikimedia-operations) [2018-04-30T20:06:59Z] <awight@tin> Started deploy [ores/deploy@4601497]: Trial LFS deployment to ORES canary; T181678 (take 2)

Mentioned in SAL (#wikimedia-operations) [2018-04-30T20:09:09Z] <awight@tin> Finished deploy [ores/deploy@4601497]: Trial LFS deployment to ORES canary; T181678 (take 2) (duration: 02m 10s)

Gave it another try, with commit rORESDEPLOY4601497c4f43, and got strange results. The LFS data should have been downloaded during "git submodule update --init --recursive", AFAICT, but was not. Then "git lfs pull" should have gotten us the data in case the submodule update didn't work, but that didn't happen either. All commands executed successfully but we were left with only the pointer file and no large data in .git/modules nor checked out in submodules/assets.

@awight: git lfs install needs to be executed on each target, and that isn't happening currently. I can add a hook to scap to do that, but maybe it would be better to do it via puppet? I'm not sure what's best, since scap is currently stateless with regard to lfs. Adding a call to git lfs install every time scap runs would be sorta wasteful but safe, I suppose.

@mmodell I'm reading some strange stuff here, https://github.com/git-lfs/git-lfs/wiki/Installation
Apparently, git lfs install enables LFS globally for a user, which is harmless on dedicated ORES boxes, but unnecessary. We can run git lfs install --local to enable LFS on just the target repo, and can do that immediately after git clone, which should solve the state/stateless question, I think.
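
To illustrate the difference, a minimal sketch (the repo path is just an example):

  # global: writes the LFS smudge/clean filter config to ~/.gitconfig, once per user
  git lfs install

  # local: writes the same filter config only into this repo's .git/config
  cd /srv/deployment/ores/deploy
  git lfs install --local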

Also take a look at --skip-smudge and git lfs pull; the wiki suggests that we don't need to run git lfs pull if we rely on the default "smudge" config.
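
As I read the docs, the two modes play out roughly like this (a sketch; <rev> is a placeholder):

  # default smudge filter: checkout itself downloads and hydrates LFS files
  git lfs install --local
  git checkout <rev>

  # --skip-smudge: checkout leaves pointer files; an explicit pull is required
  git lfs install --local --skip-smudge
  git checkout <rev>
  git lfs pull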

@awight: I don't think git lfs install --local will take care of the submodules. I suppose I could do git submodule foreach 'git lfs install --local' though

I'm happy with either solution: a redundant git lfs install or the submodule foreach. It would be very surprising if the global method had any impact on non-LFS repos.

Closing; our plan is simple:

  • Get deployment working with a single LFS file, in the submodules/assets directory.
  • Once that works, feel free to add LFS anywhere that makes sense. Make a new task for any specific, non-trivial work.
awight claimed this task.