
Make git play nice
Closed, Resolved, Public

Description

Figure out how to make the separate subtree include the history of both the tests and the includes files, if possible.

Also, in the future, move both of those directories into one. We suggested the name could be "Lib/packages/Changes", since "Lib/libraries" seemed confusing and silly.

Event Timeline

I’ve been playing with git subtree and found something that works. Basically, it’s a series of git subtree splits (or pushes) wired together using git replace: two git replace calls (one for the includes history, one for the tests history) are needed to connect the history across the changes → Changes directory rename, and one more connects those old histories to the new history, in which the library lives under lib/packages/changes/ in Wikibase.git. git replace refs are not pushed or pulled by default, but they can be transferred explicitly, so we would add a note to the README advising users to run git fetch origin 'refs/replace/*:refs/replace/*' if they want nice Git history (compare the ceylon.formatter instructions).
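To illustrate why the README note is needed, here is a minimal sketch on throwaway repositories (the repo and branch names are made up): a plain clone does not bring the replace ref along, so the grafted history only shows up after the explicit fetch of refs/replace/*.

```shell
#!/usr/bin/env bash
# Sketch: replace refs are not transferred by a plain clone, only by an
# explicit fetch of refs/replace/*. Repo and branch names are hypothetical.
set -euo pipefail
# Throwaway identity so the commits below work without any git config.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org

tmp=$(mktemp -d)
git init -q "$tmp/origin"
cd "$tmp/origin"
git symbolic-ref HEAD refs/heads/old
git commit -q --allow-empty -m 'Old history'
git checkout -q --orphan new
git commit -q --allow-empty -m 'New history'
# Stitch the old history underneath the root commit of the new one.
git replace --graft new old

git clone -q "$tmp/origin" "$tmp/clone"
cd "$tmp/clone"
before=$(git rev-list --count HEAD)   # no replace refs yet: 1 commit
git fetch -q origin 'refs/replace/*:refs/replace/*'
after=$(git rev-list --count HEAD)    # graft now visible: 2 commits
echo "commits before fetch: $before, after: $after"
```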

  1. anywhere: git init /tmp/changes (git subtree push does not initialize the target repo if it doesn’t exist yet, and it only tells you so after it has gone through all the Git history work, which takes a few minutes and is irritating)
  2. in Wikibase: git subtree push -P lib/includes/Changes/ /tmp/changes/ main
  3. in Wikibase: mkdir lib/includes/changes/ (otherwise, git subtree push -P lib/includes/changes/ will refuse to work)
  4. in Wikibase: git subtree push -P lib/includes/changes/ /tmp/changes/ lcase
  5. in changes: GIT_EDITOR='sed -i "/^tree/ a parent $(git rev-parse lcase)"' git replace --edit de3d9cd700 – de3d9cd700 is the root commit of the initially extracted history, where lib/includes/changes/ became lib/includes/Changes/; “edit” it to add the last commit of the second extracted history, for lib/includes/changes/, as its parent
  6. in Wikibase: git subtree push -P lib/tests/phpunit/Changes/ /tmp/changes tests
  7. in Wikibase: mkdir lib/tests/phpunit/changes/
  8. in Wikibase: git subtree push -P lib/tests/phpunit/changes/ /tmp/changes tests-lcase
  9. in changes: GIT_EDITOR='sed -i "/^tree/ a parent $(git rev-parse tests-lcase)"' git replace --edit 2fdc9baef4
  10. in changes: git checkout -b merged "$(git commit-tree "$(printf '040000 tree %s\t%s\n' "$(git rev-parse main^{tree})" src "$(git rev-parse tests^{tree})" tests | git mktree)" -p main -p tests -m 'Merge src and tests histories')" – create a new tree which mounts the main and tests branches as src/ and tests/ directories; create a new commit from that tree; check that commit out as a new branch
  11. in Wikibase: mkdir -p lib/packages/changes/ && git mv lib/includes/Changes/ lib/packages/changes/src/ && git mv lib/tests/phpunit/Changes/ lib/packages/changes/tests/ && git commit -m 'Move Changes files'
  12. in Wikibase: git subtree push -P lib/packages/changes/ /tmp/changes/ moved
  13. in changes: git replace moved merged
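Step 10 is the least obvious one, so here is a sketch of it on a toy repository. The branches "main" and "tests" stand in for the two extracted histories, and the file names are hypothetical; the mktree/commit-tree pipeline is the same as in the step.

```shell
#!/usr/bin/env bash
# Sketch of step 10: mount two unrelated branches as src/ and tests/ in one
# tree, then commit that tree with both branches as parents.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org

tmp=$(mktemp -d)
git init -q "$tmp/changes"
cd "$tmp/changes"

# "main": stands in for the history extracted from lib/includes/Changes/.
git symbolic-ref HEAD refs/heads/main
echo '<?php' > Change.php
git add Change.php
git commit -q -m 'Add Change'

# "tests": an unrelated history, as extracted from lib/tests/phpunit/Changes/.
git checkout -q --orphan tests
git rm -q -rf .
echo '<?php' > ChangeTest.php
git add ChangeTest.php
git commit -q -m 'Add ChangeTest'

# Build one tree object with the two branch trees as subdirectories ...
tree=$(printf '040000 tree %s\t%s\n' \
  "$(git rev-parse 'main^{tree}')" src \
  "$(git rev-parse 'tests^{tree}')" tests | git mktree)
# ... commit it with both branches as parents, and check it out.
merged=$(git commit-tree "$tree" -p main -p tests -m 'Merge src and tests histories')
git checkout -q -b merged "$merged"
git ls-tree -r --name-only HEAD
```

Using plumbing here avoids a real `git merge`, which would try to combine both histories at the repository root instead of under separate directories.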

And now, any further commits that affect lib/packages/changes/ (I tested with an “Add README” commit) can be pushed using git subtree push -P lib/packages/changes/ /tmp/changes/ moved, and the history will do the right thing.

That said, this isn’t a very nice history – because src/ and tests/ were exported separately, there are two copies of most commits, one touching src/ and the other touching tests/. I’ll see if I can find a solution for that – maybe it can be done using git filter-branch instead of git subtree split, “filtering” the working tree by removing all directories except the few that we want to keep. (Doing this via the real git filter-branch promises to be extremely slow, because it would actually check out the full working tree each time and then shell out to /bin/rm to remove it again, but I remember hearing about a utility that should be able to do it faster.)

Edit: looking through the git replace manpage, it seems git replace --graft 2fdc9baef4 tests-lcase would’ve been an easier alternative to step 9. 🤷
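A sketch of the --graft variant on a toy repository, where "tests-lcase" and "tests" mimic the histories from before and after the rename (commit messages are made up). By the same reading of the manpage, step 5 should presumably become git replace --graft de3d9cd700 lcase, but that is not tested here.

```shell
#!/usr/bin/env bash
# Sketch: give a root commit a parent with `git replace --graft`, instead of
# editing the commit object through GIT_EDITOR and sed.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org

tmp=$(mktemp -d)
git init -q "$tmp/changes"
cd "$tmp/changes"

# "tests-lcase": history from before the changes -> Changes rename.
git symbolic-ref HEAD refs/heads/tests-lcase
git commit -q --allow-empty -m 'Old commit 1'
git commit -q --allow-empty -m 'Old commit 2'

# "tests": history from after the rename, with its own root commit.
git checkout -q --orphan tests
git commit -q --allow-empty -m 'Rename changes to Changes'
git commit -q --allow-empty -m 'Newer commit'

root=$(git rev-list --max-parents=0 tests)   # plays the role of 2fdc9baef4
before=$(git rev-list --count tests)
git replace --graft "$root" tests-lcase
after=$(git rev-list --count tests)
echo "tests history: $before commits before graft, $after after"
```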

Yes, git filter-repo worked very nicely (and finished in less than four seconds, extremely impressive):

  1. in /tmp: git clone wmge:Wikibase.git – clone a fresh Wikibase whose history we can mess with (your clone URL may vary; I configured wmge: as a shorthand for Wikimedia Gerrit extensions)
  2. in /tmp/Wikibase: git-filter-repo --path lib/includes/changes/ --path lib/includes/Changes/ --path lib/tests/phpunit/changes/ --path lib/tests/phpunit/Changes/ --path-rename lib/includes/changes:src --path-rename lib/includes/Changes:src --path-rename lib/tests/phpunit/changes:tests --path-rename lib/tests/phpunit/Changes:tests – rewrite the history to keep only the changes directories, and rename them into their new locations
  3. steps 11 and 12 from the other list (but pushing to /tmp/Wikibase instead of /tmp/changes)
  4. in /tmp/Wikibase: git replace moved master

And now, like before, any further commits to the changes library can be pushed with git subtree push, and the history will do the right thing, thanks to the git replace. (At some point, you’ll want to get rid of the current master branch and rename moved to master, because moved is where the history continues.)
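The final rename could look like the following local sketch, on a toy repository with made-up commits. It only covers the local branches; updating a remote copy of the split repository would be a separate step that is not shown.

```shell
#!/usr/bin/env bash
# Sketch: replace the old master with the "moved" branch in one step.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'Filtered history'
git checkout -q -b moved
git commit -q --allow-empty -m 'Move Changes files'

# -M renames even though "master" already exists, overwriting it.
git branch -M moved master
git log --oneline -1
```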

(Also, git filter-repo preserves all the other branches: wmf/, REL_, and some random stuff we should probably throw away even in Wikibase? We probably want to delete those.)

As for the directory name within Wikibase, maybe it should be closer to the packagist package name? I assume that will be something like wikibase/changes, so maybe the directory would be lib/packages/wikibase/changes? (I imagine we might also extract some packages into the wmde namespace instead of wikibase, so we probably don’t want to remove the “wikibase” path component.) Or possibly just packages/wikibase/changes, without lib.

How would git tags work? Would they just be pushed to the split repo?

I don’t think we have any tags in Wikibase.git at the moment. For branches, we could mirror them (git filter-repo preserves them), but I’m not sure that would be useful… maybe we could keep the REL_ branches but discard the wmf/ branches?
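If we keep REL_ but drop wmf/, the cleanup in the filtered repository could be as simple as this sketch. The branch names below are hypothetical examples of the two patterns.

```shell
#!/usr/bin/env bash
# Sketch: delete all local wmf/ branches, keeping REL_ branches and master.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m 'Initial'
git branch REL1_35
git branch wmf/1.35.0-wmf.41
git branch wmf/1.36.0-wmf.1

# Collect the wmf/ branch names first, then delete them one by one.
git for-each-ref --format='%(refname:short)' refs/heads/wmf > "$tmp/wmf-branches"
while read -r branch; do
  git branch -q -D "$branch"
done < "$tmp/wmf-branches"
git for-each-ref --format='%(refname:short)' refs/heads
```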

> And now, like before, any further changes commits can be pushed with git subtree push, and the history will do the right thing, thanks to the git replace.

I also just realized that we should be able to use git-filter-repo instead of git subtree push for the later updates too, since the commit hashes it produces are also stable. (In that case, we might not even need a git replace.)
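The stability argument rests on the fact that a commit ID is computed only from the tree, parents, author, committer (including dates) and message. Assuming the rewrite keeps all of those unchanged, re-running the same filter over the same input should reproduce the same IDs. This sketch checks that underlying property with plain git commit-tree, not with git-filter-repo itself; the timestamp is arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: identical commit inputs give identical commit IDs.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org
# Pin both timestamps, as a rewrite that preserves them would.
export GIT_AUTHOR_DATE='1593604800 +0000' GIT_COMMITTER_DATE='1593604800 +0000'

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"
echo '<?php' > Change.php
git add Change.php
tree=$(git write-tree)

# Same tree, parents, identities, dates and message: same commit ID.
first=$(git commit-tree "$tree" -m 'Add Change')
second=$(git commit-tree "$tree" -m 'Add Change')
# Changing only the message (e.g. a "changes: " prefix) changes the ID.
prefixed=$(git commit-tree "$tree" -m 'changes: Add Change')
printf '%s\n' "$first" "$second" "$prefixed"
```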

It would be great to have some sort of POC. What do you think?

I have some repos locally, should I push them to GitHub? (They’re probably not the final form yet.)

(I’m hesitant to push them because that might trigger notifications for people or issues mentioned in the commit messages :/)

> I don’t think we have any tags in Wikibase.git at the moment.

Oh, I meant in the library. When you want to make a new release, you'd push a git tag to the split repo, right? Or do you need to push it to Wikibase and have it get subtree'd over to the split repo?

Oh, I see. Yes, I think we’d push tags directly to the split repo, that should work. (Though I’m not yet sure how often we’ll release the library at all – Wikibase will always use the “master” version, if I understand correctly, not the one from packagist.)
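Pushing a release tag directly to the split repository would look roughly like this sketch. A local bare repository stands in for the split repo, and the tag name 1.0.0 is a hypothetical version.

```shell
#!/usr/bin/env bash
# Sketch: tag a release in a clone of the split repo and push only the tag.
set -euo pipefail
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.org
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.org

tmp=$(mktemp -d)
# Stand-in for the split repository (hypothetical location).
git init -q --bare "$tmp/wikibase-changes.git"
git clone -q "$tmp/wikibase-changes.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git commit -q --allow-empty -m 'Add README'
git push -q origin HEAD

# An annotated tag for the release, pushed on its own.
git tag -a 1.0.0 -m 'Release 1.0.0'
git push -q origin 1.0.0
git ls-remote --tags origin
```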

> I have some repos locally, should I push them to GitHub? (They’re probably not the final form yet.)
> (I’m hesitant to push them because that might trigger notifications for people or issues mentioned in the commit messages :/)

Yeah, pushing them under your account (and not under an org like wmde/wikimedia) would not cause notifications (I have done it in github.com/Ladsgroup/tainted-refs).

Alright, I pushed that repository to https://github.com/lucaswerkmeister/wikibase-changes. If you clone it, and then also do git fetch origin 'refs/replace/*:refs/replace/*', you should get a full history.

That said, if we’re using git-filter-repo anyway, I’m now wondering whether we need git replace at all. After all, git-filter-repo can already do the rename, so I’m not sure what we need a git replace for… I might look into this again tomorrow and create a second demo repository.

I think git-filter-repo would work for now. We can revisit if we still need git replace for whatever reason.

Just one note: I think the files first need to move somewhere else, so that all of them are contained in one place, similar to this POC: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/593557
For example, the files would live in packages/wikibase-changes/src, packages/wikibase-changes/test, etc. in Wikibase itself (we can easily make extension-repo.json and extension-client.json find the PHP files), and we can also keep the namespace, so the classes keep their current fully qualified names, like this:

	"AutoloadNamespaces": {
		"Wikibase\\Lib\\": "lib/includes/",
		"Wikibase\\Lib\\Changes\\": "packages/wikibase-changes/src/"
	},

With this:

  • The modularity is more explicit in the mono-repo and people wouldn't start reusing everything inside everything.
  • We wouldn't need to do all sorts of black magic with --path-rename lib/includes/Changes:src (though we might still need it for the git history part, right?)

> Oh, I see. Yes, I think we’d push tags directly to the split repo, that should work. (Though I’m not yet sure how often we’ll release the library at all – Wikibase will always use the “master” version, if I understand correctly, not the one from packagist.)

Regardless of what version Wikibase uses, I think for a library to be valuable it needs proper releases...

Change 616117 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/Wikibase@master] Move \Wikibase\Lib\Changes to separate package

https://gerrit.wikimedia.org/r/616117

The git-filter-repo incantation to extract the Changes history is:

git-filter-repo --path=lib/{includes,tests/phpunit}/{c,C}hanges/ --path lib/packages/wikibase/changes/ --path .mailmap --path-rename=lib/includes/{c,C}hanges:src --path-rename=lib/tests/phpunit/{c,C}hanges:tests --path-rename lib/packages/wikibase/changes/: --message-callback 'return re.sub(b"^changes: ", b"", message)'

--path selects the paths to include; --path-rename renames them accordingly; the --message-callback turns commit messages like “changes: Something” into just “Something”, removing the “changes:” prefix. The result can be found at https://github.com/lucaswerkmeister/wikibase-changes/.
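Note that the {c,C} and {includes,tests/phpunit} parts of the incantation are bash brace expansion, so git-filter-repo itself receives separate --path and --path-rename arguments. This sketch prints what bash produces. It also shows a rough sed analogue (an approximation, not what git-filter-repo runs) of the message callback: since the Python pattern uses ^ without re.MULTILINE, only a "changes: " at the very start of the message is removed.

```shell
#!/usr/bin/env bash
# Needs bash: brace expansion is not available in POSIX sh.
set -euo pipefail

# The shell expands each brace word into several arguments.
paths=(--path=lib/{includes,tests/phpunit}/{c,C}hanges/)
renames=(--path-rename=lib/includes/{c,C}hanges:src
         --path-rename=lib/tests/phpunit/{c,C}hanges:tests)
printf '%s\n' "${paths[@]}" "${renames[@]}"

# Rough sed analogue of the --message-callback: strip the prefix on the
# first line only, leaving later lines untouched.
msg=$(printf 'changes: Add README\n\nchanges: this line is kept\n' | sed '1s/^changes: //')
printf '%s\n' "$msg"
```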

Change 616117 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Move \Wikibase\Lib\Changes to separate package

https://gerrit.wikimedia.org/r/616117