Special:Version on Wikimedia wikis shows outdated commit hashes. For example, currently https://en.wikipedia.org/wiki/Special:Version shows 13847cc for the Cite extension, while it should be showing rECIT7aad7f5d0a44. Bryan says he knows why.
Description
Status | Assigned | Task
---|---|---
Declined | None | T116345 Special:Version on Wikimedia wikis shows outdated commit hashes for submodules
Resolved | Joe | T87036 Convert work machines (tin, terbium) to Trusty and hhvm usage
Resolved | Joe | T116728 Reimage mw1152 as a terbium replacement
Resolved | Joe | T124024 Be able to switch programmatically between deployment servers in codfw and eqiad
Event Timeline
Scap generated this gitinfo file:
{
  "head": "b00d06a08fc62053e54f99bd234780f2ce0a07a0",
  "remoteURL": "https://gerrit.wikimedia.org/r/p/mediawiki/extensions/Cite.git",
  "branch": "b00d06a08fc62053e54f99bd234780f2ce0a07a0",
  "headCommitDate": "1444676619",
  "headSHA1": "13847ccede3cd7f7d2160a148b79f85435f6f466",
  "@directory": "/srv/mediawiki-staging/php-1.27.0-wmf.3/extensions/Cite"
}
The key bits here are "head" (sha1 of the HEAD of extensions/Cite on tin at the time) and "headSHA1" (the "disclosable head" of the branch). At the time that scap was last run (2015-10-20T23:20Z), the local git log looked like this:
* b00d06a Display 'cite_error_references_duplicate_key' next to the affected ref (2 weeks ago, Bartosz Dziewoński)
* f08984d Creating new wmf/1.27.0-wmf.3 branch (9 days ago, Mukunda Modell)
* 13847cc Localisation updates from https://translatewiki.net. (10 days ago, Translation updater bot) (master)
* a8fe5e5 Localisation updates from https://translatewiki.net. (11 days ago, Translation updater bot)
* a374453 Localisation updates from https://translatewiki.net. (13 days ago, Translation updater bot)
HEAD was rECITb00d06a08fc6 which is the version we would expect to see on Special:Version. Scap instead decided that the "disclosable head" was 13847ccede3cd7f7d2160a148b79f85435f6f466 which happens to be the last commit to master before the wmf/1.27.0-wmf.3 branch was cut.
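To make the mismatch concrete, here is a minimal sketch that reads a gitinfo blob like the one above and compares the two keys. The function name describe_gitinfo is hypothetical and is not part of scap or MediaWiki; the sketch also assumes "head" holds a full sha1, as it does in this example.

```python
import json


def describe_gitinfo(raw):
    """Compare the deployed HEAD with the hash scap chose to disclose.

    Hypothetical helper for illustration only. Assumes "head" is a full
    sha1 string, as in the gitinfo file quoted in this task.
    """
    info = json.loads(raw)
    return {
        "deployed": info["head"],
        "disclosed": info["headSHA1"],
        "differs": info["head"] != info["headSHA1"],
    }


if __name__ == "__main__":
    sample = json.dumps({
        "head": "b00d06a08fc62053e54f99bd234780f2ce0a07a0",
        "headSHA1": "13847ccede3cd7f7d2160a148b79f85435f6f466",
    })
    print(describe_gitinfo(sample))
```

When "differs" is true, Special:Version is showing something other than what is actually checked out on the deploy server.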
This "disclosable head" business is our attempt to keep security patches that might be deployed on the WMF production cluster from being displayed on Special:Version. The [[https://phabricator.wikimedia.org/diffusion/MSCA/browse/master/scap/utils.py;bee741b1a477ed734d286acd8d0585fe522b39fd$173|scap.utils.get_disclosable_head]] function tries to find the newest common commit between the clone on the deploy server and the origin repository by running git rev-list -1 @{upstream}.
@{upstream} here is a git magic shortcut for "the currently tracking remote branch for this local branch". It turns out that on our wmf.X branches right now submodule checkouts do not have a remote tracking branch. This causes the git rev-list -1 @{upstream} lookup to fail. When that happens, scap falls back to running git merge-base HEAD $(git remote). This returns the last commit before the submodule's release branch was cut and ignores any commits made on the release branch itself.
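The lookup order described above can be sketched roughly as follows. This is not the actual scap.utils.get_disclosable_head code; the names run_git and disclosable_head_sketch are made up for illustration, and the only git commands used are the two quoted in this comment.

```python
import subprocess


def run_git(args, cwd=None):
    """Run a git command and return its stripped stdout.

    Raises subprocess.CalledProcessError when git exits non-zero.
    """
    out = subprocess.check_output(
        ["git"] + list(args), cwd=cwd, stderr=subprocess.DEVNULL)
    return out.decode("utf-8").strip()


def disclosable_head_sketch(repo_dir, run=run_git):
    """Illustrative only: try @{upstream}, then fall back to merge-base."""
    try:
        # Works only when the checkout has a remote tracking branch.
        return run(["rev-list", "-1", "@{upstream}"], cwd=repo_dir)
    except subprocess.CalledProcessError:
        # Mirrors `git merge-base HEAD $(git remote)`; the shell would
        # word-split the remote names, so split them here too.
        remotes = run(["remote"], cwd=repo_dir).split()
        return run(["merge-base", "HEAD"] + remotes, cwd=repo_dir)
```

The run parameter is there only so the sketch can be exercised without a real repository; in a submodule checkout with no tracking branch, the except branch is the one that gets taken.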
So... long story long... "most" of the time this would be close enough. Since sync-dir and sync-file don't update the gitinfo cache files (T38271), we would typically expect the disclosed hash to be the branch point of the release branch. I think with some strong git-fu we can do better, however, and get the remote commits that came after the branch point as well. I'll look into that and report back.
I'm not sure why there aren't remote tracking branches, but it seems like that is the best way to tell what the upstream commit is.
I wonder why the remote branches didn't get set up by multiversion/checkoutMediaWiki?
I think the problem is that submodule tracking branches require git 1.8.2+ and we have git 1.7.9.5 on tin. The .gitmodules file has the needed data for tracking branches to be used if we had a version of git that supported it:
$ git config -f /srv/mediawiki-staging/php-1.27.0-wmf.3/.gitmodules submodule.extensions/Cite.branch
wmf/1.27.0-wmf.3
So... this might all go away when tin is upgraded/replaced with an Ubuntu 14.04 host. We could try to add support for the git config trick above as a fallback when @{upstream} fails. It's a bit kludgy though because we need to know where the .gitmodules file is, which submodule we are looking at, and to prepend origin/ to the branch name we get back:
$ git rev-list -1 origin/$(git config -f /srv/mediawiki-staging/php-1.27.0-wmf.3/.gitmodules submodule.extensions/Cite.branch)
7aad7f5d0a449e5a0b562000bb3f448243edc8dd
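For comparison, a rough Python sketch of the same .gitmodules fallback, showing the three pieces of information it needs (the .gitmodules path, the submodule name, and the submodule directory). The function names are hypothetical and nothing here exists in scap; it only wraps the two git commands shown above.

```python
import subprocess


def run_git(args, cwd=None):
    """Run a git command and return its stripped stdout."""
    out = subprocess.check_output(["git"] + list(args), cwd=cwd)
    return out.decode("utf-8").strip()


def upstream_ref_from_gitmodules(gitmodules_path, submodule_name, run=run_git):
    """Build the remote ref for a submodule from the superproject's .gitmodules."""
    key = "submodule.%s.branch" % submodule_name
    branch = run(["config", "-f", gitmodules_path, key])
    # The stored value is a bare branch name, so origin/ has to be prepended.
    return "origin/" + branch


def disclosable_head_from_gitmodules(gitmodules_path, submodule_name,
                                     submodule_dir, run=run_git):
    """Resolve the tip of the submodule's release branch on the remote."""
    ref = upstream_ref_from_gitmodules(gitmodules_path, submodule_name, run)
    return run(["rev-list", "-1", ref], cwd=submodule_dir)
```

The caller still has to supply all three arguments, which is the kludgy part: none of them can be discovered from inside the submodule checkout alone.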
The branches are all set in the top level .gitmodules as hoped for. I'm guessing this is because @mmodell is using a not-freaking-ancient version of git when he runs make-wmf-branch.
@bd808: I run make-wmf-branch on my laptop for that reason, and a few others. Tin is ridiculously outdated and it manages to annoy me frequently.
Git on the new Ubuntu 14.04 based deployment servers (tin & mira) is version 1.9.1. I speculated in T116345#1747370 that this might fix the problem. I just checked tin:/srv/mediawiki-staging/php-1.27.0-wmf.13/extensions/Cite, however, and the bug is not fixed.
tin:/srv/mediawiki-staging/php-1.27.0-wmf.13/extensions/Cite (git (91eab16...)) bd808$ git log --graph --oneline --decorate --all | head -5
* 91eab16 (HEAD, origin/wmf/1.27.0-wmf.13) VE: Fix i18n names broken during migration
* 8276d9a Creating new wmf/1.27.0-wmf.13 branch
| * eb58f79 (origin/master, origin/HEAD) VE: Fix i18n names broken during migration
| * 483dbf8 Localisation updates from https://translatewiki.net.
|/
tin:/srv/mediawiki-staging/php-1.27.0-wmf.13/extensions/Cite (git (91eab16...)) bd808$ git rev-list -1 @{upstream}
fatal: HEAD does not point to a branch
tin:/srv/mediawiki-staging/php-1.27.0-wmf.13/extensions/Cite (git (91eab16...)) bd808$ git merge-base HEAD $(git remote)
c859818a705fc6ee9f3812a319e329ba8f517802
git rev-list -1 @{upstream} is still failing and thus scap will fall back to using git merge-base HEAD $(git remote) which still points to the pre-branch commit on master.
I tried doing the initial clones using git submodule update --init --recursive --remote (see Stack Overflow). git remote show -n origin still shows that the tracking branch is master rather than the hoped-for wmf/1.27.0-wmf.13. This probably has something to do with a submodule always being in a detached HEAD state and the tracking branch information only being kept in .gitmodules rather than in the cloned module itself.
I think I may have found another sort of way to find the disclosable hash using git log. git log --pretty=format:'%H %d' will print the commit hash and the "ref names" that correspond to the commit.
tin:/srv/mediawiki-staging/php-1.27.0-wmf.13/extensions/Cite (git (91eab16...)) bd808$ git log --pretty=format:'%H %d' | head -5
91eab16282a47cedecf1aec0343da7f4917efb69 (HEAD, origin/wmf/1.27.0-wmf.13)
8276d9ae80632a377ad4c42b7357d7d16ab2ea23
c859818a705fc6ee9f3812a319e329ba8f517802 (master)
150f87b4985a6fe9f70790fdf0eb03505f673bb7
71889ff017bec4238a608f51c1fcd1c348232879
You can see here that the rECIT91eab16282a4 commit that we want to show on Special:Version has a ref naming a remote branch. I've tested in a submodule with a security patch applied and the results are something like:
abc123 (HEAD)
def456 (origin/wmf/1.27.0-wmf.13)
789abc
012def (master)
Using this output, the first line that includes either origin or master in the ref names should be safe to disclose. This still feels a bit hacky, but less so than the git config -f trick, which requires fishing around on the filesystem for the needed .gitmodules file and figuring out the submodule name.
Here's the right shell magic to find our hash:
tin:/srv/mediawiki-staging/php-1.27.0-wmf.13/extensions/Cite (git (91eab16...)) bd808$ git log --pretty=format:'%H %d' | grep -E 'origin|master' | awk '{print $1}' | head -1
91eab16282a47cedecf1aec0343da7f4917efb69
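If this were to move into scap, the same selection could be done in Python rather than with grep and awk. The following is only a sketch of that idea: first_disclosable_hash is a hypothetical name, and it expects the text produced by git log --pretty=format:'%H %d' as input. Unlike the grep pipeline, it matches only against the ref names, not the whole line.

```python
import re

# Same alternation as the grep -E pattern in the shell pipeline.
REF_PATTERN = re.compile(r"origin|master")


def first_disclosable_hash(log_output):
    """Return the first commit hash whose ref names mention origin or master.

    Returns None when no line carries a matching decoration.
    """
    for line in log_output.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue  # blank line, or a commit with no ref names at all
        sha, refs = parts
        if REF_PATTERN.search(refs):
            return sha
    return None


if __name__ == "__main__":
    sample = "\n".join([
        "abc123  (HEAD)",
        "def456  (origin/wmf/1.27.0-wmf.13)",
        "789abc",
        "012def  (master)",
    ])
    print(first_disclosable_hash(sample))
```

With the security-patch example above, the local-only abc123 commit is skipped and def456 is the hash that would be disclosed.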
Using @{upstream} is still best when it works, but this fallback method could be used instead of git merge-base HEAD $(git remote) when no tracking branch is available.