Page MenuHomePhabricator

Work out why initial skin/extension branches into REL1_35/REL1_36 were on the wrong branch
Closed, ResolvedPublic

Description

While done some safety checks on the 1.36.0 release, I noticed the patch from 1.36.0-rc.0 was rather large:

-rw-rw-r--  1 reedy reedy  1011137 May 27 20:53 mediawiki-1.36.0.patch.gz
$ wc -l mediawiki-1.36.0.patch
131694 mediawiki-1.36.0.patch

Sure, Timeless had been rebranched, but everything else hadn't, but there were a load of commits being brought in to both Skins and Extensions on their first submodule bump

e.g.
https://github.com/wikimedia/mediawiki/commit/ace61df6c9c16d9f60c47ca738679d4445d508a9#diff-20ef7739e8423f114899b767909a220c5dc4169d73a38196d7a2dbfac5166c4e
https://github.com/wikimedia/mediawiki/commit/24ec2d544f5de0d0867df1b0ba12cf3490c0d159#diff-20ef7739e8423f114899b767909a220c5dc4169d73a38196d7a2dbfac5166c4e
https://github.com/wikimedia/mediawiki/commit/1fbfae290bd7c78ce171ed6bd52de83e663a8bbc#diff-20ef7739e8423f114899b767909a220c5dc4169d73a38196d7a2dbfac5166c4e

It would seem the commits are *right* now, but they weren't before... Why weren't they?

For example, MediaWiki-extensions-Interwiki was at rEIWA8121d9a4a784: Localisation updates from https://translatewiki.net. (May 2020) before, now it's at rEIWA38ddf24041d8: build: Updating browserslist to 4.16.6 (from yesterday).

May 2020 sounds MW-1.35-release terroritory...

rEIWA8121d9a4a784: Localisation updates from https://translatewiki.net. is in the REL1_35 branch.

Looking at the REL1_35 release branch in MW core... OATHAuth did something similarly "odd" in https://github.com/wikimedia/mediawiki/commit/37ef236f0c882426dd362e147779c992426d6b9d#diff-20ef7739e8423f114899b767909a220c5dc4169d73a38196d7a2dbfac5166c4e and ConfirmEdit in https://github.com/wikimedia/mediawiki/commit/8afdcfa3a341068b29f247246630f86acb60c8fc#diff-20ef7739e8423f114899b767909a220c5dc4169d73a38196d7a2dbfac5166c4e and LocalisationUpdate in https://github.com/wikimedia/mediawiki/commit/9da1df824eab900961c96fa331dbc8f5acf45578#diff-20ef7739e8423f114899b767909a220c5dc4169d73a38196d7a2dbfac5166c4e etc

Event Timeline

It seems https://releases.wikimedia.org/mediawiki/1.35/mediawiki-1.35.0.patch.gz is similarly large... Though only 3781 lines long, vs 131694 for 1.36.0.patch

It seems the problem is in rMW301645397d2d: Add REL1_36 sub-modules when the intial submodules are added for the release branch

Reedy renamed this task from Work out why initial skin/extension branches into REL1_36 were on the wrong branch to Work out why initial skin/extension branches into REL1_35/REL1_36 were on the wrong branch.May 27 2021, 9:54 PM
Reedy added a subscriber: Jdforrester-WMF.

It looks like LibUp may have saved us for the .0 release (ie it's touched all extensions and skins, causing bumps so they're on the right versions)...

The patch for 0.rc0 -> .0 will be huge, but not much we can do about that now.

But we definitely want to get this fixed for 1.37

Quick guess seems it was fine for REL1_34, based on https://github.com/wikimedia/mediawiki/commits/REL1_34/extensions, but not for REL1_35 and REL1_36

Umherirrender added subscribers: Legoktm, Umherirrender.

LibUp was never run for REL1_34

This can also impact the next release for REL1_35 when patchs included for the tarball extensions with all the package bumps.

On the mediawiki libs files like this build files (composer.json, package.json, package-lock.json, .*) are excluded on release/git-export.

LibUp was never run for REL1_34

It's not just libup. It's any backports that trigger the "fixing" of the extension/skin to the correct version - see https://github.com/wikimedia/mediawiki/commit/37ef236f0c882426dd362e147779c992426d6b9d#diff-20ef7739e8423f114899b767909a220c5dc4169d73a38196d7a2dbfac5166c4e for example

LibUp isn't to blame, if anything, it's helping fix the problem for bundled extensions/skins. But the problem shouldn't be there in the first place.

This can also impact the next release for REL1_35 when patchs included for the tarball extensions with all the package bumps.

It has already "fixed" 1.35.

hashar added a subscriber: hashar.

For Interwiki on the Gerrit server:

git -C /srv/gerrit/git/mediawiki/extensions/Interwiki.git reflog --all
|grep refs/heads/REL1
a97dff06 refs/heads/REL1_31@{0}: merged
718e96a3 refs/heads/REL1_31@{1}: merged
d709857c refs/heads/REL1_31@{2}: merged
57893d4e refs/heads/REL1_31@{3}: merged
9a18f46e refs/heads/REL1_34@{0}: created via REST from wmf/1.34.0-wmf.25
8ec5cb6d refs/heads/REL1_33@{0}: created via REST from HEAD
c4206a5b refs/heads/REL1_32@{0}: created via REST from master
b78d2d1d refs/heads/REL1_31@{4}: merged: fast forward
84b9a1bc refs/heads/REL1_31@{5}: created via REST from master
83d3638b refs/heads/REL1_30@{0}: created via REST from HEAD
0d00eef2 refs/heads/REL1_29@{0}: created via REST from HEAD
fad61ce8 refs/heads/REL1_23@{0}: push: fast forward
45be134f refs/heads/REL1_28@{0}: created via REST from HEAD
7ee2b27e refs/heads/REL1_27@{0}: push: created
6ed9dc3a refs/heads/REL1_26@{0}: merged: fast forward
0d338254 refs/heads/REL1_26@{1}: push: created
c6a625b7 refs/heads/REL1_25@{0}: merged: fast forward
4c2cba23 refs/heads/REL1_25@{1}: merged: fast forward
50f184d4 refs/heads/REL1_25@{2}: merged: fast forward
8b036b1a refs/heads/REL1_25@{3}: merged: fast forward
9dfb02e2 refs/heads/REL1_25@{4}: merged: fast forward
ce07020a refs/heads/REL1_25@{5}: push: created
2699c3f8 refs/heads/REL1_24@{0}: push: created
f9ed98e6 refs/heads/REL1_23@{1}: merged: fast forward
921190fa refs/heads/REL1_23@{2}: push: created
b3ba7e6f refs/heads/REL1_22@{0}: push: created

So REL1_34 has been created via the REST API, presumably using some script. OATHAuth has a similar output.

I have no idea how the REL1_35 / REL1_36 branches got created. Presumably by pushing changes made locally and using an incorrect starting point?

So REL1_34 has been created via the REST API, presumably using some script. OATHAuth has a similar output.

I have no idea how the REL1_35 / REL1_36 branches got created.

The branch.py script was changed between the REL1_34 and REL1_35 branches.

Presumably by pushing changes made locally and using an incorrect starting point?

No, just the same script as before.

Summary REL1_35 submodules have been registered from old commits and AFTER lot of changes have made to the extensions REL1_35 branches. The "new" commits thus never got took in account until some maintenance bot kicked in and bumped the branch.

Looks like the concern are the commits with the summary Update git submodules which update extensions/skins in the mediawiki/core release branch. Those commits are crafted automatically by Gerrit when submodule repository received an update.

The list of commits have all been made on between May 12th and 14th by the LibraryUpgrader bot.

If I pick https://github.com/wikimedia/mediawiki/commit/f241a350e74bcb07b71869478002ddea05f0ca76 f241a350e74bcb07b71869478002ddea05f0ca76 it is for the PageImages extensions in the REL1_35 branch. The submodule bumps:

ee6dbda (HEAD -> REL1_35, origin/REL1_35) build: Updating npm dependencies
9a2f95e (origin/wmf/1.35.0-wmf.41) Revert "Localisation updates from https://translatewiki.net."
c54a823 Localisation updates from https://translatewiki.net.
e1584be build: Updating lodash to 4.17.19
b4f1051 Thumbnail is too hard to type

It is from this Gerrit change https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/+/689762 Which has for parent https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/+/612314 from July 13th 2020 17:48 which was done against the master branch.

PageImages got added as a submodule along with the rest of extensions on July 14th 2020 at 8:04 via https://gerrit.wikimedia.org/r/c/mediawiki/core/+/612464 / b1dd34ac2dec4f5dc9caf0c43a5134b9139c604d / T256376 and it shows:

.gitmodules
+[submodule "extensions/PageImages"]
+       path = extensions/PageImages
+       url = https://gerrit.wikimedia.org/r/mediawiki/extensions/PageImages
+       branch = REL1_35
+++ b/extensions/PageImages
@@ -0,0 +1 @@
+Subproject commit 20909a1c3c26465c2999235d6b37970a045a1126

That is https://gerrit.wikimedia.org/r/c/mediawiki/extensions/PageImages/+/609064 from July 2nd 2020.

My understanding is that when the REL1_35 branch got cut the submodules have been registered from some old commits (from July 2nd), possibly cause preparation steps have been done ahead of time. The timeline would be:

  • Branch get cut
  • Some changes are made to PageImages REL1_35 branches such as e1584be build: Updating lodash to 4.17.19 from July 10th
  • PageImages is registered as a submodule with a commit from July 2nd (and thfus that lodash update is NOT included)
  • No other change happen until May 2021 when LibraryUpgrader update some npm dependencies ee6dbda8ce6
  • Gerrit superproject tracking kicks in and the PageImages submodules is updated/fastforwarded to the newish commit

I would guess the fix is:

  • Get in mediawiki/core REL1_35 branch
  • Update all submodules using the remote tracked repository instead of the registered commit: git submodule update --init --remote (see man page for --remote).
  • Check the result. If nothing show up, I guess it is just that the LibraryUpgrader ended up touching all those extensions/skins recently and triggered the proper submodule update
  • Possibly do the same to mediawiki/extensions and mediawiki/skins

Ideally we need to automatize the release/branch cutting to ensure we don't have the issue again.

I would guess the fix is:

  • Get in mediawiki/core REL1_35 branch
  • Update all submodules using the remote tracked repository instead of the registered commit: git submodule update --init --remote (see man page for --remote).
  • Check the result. If nothing show up, I guess it is just that the LibraryUpgrader ended up touching all those extensions/skins recently and triggered the proper submodule update
  • Possibly do the same to mediawiki/extensions and mediawiki/skins
➜  mediawiki-core-for-releasing git:(master) git checkout REL1_35
Updating files: 100% (5996/5996), done.
Branch 'REL1_35' set up to track remote branch 'REL1_35' from 'origin'.
Switched to a new branch 'REL1_35'
➜  mediawiki-core-for-releasing git:(REL1_35) git submodule update --init --remote
[…]
➜  mediawiki-core-for-releasing git:(REL1_35) git status
On branch REL1_35
Your branch is up to date with 'origin/REL1_35'.

nothing to commit, working tree clean
➜  mediawiki-core-for-releasing git:(REL1_35) git checkout REL1_36
Updating files: 100% (5501/5501), done.
[…]
Branch 'REL1_36' set up to track remote branch 'REL1_36' from 'origin'.
Switched to a new branch 'REL1_36'
➜  mediawiki-core-for-releasing git:(REL1_36) ✗ git submodule update --init --remote
[…]
➜  mediawiki-core-for-releasing git:(REL1_36) git status
On branch REL1_36
Your branch is up to date with 'origin/REL1_36'.

nothing to commit, working tree clean
➜  mediawiki-core-for-releasing git:(REL1_36)

That's reassuring.

Yes and it seems that has the l10n bot and/or LibraryUpdater bots ran on repositories, that triggered Gerrit to update of the submodules.

What is left to do is to review our documentation has to how we cut the branch / register the submodules to ensure we are in sync for the next releases. I don't know where the documentation is, if one points me to it, I will be happy to write down the fix I have proposed at T283883#7176721. And I guess we can then claim this task resolved.

Yes and it seems that has the l10n bot and/or LibraryUpdater bots ran on repositories, that triggered Gerrit to update of the submodules.

What is left to do is to review our documentation has to how we cut the branch / register the submodules to ensure we are in sync for the next releases. I don't know where the documentation is, if one points me to it, I will be happy to write down the fix I have proposed at T283883#7176721. And I guess we can then claim this task resolved.

The commands we followed were as set out in the description of T279456 but the sub-module commit was manual because the branch script broke, per T279456#6985695; that bug was T279718: Branch script doesn't respect `branchpoint` when creating submodules, resulting in "implicit merges" error from gerrit I think.

For what it's worth, the branches for REL1_37 were created properly AFAICT.

For what it's worth, the branches for REL1_37 were created properly AFAICT.

Yeah, we (I) intentionally ran the branch script as quickly as possible after the TrainBranchBot for 1.37.0-wmf.23, to avoid there being drift. I think we've "worked out" why this happened before; the functionality to branch at a particular point in time has gone away with the re-write and we need it back, which is T279718.