Merge to deployed branches instead of cutting a new deployment branch every week.
Closed, Declined (Public)

Authored By: mmodell, Feb 19 2015, 9:41 AM

Description

Instead of creating new deployment branches every Tuesday, I'd like to maintain 3 longer-lived deployment branches, one per wiki group: group0, group1 and group2.

To continue the same release cadence that we currently have in place, we would do this instead:

Merge from master into group0 on Tuesday.
Merge from group0 to group1 on Wednesday.
Merge from group1 to group2 on Thursday.
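
As a rough sketch of that cadence (not the actual deployment tooling), the three promotions can be played out in a throwaway repository. The branch names follow the list above; the `promote` helper, the empty commits and the temporary directory are illustrative assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway repository; nothing here touches a real checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"
git commit -q --allow-empty -m "initial commit"

# Create the three long-lived deployment branches once.
for group in group0 group1 group2; do
    git branch "$group" master
done

# A week of development lands on master.
git commit -q --allow-empty -m "feature work"

# promote SRC DST: merge SRC into DST, always recording a merge commit.
promote() {
    git checkout -q "$2"
    git merge -q --no-ff --no-edit "$1"
}

promote master group0   # Tuesday
promote group0 group1   # Wednesday
promote group1 group2   # Thursday

# Every deployment branch now contains the tip of master.
for group in group0 group1 group2; do
    git merge-base --is-ancestor master "$group" && echo "$group contains master"
done
```

Using `--no-ff` keeps one merge commit per promotion, so each branch's history records when a given week's code reached it.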

What has to change

Current practices during backports often involve either committing the same change twice (once on master, once on the deployment branch) or cherry-picking the commit from master to the deployment branch. Neither practice is practical for long-lived branches. Since master will be merged into deployment, we end up with the same change happening twice, and that results in a merge conflict. Since cherry-picking introduces the same change but isn't tracked by git, this too causes merge conflicts.

So the solution is for hotfixes to be prepared as follows:

Branch from master -> new topic branch
Make the change and commit
Merge your topic branch to master
During backport window, merge that same topic branch into the deployment branch.

This will ensure clean merges and sane branch history.
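
A minimal sketch of those four steps in a scratch repository follows. The names `wmf/group0`, `hotfix_xyz` and `fix.txt` are placeholders chosen for illustration, and the deployment branch here starts level with master, which a real one usually would not.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"
git commit -q --allow-empty -m "initial commit"
git branch wmf/group0 master          # hypothetical deployment branch

# 1. Branch from master into a new topic branch.
git checkout -q -b hotfix_xyz master

# 2. Make the change and commit.
echo "fixed" > fix.txt
git add fix.txt
git commit -q -m "Hotfix xyz"
hotfix_sha=$(git rev-parse hotfix_xyz)

# 3. Merge the topic branch to master.
git checkout -q master
git merge -q --no-ff --no-edit hotfix_xyz

# 4. During the backport window, merge that same topic branch
#    into the deployment branch.
git checkout -q wmf/group0
git merge -q --no-ff --no-edit hotfix_xyz

# The identical commit (same hash) is now reachable from both branches.
git branch --contains "$hotfix_sha"
```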

Historical rambling follows:

The way we deploy MediaWiki is horribly convoluted, tedious and error prone. The process has grown a lot of cruft over the years and very little has gotten cleaned up. Those who have had to deal with the complexity have been more or less content to live with it / too busy to do anything about it.

As the new guy saddled with the weekly 'train deployment' responsibilities, I am not numb to the problems and I'm not content to continue with a system that is so badly broken.

Problems with the current system

For those who are not fully familiar with the process, see Train_deploys. Even after multiple deploys it's still very difficult to follow without making any mistakes.

Way too many steps - it's time consuming, tedious and error-prone
Steep learning curve, low bus-factor
Very fragile, lots of opportunities to kill entire groups of wikis.
- Missing a step easily leads to breaking production wikis with no warning and no immediate indication that anything went wrong
- Worst case scenario: a single mistyped command could bring down all wikipedia.org wikis.
- Don't take my word for it, read this: P4469 (IBrokeWikipediaList; original).
Every week we create a full clone of MediaWiki core, plus one for each of the deployed extensions.
- This is slow and wastes a lot of storage/bandwidth.
- Even worse, this stresses gerrit and in turn lowers everyone's productivity by delaying CI test results, slowing developer commits, pulls, code reviews, merges...
We create a new branch on every deployed extension, then proceed to pin them to a specific commit (rather than a branch) via submodules. Branches are not necessary or even appropriate. We are just creating lots of digital garbage that won't ever be collected. Tags would be much more appropriate for marking weekly release milestones.
There is no automated clean-up of old data, so removing old branches and related cached files must be performed manually (yet another error prone and easily overlooked task)
Security patches aren't automatically carried forward from one week to the next; they must be manually applied each time we cut a new branch.

Proposed improvements

Use git-new-workdir instead of cloning the entire remote repo each time we push a new release
Use tags instead of a new branch for each weekly revision. We should only branch for a new 1.x version number; a weekly milestone hardly justifies an entirely new branch.
We need to deal with the multiple versions of mediawiki in production symbolically instead of referring to specific versions.
The deployment process should be one or two commands at most, not a whole series of complex and interdependent commands interrupted by gerrit submissions, rollbacks, +2ing of one's own patches, etc.
Security patches that aren't merged in gerrit should be carried forward automatically, no intervention required.
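
To illustrate the first two points: git-new-workdir is a contrib script, and the built-in `git worktree` command serves a similar purpose. The sketch below assumes a git recent enough to ship `git worktree` (2.5 or later); the tag name `1.25wmf1` and the directory names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
git init -q "$work/core"
cd "$work/core"
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"
echo "<?php // entry point" > index.php
git add index.php
git commit -q -m "initial commit"

# Mark the weekly milestone with a tag rather than a new branch.
git tag -a 1.25wmf1 -m "weekly release (hypothetical tag name)"

# Check the tag out into its own directory. The new working tree shares
# the object store of $work/core, so nothing is cloned again.
git worktree add --detach "$work/php-1.25wmf1" 1.25wmf1 >/dev/null 2>&1

git worktree list
```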

Branching

For a history lesson in how we got to where we are, see this mailing list thread: MediaWiki core deployments starting in April, and how that might work

@RobLa-WMF wrote:

One plan would be to have a "wmf" branch that does not trail far
behind the master. The extensions we deploy to the cluster can be
included as submodules for that given branch. The process for
deployment at that point will be "merge from master" or "update
submodule reference" on the wmf branch. Then on fenari, you will git
pull and git submodule update before scapping like you're currently
used to. The downside of this approach is that there's not an obvious
way to have multiple production branches in play (heterogeneous
deploy). Seems solvable (e.g wmf1, wmf2, etc), but that also seems
messy.

...

Another possible plan would be to have something somewhat closer to
what we have today, with new branches off of trunk for each
deployment, and deployments happening as frequently as weekly.
master
├── 1.20wmf01
├── 1.20wmf02
├── 1.20wmf03
...
├── 1.20wmf11
├── 1.20wmf12
├── REL1_20
├── 1.21wmf01
├── 1.21wmf02
├── 1.21wmf03
...

The decision was to go with option #2. Honestly I'm not sure this was the right choice. I think we should have 3 "release" branches, instead of constantly making new ones, and use tags for the release pointers.

One branch represents each 'staged' group of wikis, referred to as group 0, 1 and 2 in the current system.

release-staging (Group 0)
release-next (Group 1)
release-stable (Group 2)

Assuming we were to maintain the current release schedule as-is, the process would look something like this:

Tuesday

merge: staging -> next
- tag the head of next with a 1.25wmfXX-next release tag
- move group 1 wikis to the new tag

Wednesday

merge: next -> stable
- tag the head of stable with a 1.25wmfXX-stable release tag
- move group 2 wikis to the new tag
merge: master -> staging
- tag the head of staging with a 1.25wmfXX-staging release tag
- move group 0 wikis to the new tag

Pictures really are worth 1000 words (or at least a few hundred)

This is essentially the same thing as git-flow but using 2 staging branches.

Static Files

Static files are currently served from static-1.25wmfXX versioned directories with varnish in front caching for up to 30 days, which necessitates keeping around several complete copies of the core repo and all extensions, one for each revision that has been deployed within the past 30 days.

These are currently handled somewhat manually, along with the php-1.25wmfXX code checkouts. One elegant solution to this would be to serve the static files directly from a bare git repository. The first path component of the url could represent a git tag, with the remaining path matching a file within the repository at the revision matching the requested tag.
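
The url-to-object mapping can be sketched with plain git plumbing. This only illustrates the lookup, not a web server; the tag `1.25wmf1`, the file `resources/site.css` and the `serve_static` helper are made up for the example.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)

# Build a small source repo with one tagged revision, then a bare clone of it.
git init -q "$work/src"
cd "$work/src"
git config user.name "Example"
git config user.email "example@example.org"
mkdir -p resources
printf 'body { color: black; }\n' > resources/site.css
git add resources/site.css
git commit -q -m "add stylesheet"
git tag 1.25wmf1                       # hypothetical release tag
git clone -q --bare "$work/src" "$work/static.git"

# serve_static URL_PATH: the first path component names a tag, the rest
# names a file inside the repository at that tag.
serve_static() {
    local url_path=${1#/}
    local tag=${url_path%%/*}
    local file=${url_path#*/}
    # Resolve only under refs/tags/, never a branch or a raw commit hash.
    git --git-dir="$work/static.git" cat-file blob "refs/tags/$tag:$file"
}

css=$(serve_static "/1.25wmf1/resources/site.css")
echo "$css"
```

Because the lookup goes through `refs/tags/`, a request that names a branch or commit hash fails rather than serving arbitrary revisions.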

mod_git

We could probably implement this using an apache module that calls libgit2. mod_git looks like it would work, with just a few minor modifications:

mod_git uses a cookie to supply the git tag, we would want to use part of the url.
don't serve php files (return 500 errors?)
only allow release tags to be specified? mod_git currently will serve any branch, tag, or specific commit hash.
We would probably need to perform a security audit and possibly some performance optimization.

This would drastically simplify deployment of static files and it would eliminate the need to periodically clean up a bunch of stale static files that get left behind from old deploys.

php-git2

Another option would be to serve the files from git using php-git2 - the libgit2 bindings for php5.3

Related tasks

There are a bunch of tasks related to overhauling the deployment systems and processes. Here are a few relevant links: T97068, T94620, T93428, T95375

Related Objects

Status	Subtype	Assigned	Task
Declined		None	T49437 Consider a pipeline for enhanced minification (e.g. support UglifyJS)
Resolved		Legoktm	T67289 Use semantic versioning scheme for WMF (all) releases
Resolved		• GWicke	T102550 Use semantic versioning for services (for consistency with mediawiki core)
Resolved		• mmodell	T94620 [EPIC] The future of MediaWiki deployment: Tooling
Open		None	T104398 Deploy MW+Extensions by percentage of users (instead of by domain/wiki)
Resolved		• demon	T73313 Automatically clean up unused wmfXX versions
Declined		• mmodell	T98834 Use subrepos instead of git submodules for deployed MediaWiki extensions
Declined		• mmodell	T89945 Merge to deployed branches instead of cutting a new deployment branch every week.
Invalid		None	T51392 Make make-wmf-branch able to branch extensions with replaced substring of the version of mediawiki being branched
Resolved		• mmodell	T67306 Adopt Semantic Versioning format for WMF deploy branches beginning with 1.27.0-wmf.1
Resolved		Jdforrester-WMF	T107192 Update ReleaseTaggerBot to deal with SemVer for WMF deployed branches (eg 1.23.0-wmf.6)
Declined		• mmodell	T136015 thoroughly document the new branch cutting plan / strategy
Resolved		• mmodell	T142880 Create `scap swat` command to automate patch merging & testing during a swat deployment
Resolved		• mmodell	T140918 create `scap branch` command (the successor to make-wmf-branch)
Resolved		• mmodell	T142590 make scap3 look in PWD to find local CLI extensions
Resolved		dduvall	T140921 Reduce static asset time on disk from five trains' worth to two
Resolved		Krinkle	T102578 Don't trash cache for front-end resources
Resolved		Krinkle	T99096 Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime
Resolved		• mmodell	T102991 Verify traffic to static resources from past branches does indeed drain
Duplicate		Krinkle	T98087 ResourceLoader module version must only change when effective output would change
Resolved		Krinkle	T104950 Make FileModule version hash deterministic
Resolved		Krinkle	T94810 User modules constantly invalidate their cache timestamp
Resolved	PRODUCTION ERROR	MaxSem	T90411 (4 hrs) ResourceLoader timestamp for mobile.usermodule changes constantly
Resolved		Krinkle	T94074 Refactor ResourceLoader versioning system to use hashes instead of timestamps
Resolved		Krinkle	T111481 Fix intermittend ghost entries in FileModule 'fileHashes' data
Resolved		Krinkle	T113868 File dependency tracking unstable (varies by language)
Resolved		Krinkle	T109394 ResourceLoaderModuleTest::testGetVersionHash is flaky
Resolved		Krinkle	T113092 Revise the design of ResourceLoader's MessageBlobStore

Event Timeline


ok I updated the title to clarify that this was about mediawiki deployment, and even more specifically, it's more about the process than the tooling (tooling has its own task).

hashar raised the priority of this task from Medium to High. (May 29 2015, 4:27 PM)

hashar moved this task from INBOX to In-progress on the Release-Engineering-Team board.

• mmodell changed the status of subtask T98834: Use subrepos instead of git submodules for deployed MediaWiki extensions from Open to Stalled. (Jun 15 2015, 6:05 PM)

• mmodell lowered the priority of this task from High to Low. (Jun 18 2015, 8:07 PM)

• mmodell raised the priority of this task from Low to Medium. (Jul 6 2015, 6:23 PM)

• mmodell moved this task from In-progress to Next: Feature on the Deployments board.

• mmodell added a parent task: T51392: Make make-wmf-branch able to branch extensions with replaced substring of the version of mediawiki being branched.

• mmodell added a parent task: T102550: Use semantic versioning for services (for consistency with mediawiki core). (Jul 6 2015, 6:39 PM)

• mmodell mentioned this in T102991: Verify traffic to static resources from past branches does indeed drain. (Jul 16 2015, 1:11 AM)

• mmodell added a parent task: T104398: Deploy MW+Extensions by percentage of users (instead of by domain/wiki). (Jul 16 2015, 9:05 AM)

• mmodell renamed this task from Rethinking mediawiki deployment process to Rethinking mediawiki deployment branches and release process. (Jul 16 2015, 9:08 AM)

• mmodell mentioned this in T99096: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime.

• mmodell added a parent task: T73313: Automatically clean up unused wmfXX versions. (Jul 16 2015, 9:12 AM)

• mmodell mentioned this in T437: Track state of browser tests before new wmfXX branch cut. (Jul 16 2015, 9:16 AM)

What is the expected timeline on using long-lived/re-used branches in production?

Please beware that moving to that system must be blocked on a solution for T99096, as otherwise our weekly bugs about static deployments not working will be eternal. @ori and I have a few ideas to solve it but aren't prioritising it at this point; we can move it up based on the timeline for this. See further at T99096.

@Krinkle: I don't know a specific timeline, but I'd like to hear about your ideas for solving T99096: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime

• mmodell added a subtask: T99096: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime. (Jul 16 2015, 4:39 PM)

• mmodell removed a subtask: T98834: Use subrepos instead of git submodules for deployed MediaWiki extensions.

• mmodell added a parent task: T98834: Use subrepos instead of git submodules for deployed MediaWiki extensions.

In {T104398#1456723} I elaborate a bit about how the long-lived branches might be managed.

Liudvikas subscribed. (Jul 19 2015, 4:12 AM)

• mmodell renamed this task from Rethinking mediawiki deployment branches and release process to Merge to deployed branches instead of a new deployment branch every week.. (Aug 13 2015, 3:44 PM)

• mmodell updated the task description.

greg mentioned this in T109519: ForrestBot not working since 2015-08-14?. (Aug 19 2015, 5:42 AM)

How are we planning to tag commits and when they go into production if we only have one branch? Is the plan to still do this on a weekly all-at-once cycle, or deploy-as-merged-plus-a-window, or what?

In T89945#1553018, @Jdforrester-WMF wrote:

How are we planning to tag commits and when they go into production if we only have one branch? Is the plan to still do this on a weekly all-at-once cycle, or deploy-as-merged-plus-a-window, or what?

There would be 3 branches, wmf/group0, wmf/group1, and wmf/group2

The deployment train process/schedule should look something like this:

Early Tuesday	Tag `HEAD` with `1.26.0-wmf.n` and run integration tests against the resulting tag
Later Tuesday	Merge `1.26.0-wmf.n` -> `wmf/group0`
Wednesday	Merge `wmf/group0` -> `wmf/group1`
Thursday	Merge `wmf/group1` -> `wmf/group2`

For swat, ideally a hotfix would look like this:

Branch/Merge	From	To
Branch	`master`	`hotfix_xyz`
Commit
Merge	`hotfix_xyz`	`master`
Merge	`hotfix_xyz`	`wmf/group0`
Merge	`hotfix_xyz`	`wmf/group1`
		...

• mmodell renamed this task from Merge to deployed branches instead of a new deployment branch every week. to Merge to deployed branches instead of cutting a new deployment branch every week.. (Aug 20 2015, 4:04 AM)

• mmodell updated the task description.

This means that we will lose the feature of ~~Forrest~~ReleaseTaggerBot telling us to which versions patches were back-ported.

Worked example:

Production is at 1.26-0-wmf20 for group 0 and 1.26-0-wmf19 for groups 1 and 2
Patch is written, reviewed and merged -> goes into master, and will eventually be part of the tag for 1.26-0-wmf21, so ForrestBot tags it as such.
Patch is back-ported to production, but the 1.26-0-wmf20 tag is already locked in without it (unlike a branch).
If you scour Phabricator, it's clear the patch was back-ported, but it's not tagged as such so it doesn't stand out (currently it's really obvious if a patch has multiple release tags that it was back-ported).

Options I can think of:

1. Accept this loss of functionality
2. Have sub-versions of tags, so 1.26-0-wmf20 -> 1.26-0-wmf20a -> 1.26-0-wmf20b -> …
3. Make the SWAT process more heavy-weight, repudiating and updating the main tag
4. Have ReleaseTaggerBot (or whatever) tag tasks as "in group0", "in group1", "in group2" when they hit those groups. This solves the issue of needing to cross-reference the Server Admin Log to know whether something went to all production, but it's not great.

In T89945#1558101, @Jdforrester-WMF wrote:

Options I can think of:

1. Accept this loss of functionality
2. Have sub-versions of tags, so 1.26-0-wmf20 -> 1.26-0-wmf20a -> 1.26-0-wmf20b -> …
3. Make the SWAT process more heavy-weight, repudiating and updating the main tag
4. Have ReleaseTaggerBot (or whatever) tag tasks as "in group0", "in group1", "in group2" when they hit those groups. This solves the issue of needing to cross-reference the Server Admin Log to know whether something went to all production, but it's not great.

FWIW, Option #4 makes the most sense to me.

"but it's not great"

What is not-great about it? To me that seems much more straightforward and it conveys the important information, which is, where a change has been deployed and where it hasn't.

I'm also willing to consider #2 but that would result in a lot of tags so I'm not really a huge fan of that idea.

greg edited projects, added Release-Engineering-Epics; removed Release-Engineering-Team. (Sep 24 2015, 9:50 PM)

JanZerebecki subscribed. (Feb 8 2016, 2:56 PM)

Krinkle closed subtask T99096: Make Varnish cache for /static/$wmfbranch/ expire when resources change within branch lifetime as Resolved. (Feb 26 2016, 1:58 PM)

greg edited projects, added Release-Engineering-Team; removed Release-Engineering-Epics. (Mar 11 2016, 10:08 PM)

I also think option #4 is what we want. It's way better information than what we get now from @ReleaseTaggerBot. Instead of just knowing which release it's in (which you then have to cross reference with something like [[wikitech:Deployments]] or [[mw:MediaWiki_1.27/Roadmap]]), you automatically know which group of wikis has it. And after you learn which wikis are in which groups then it's really straightforward.

This is basically the "Train 2.0" idea that we came up with during our annual planning exercise. I've created a wiki page about that project so that I can better plan out what work is needed (and then can figure out which quarter to schedule it in). See: https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Train2.0

This comment is mostly just FYI. I haven't yet started trying to rip info out of this task into that page :)

In T89945#1554422, @mmodell wrote:

For swat, ideally a hotfix would look like this:

Branch/Merge	From	To
Branch	master	hotfix_xyz
Commit
Merge	hotfix_xyz	master
Merge	hotfix_xyz	wmf/group0
Merge	hotfix_xyz	wmf/group1
		...

This doesn't account for the fact that merging also brings in all other commits in the shared history.

If you don't want to involve cherry-pick, and have the benefit of native git references (for lookup of whether a commit is in a branch), then it seems the only way to do that is to ensure people rewind their local master back to at least wmf/group1 when creating the hotfix. Since people primarily write patches based on master, this seems impractical.
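
A small reproduction of that effect, assuming a deployment branch `wmf/group1` that was cut before an unrelated commit landed on master (all names and commits here are illustrative):

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"
git commit -q --allow-empty -m "initial commit"

# The deployment branch is cut here, then master moves on without it.
git branch wmf/group1 master
git commit -q --allow-empty -m "unrelated change on master"
unrelated=$(git rev-parse HEAD)

# The hotfix is branched from the current tip of master.
git checkout -q -b hotfix_xyz master
git commit -q --allow-empty -m "hotfix"

# Merging the hotfix branch into the deployment branch...
git checkout -q wmf/group1
git merge -q --no-ff --no-edit hotfix_xyz

# ...brings in every commit the deployment branch did not have yet,
# not just the hotfix.
dragged=$(git rev-list --count --no-merges HEAD^1..HEAD)
git log --format=%s --no-merges HEAD^1..HEAD
echo "non-merge commits added by the merge: $dragged"
```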

(sorry wall of text, but I was really curious if we could keep the hotfix commit sha1 across branches. Turns out we can!)

If we wanted to keep the same commit across branches and do a merge, we will first have to find the best common ancestor of all four branches using git merge-base. Build the hotfix against that ancestor then propose four merge patches against each of the branches.

Taking mediawiki/core as it is right now and master / current wmf branches:

# Figure out the common ancestor:
$ git merge-base --octopus origin/master origin/wmf/1.27.0-wmf.{15,16}
f2d8fee03d484dc766d467dc631b5cc4bef1c510

# Create our hotfix based on that ancestor
$ git checkout -b hotfix f2d8fee0
$ git commit --allow-empty -m 'My hotfix'

You get your hotfix tested and reviewed.

Then merge it in the various branches:

$ git checkout master && git merge --no-edit --no-ff hotfix
Switched to branch 'master'
Your branch is up-to-date with 'origin/master'.
Already up-to-date!
Merge made by the 'recursive' strategy.

$ git checkout wmf/1.27.0-wmf.15 && git merge --no-edit --no-ff hotfix
Switched to branch 'wmf/1.27.0-wmf.15'
Your branch is up-to-date with 'origin/wmf/1.27.0-wmf.15'.
Already up-to-date!
Merge made by the 'recursive' strategy.

$ git checkout wmf/1.27.0-wmf.16 && git merge --no-edit --no-ff hotfix
Switched to branch 'wmf/1.27.0-wmf.16'
Your branch is up-to-date with 'origin/wmf/1.27.0-wmf.16'.
Already up-to-date!
Merge made by the 'recursive' strategy.

Confirm only your hotfix is merged in: all branches should be ahead by just two commits (the hotfix + the merge commit):

$ git branch --contains hotfix -vv
  hotfix            63b3a3a My hotfix
  master            f72a2a2 [origin/master: ahead 2] Merge branch 'hotfix'
  wmf/1.27.0-wmf.15 d7bfc71 [origin/wmf/1.27.0-wmf.15: ahead 2] Merge branch 'hotfix' into wmf/1.27.0-wmf.15
* wmf/1.27.0-wmf.16 9844463 [origin/wmf/1.27.0-wmf.16: ahead 2] Merge branch 'hotfix' into wmf/1.27.0-wmf.16

Send for review, wait for CI/QA, submit and happy end.

PRO

Definitely doable, and the merge commit can be used to fix a conflict if needed. It is mathematically more respectful and gives us a nice topology.

git branch --contains hotfix lets you find the branches that have the fix, since the same commit has been merged into each branch.

Useful for hotfix branches that have several commits.

BUT

I consider myself fairly fluent in git arcana and basic graph theory; still, it took me 1+ hour to write this. The whole trick is really git merge-base and especially its --octopus option.

The checkout / merge --no-ff has to be done locally for each branch and then sent for review.

Gerrit has the cherry-pick button that automates all the mess (just spam click).

You can still find out whether a branch has had the hotfix copied to it using git-cherry <upstream> <head>.
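
For reference, a scratch-repo sketch of that `git cherry` check. The branch `wmf/deploy` and the file names are placeholders. In the output, a leading `-` marks a commit whose change already exists upstream (here, via cherry-pick) and `+` marks one that does not.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"
echo "base" > file.txt
git add file.txt
git commit -q -m "initial commit"
git branch wmf/deploy master           # hypothetical deployment branch

# Two commits on master: a hotfix and some other work.
echo "fix" >> file.txt
git commit -q -am "hotfix on master"
hotfix_sha=$(git rev-parse HEAD)
echo "more" > other.txt
git add other.txt
git commit -q -m "other master work"
other_sha=$(git rev-parse HEAD)

# The deployment branch has its own history, then receives the hotfix
# as a cherry-pick (a new commit with a different hash).
git checkout -q wmf/deploy
echo "patch" > security.txt
git add security.txt
git commit -q -m "deployment-only patch"
git cherry-pick "$hotfix_sha" >/dev/null

# Compare by patch content rather than by commit hash.
report=$(git cherry wmf/deploy master)
echo "$report"
```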

Food for later:

git-merge lets you use a custom strategy (a script really) that could rely on the git merge-base --octopus option to automate the mess
we could have a hotfix tool that cherry-picks the patch made on master onto the common ancestor, crafts the merge commits and proposes them to the group[0-2] branches for us

@hashar: You're my hero. This is awesome

• mmodell awarded a token. (Mar 12 2016, 1:59 AM)

• mmodell removed a parent task: T102550: Use semantic versioning for services (for consistency with mediawiki core). (Mar 13 2016, 10:24 AM)

greg added a project: releng-201617-q3. (Mar 28 2016, 11:42 PM)

Krinkle unsubscribed. (Mar 29 2016, 12:35 AM)

Even worse, this stresses gerrit and in turn lowers everyone's productivity by delaying CI test results, slowing developer commits, pulls, code reviews, merges...

Most of that is not true. The exception is git remote update on a slow link, and by a quick estimate even there the current branch count is so small that the changed content dominates the bandwidth use.

But there is quite a bit of content and its history that is only in deployment branches. Whether splitting that off into another repo is a significant enough saving to be worth it would need to be tested. This can be done independently of this ticket. People who want to can then use 2 remotes, deployment and core, in the same clone. However gerrit will lose its ability to show for a core commit which deployment branch it is contained in, as AFAIK that feature is not cross-repo. Diffusion might cope in that regard but might have a problem with choosing where to link a commit that appears in 2 repos.

Anyway if you are concerned about the above the solution in this ticket will prevent us from just moving older wmf deployment branches to a historical repo, i.e. will make it worse.

Security patches aren't automatically carried forward from one week to the next; they must be manually applied each time we cut a new branch.

Is getting such a merge as proposed in this task right easier than 12 cherry-picks? Not running our CI after security patches is already a problem. Not running our CI after a merge and not running it anymore for new deployment branches would make that even worse.

Since master will be merged into deployment, we end up with the same change happening twice and that results in a merge conflict.

This is not necessary, as the git merge strategy "ours" can make a merge that results in the content being that of exactly a specified branch involved in the merge. An example can be seen in rEWBAed7a17415825. The downside of that is that you need to reapply security patches.
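
One way this could look, sketched in a scratch repository (the branch names `wmf/deploy` and `next-deploy` are assumptions, and this is not necessarily how the referenced commit was produced). The merge is made from the master side with `-s ours`, so the result has exactly master's content, and the deployment branch is then fast-forwarded to it.

```shell
#!/usr/bin/env bash
set -euo pipefail

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.org"
echo "v1" > app.txt
git add app.txt
git commit -q -m "initial commit"
git branch wmf/deploy master           # hypothetical deployment branch

# A deployment-only change, e.g. an unmerged security patch.
git checkout -q wmf/deploy
echo "patch" > security.txt
git add security.txt
git commit -q -m "deployment-only security patch"

# Meanwhile master moves on.
git checkout -q master
echo "v2" > app.txt
git commit -q -am "new work on master"

# Merge the deployment branch into a copy of master, keeping master's
# content verbatim ("ours"), then fast-forward the deployment branch.
git checkout -q -b next-deploy master
git merge -q -s ours --no-edit wmf/deploy
git checkout -q wmf/deploy
git merge -q --ff-only next-deploy

# Both trees are identical, so security.txt is gone and would have to
# be reapplied.
git rev-parse 'wmf/deploy^{tree}' 'master^{tree}'
```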

I was really curious if we could keep the hotfix commit sha1 across branches. Turns out we can!

Excellent comment! In practice patches will be merged in master before someone knows that it will need to be back ported.

• mmodell mentioned this in T136015: thoroughly document the new branch cutting plan / strategy. (May 23 2016, 4:38 PM)

greg mentioned this in T136828: Allow to test a mediawiki-config change to the beta cluster. (Jun 3 2016, 3:09 PM)

greg edited projects, added releng-201617-q1; removed releng-201617-q3. (Jun 21 2016, 9:25 PM)

greg mentioned this in T131419: Migrate mediawiki-vagrant to Differential. (Jul 7 2016, 9:15 PM)

Ltrlg subscribed. (Jul 9 2016, 7:39 AM)

• mmodell raised the priority of this task from Medium to High. (Jul 11 2016, 9:50 PM)

• mmodell moved this task from In-progress to Long-Lived-Branches on the Release-Engineering-Team board. (Jul 13 2016, 4:29 PM)

• mmodell edited projects, added Release-Engineering-Team (Long-Lived-Branches); removed Release-Engineering-Team.

• mmodell removed a parent task: T51392: Make make-wmf-branch able to branch extensions with replaced substring of the version of mediawiki being branched. (Jul 18 2016, 6:56 PM)

• mmodell added a subtask: T51392: Make make-wmf-branch able to branch extensions with replaced substring of the version of mediawiki being branched.

• mmodell mentioned this in T51392: Make make-wmf-branch able to branch extensions with replaced substring of the version of mediawiki being branched. (Jul 18 2016, 6:58 PM)

• mmodell added a subtask: T136015: thoroughly document the new branch cutting plan / strategy. (Jul 19 2016, 9:07 PM)

• mmodell moved this task from Backlog to In Progress on the Release-Engineering-Team (Long-Lived-Branches) board. (Jul 20 2016, 4:11 PM)

• mmodell created subtask T140918: create `scap branch` command (the successor to make-wmf-branch). (Jul 20 2016, 4:37 PM)

Aklapper mentioned this in T139210: Abandon (or at least strongly simplify) project creation policy. (Jul 25 2016, 12:28 PM)

• mmodell mentioned this in T141278: Decide how ReleaseTaggerBot fits into the brave new world of long-lived-branches. (Jul 25 2016, 5:04 PM)

Danny_B removed a subscriber: Release-Engineering-Team. (Aug 8 2016, 11:41 AM)

• mmodell added a subtask: T142880: Create `scap swat` command to automate patch merging & testing during a swat deployment. (Aug 12 2016, 10:51 PM)

• mmodell removed a subtask: T140918: create `scap branch` command (the successor to make-wmf-branch).

• mmodell added a subtask: T140921: Reduce static asset time on disk from five trains' worth to two. (Sep 9 2016, 4:48 PM)

Krinkle updated the task description. (Nov 18 2016, 3:10 AM)

In T89945#2113537, @Krinkle wrote:

In T89945#1554422, @mmodell wrote:

For swat, ideally a hotfix would look like this:

Branch/Merge	From	To
Branch	master	hotfix_xyz
Commit
Merge	hotfix_xyz	master
Merge	hotfix_xyz	wmf/group0
Merge	hotfix_xyz	wmf/group1
		...

This doesn't account for the fact that merging also brings in all other commits in the shared history.

If you don't want to involve cherry-pick, and have the benefit of native git references (for lookup of whether a commit is in a branch), then it seems the only way to do that is to ensure people rewind their local master back to at least wmf/group1 when creating the hotfix.

That seems reasonable to me.

Since people primarily write patches based on master, this seems impractical.

It's impractical to prescribe a preferred git workflow and ask committers to submit against the appropriate branch?

In T89945#2807767, @mmodell wrote:

In T89945#2113537, @Krinkle wrote:

Since people primarily write patches based on master, this seems impractical.

It's impractical to prescribe a preferred git workflow and ask committers to submit against the appropriate branch?

Yes. People write patches against master. And from what I read, we don't plan to change this, right? Which means we'd start a fragile trend of using git-merge to apply patches from master to wmf branches. Which will silently bring in unrelated commits - which is hard to detect, avoid or verify. Hard to do manually. Even harder (or impossible) within Gerrit or Phabricator.

Unless we start a trend where the only way to land a patch in master is to commit and backport via the oldest current wmf branch. Except if the commit is not meant to go to prod directly. But then again, this decision is not always known ahead of time. I don't see how this plan would eliminate even half of our cherry-picks, let alone all.

I think this model could work well for our release maintenance branches. But I think wmf branches and master are too close to each other for this to be practical.

I empathise with the current set of problems and fragilities, and I agree we should solve them, soon. But we should do so with those problems in mind. Cleanliness of our Git history is not one of those problems and should not be used as main justification. I think we do have a pretty clean Git history today. Being more strict about component prefixes and avoiding merge commits would make it even cleaner. This plan, however, hardly changes the history. We'll still see the same commit messages in the master and wmf branches, with the same number of merge commits. It's just that the commit hashes aren't always the same, and now (sometimes) would be.

Which of the problems is solved by adopting this backport process? The steps for SWAT remain unchanged (except git cherry-pick > git merge). The steps for a new train branch are unchanged and equally complicated, too.

Re-using deployment directories would reduce complexity, but that doesn't require re-using branch names. We could easily check out the next branch in the same directories each week. This would already solve the problem of large bandwidth use from gerrit to tin and from tin to app servers.

Not having many wmf branches would make the repo cleaner, and allows for git-gc to trash the old cherry-picks. That can be accomplished by removing old branches more frequently in today's model, too.

Downsides:

Two ways to make a backport for SWAT, not one. Depending on if, when, and with what parent, the commit was merged in master – choosing the wrong strategy for a commit would silently roll out master commits to production without beta testing.
Two ways to find whether a change was backported, not one.
Different strategy for MediaWiki release branches as for WMF deployment branches.

Note that Node.js has adopted the same process as we currently have (sans the merge commits). They cherry-pick from master to maintenance branches. I believe this is the de facto Git workflow that newcomers expect when they have non-zero Git and open-source experience.

I think we could adopt the traditional "git/git" workflow (git workflow) for MediaWiki releases (e.g. we'd have a maint branch that tracks the oldest release currently supported, always merge important bug fixes there first and then git-merge upwards). However even if we do this for release branches, I'd argue it's still not practical to apply to WMF deployment branches.

Links:

@Krinkle: Thanks for the detailed response. I agree with much of what you've written and I'll have more to say after letting it sink in a bit (and reading the references you've linked to.)

I also want to point out the most recent work related to reducing the number of wmf branches: T147478: Flatten MediaWiki config, all MediaWiki versions, and extensions into a unified git repo which is based on @bd808's idea for syncing deployment branches with git instead of rsync. @thcipriani has built a prototype in {D429} and it looks like the most promising path forward for reducing deployment complexity.

I feel like we are finally starting to get the complexity under control thanks to:

The really great work that you did to eliminate the insanity of static symlinks,
@demon's work to clean up rOMWC Wikimedia - MediaWiki Config
D429
New tooling that I am working on to replace make-wmf-branch

Once all these pieces come together and we hopefully streamline the backports and SWAT processes then I think we'll be in really good shape.

• mmodell closed subtask T142880: Create `scap swat` command to automate patch merging & testing during a swat deployment as Resolved. (Dec 21 2016, 3:19 AM)