Page MenuHomePhabricator

Ensure submodule updates (for security patches) are committed in the MW directory under /srv/mediawiki-staging
Closed, DeclinedPublic

Description

It seems that over the past year, about 6 in 10 wmf branches are created by someone who does this, and the other 4 by someone who doesn't.

Whenever the wmf branch is created in a way that leaves the submodule changes local and uncommitted, it means that for a full week, the git status output in /srv/mediawiki-staging/php-x.y.z` is "dirty" by default.

This means people that watch this carefully (or that have an enhanced PS1 git prompts that includes this information), have to double-check the verbose version of the status for every action they perform in that directory.

It also means that for people who watch it slightly less carefully, are likely to ignore it due to "banner fatigue" from it being dirty by default that week (this has led to numerous failed deployments due to an extension patch not actually having been deployed).

I've assumed thus far that on weeks when it isn't committed, that it is due to oversight, but a recent IRC chat made me conclude that it's just not something well-documented and newer operators probably never learned it.

Given git status has been dirty for two weeks in a row, filing this as a train blocker. For the next train, please commit the submodule updates. I don't know with what level of automation or manual steps it needs to be done, but please ensure the git status is clean in the new branches going forward so that others can also deploy with ease and confidence.

Thanks :)

Event Timeline

It seems that over the past year, about 6 in 10 wmf branches are created by someone who does this, and the other 4 by someone who doesn't.

Security submodule patches should never be committed in the parent core wmf-branch checkout. It does not say to do this in the documentation, and creates a lot of complications during SWAT.

Does this just need us to update Heterogenous deployment/Train deploys?

It would also need updating on the How to deploy code page; however, we updated to the opposite some time ago:

https://wikitech.wikimedia.org/w/index.php?title=How_to_deploy_code&diff=787554&oldid=787516

We've had discussions about changing the documentation to state that submodule bumps should be committed; however, I'm against doing that. The reason is that this will make SWAT more complicated and error prone for the benefit of having a cleaner PS1 for folks on the server.


Scenario:

  • Merge an update to a submodule on a WMF branch
  • Gerrit automatically bumps mediawiki/core for that wmf branch

This is a fairly common situation for backports; however, it changes the procedure for SWATs if we commit submodules. I created a fairly artificial example to demonstrate the difference:

Here's a core checkout with one submodule named submodule:

core
├── README
└── submodule
    ├── bar
    └── security

security is a file that is created by applying a security patch.

Upstream submodule has added a file named baz and upstream core has bumped submodule creating a commit like:

$ git log -p -1
commit 1e753aced12a54ad6834b6b2bfa6f395394af5eb (HEAD -> master)
Author: Tyler Cipriani <tcipriani@wikimedia.org>
Date:   Mon Jul 29 18:11:06 2019 -0600

    bump submodule

Submodule submodule cfa4494..6d7edc2:
  > Add baz

To SWAT that change currently (that is, without committing the submodule security update):

$ git st
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   submodule (new commits)

Submodules changed but not updated:

* submodule cfa4494...6f4b18d (1):
  > security

no changes added to commit (use "git add" and/or "git commit -a")

$ (master * u=) git fetch
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 2 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (2/2), done.
From /home/thcipriani/tmp/security-patch-fun/core-upstream
   be02c43..ec76766  master     -> origin/master
$ (master * u-1) git rebase
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.
$ (master * u=) git submodule update
First, rewinding head to replay your work on top of it...
Applying: security
Submodule path 'submodule': rebased into '6d7edc25930adefa7002a13eeed0f87cb85b6239'
$ (master * u=) git st
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   submodule (new commits)

Submodules changed but not updated:

* submodule 6d7edc2...e5e860d (1):
  > security

no changes added to commit (use "git add" and/or "git commit -a")
$ (master * u=) tree submodule/
submodule/
├── bar
├── baz
└── security

You can also run git pull instead of git fetch && git rebase or you can run git reset origin/master: any of those things work equally well with the current process.

The new process would be much the same for the first part:

$ (master u+1)  git st   
On branch master
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)

nothing to commit, working tree clean 
$ (master u+1) git fetch
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done. 
remote: Compressing objects: 100% (2/2), done. 
remote: Total 2 (delta 1), reused 0 (delta 0)
Unpacking objects: 100% (2/2), done.
From /home/thcipriani/tmp/security-patch-fun/core-upstream
   be02c43..0aa005b  master     -> origin/master
$ (master u+1-1)

Note that we are now 1 commit head and 1 commit behind master on core ((master u+1-1)) and that commit is affecting the same path (submodule in this case).

Git rebase in this situation leads to:

$ core-checkout (master u+1-1) git rebase
First, rewinding head to replay your work on top of it...
Applying: SECURITY bump submodule
Using index info to reconstruct a base tree...
M       submodule
Falling back to patching base and 3-way merge...
Failed to merge submodule submodule (merge following commits not found)
Auto-merging submodule
CONFLICT (submodule): Merge conflict in submodule
error: Failed to merge in the changes.
Patch failed at 0001 SECURITY bump submodule
hint: Use 'git am --show-current-patch' to see the failed patch

Resolve all conflicts manually, mark them as resolved with
"git add/rm <conflicted_files>", then run "git rebase --continue".
You can instead skip this commit: run "git rebase --skip".
To abort and get back to the state before "git rebase", run "git rebase --abort".

$ (master *+|REBASE 1/1)

Git pull in this situation leads to:

$ git pull
Failed to merge submodule submodule (merge following commits not found)
Auto-merging submodule
CONFLICT (submodule): Merge conflict in submodule
Automatic merge failed; fix conflicts and then commit the result.
$ (master *+|MERGING u+1-1)

The fix is doing git reset origin/master and then re-commiting security updates:

$ core-checkout (master u+1-1) git reset --hard origin/master
HEAD is now at 0aa005b bump submodule
$ core-checkout (master * u=) git st
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   submodule (new commits)

Submodules changed but not updated:

* submodule 6d7edc2...be854ed (2):
  > security
  < Add baz

no changes added to commit (use "git add" and/or "git commit -a")
core-checkout (master * u=)
(/^ヮ^)/*:・゚✧ git submodule update
First, rewinding head to replay your work on top of it...
Applying: security
Submodule path 'submodule': rebased into '6d7edc25930adefa7002a13eeed0f87cb85b6239'
$ (master * u=) git add .
$ (master + u=) gc -m '[SECURITY] bump submodule'
[master ff8f185] [SECURITY] bump submodule
 1 file changed, 1 insertion(+), 1 deletion(-)

The proposed process change adds an extra step for each submodule SWAT (adding back the git commit every time a submodule is updated) and two of the currently available options for updating submodules lead to confusing MERGE/REBASE/detatched head situations. I think this will create more confusion, and hopefully I've laid out the case for that well enough. I'm open to figuring out something better (i.e., a private upstream where security patches are applied), but I think adding commits for security patches in submodules makes deployments harder in the name of cleaner prompts for some folks. Declining this one for the time being -- @Krinkle if I've missed something obvious/you disagree with my above reasoning, feel free to ping me.

I'm familiar with (under default Git settings) that local changes are not automatically rebased upon pulling down potentially conflicting commits from a remote.

As I understand it, under such default Git settings, both with and without a local commit in the parent repo, there would be a to-be-manually-resolved conflict. And that's something we want to avoid. And I agree that that would be something worth rejecting this idea over.

But, as far as I know, with the Git settings we use on deploy1001, such commits are automatically rebased. I don't understand why that same conflict would be resolved automatically by Git if it's local to the submodule checkout, and not resolved automatically if it's also acknowledged in the local parent.

Am I understanding correctly that this is the case? If so, that would be unfortunate, and I'd agree it's undesirable. But that would also seem like a bug in Git, that we should report upstream if it hasn't already.

I'm familiar with (under default Git settings) that local changes are not automatically rebased upon pulling down potentially conflicting commits from a remote.

As I understand it, under such default Git settings, both with and without a local commit in the parent repo, there would be a to-be-manually-resolved conflict. And that's something we want to avoid. And I agree that that would be something worth rejecting this idea over.

But, as far as I know, with the Git settings we use on deploy1001, such commits are automatically rebased. I don't understand why that same conflict would be resolved automatically by Git if it's local to the submodule checkout, and not resolved automatically if it's also acknowledged in the local parent.

Am I understanding correctly that this is the case? If so, that would be unfortunate, and I'd agree it's undesirable. But that would also seem like a bug in Git, that we should report upstream if it hasn't already.

Insofar as settings on deploy1001 are concerned the ones I'm aware of that affect this behavior can be seen in scap prep (https://gerrit.wikimedia.org/g/operations/mediawiki-config/+/master/scap/plugins/prep.py#26).

The main setting that affects this behavior is submodule.[submodule-name].update = rebase -- my contrived example used this setting. Without that setting our current system would drop security patches when we did git pull in the parent.

On deploy1001 we also set branch.autosetuprebase = always for all php-1.xx checkouts; however, I forgot to do that in my contrived example. I retested the same scenario with that setting enabled and it didn't change the behavior of rebase.

Chad wrote scap prep in early 2017, so the current settings affecting checkouts of core have been in place since then at least. Before then we used checkoutMediaWiki and it looks like it used the same config settings: https://gerrit.wikimedia.org/r/#/c/operations/mediawiki-config/+/330254/

I looked at /etc/gitconfig on the server as well and didn't see anything that I thought might affect this behavior (only thing there was something about deploy.required-umask which is probably not even needed anymore would be my guess -- maybe had to do with git deploy).

I am not certain if there's another setting that changes this behavior or not.

To me, this doesn't seem like wholly unexpected behavior from git. If you had a file named submodule (rather than a submodule) and its contents were a a single line of hex digits -- like a sha1 -- and you applied a local change to that file to change the hex digits on that same line and then you pull down an upstream change that also changed the hex digits of that same line it would cause a merge conflict. The behavior I outlined would be what I would expect if the submodule realizes that the sha1 refers to a commit, but the parent sees the submodule contents as just another blob.

(Re-opening as low-priority, non-blocking, task; for updates and visibility.)

It all makes sense...

Thanks, so if I understand correctly, this means even with all current prod settings, committing the submodule update would mean that while the automatic rebase logic within the submodule should be unaffected, the parent repo won't trigger it because it sees conflict with the submodule pointer first and thus refuses to even consider a rebase.

The makes sense from a Git-internal perspective, but is in my opinion the exact opposite of what I expect as a developer. Allow me to draw an analogy to another Git-internal:

  • If the submodule pointer change is not committed, then in general things like "git checkout" would override it blindly. I don't know if "git submodule update" is closer to a fetch/merge/rebase or a checkout, but treating the submodule pointer as a blob, I would mostly perceive it as a "checkout" (replacement) of that file. As such, doing it while dirty I intuitively expect would cause that local change to be lost.
    • Theory and intuition aside, this would normally be true in Git if it weren't for our settings. If I have a git repo and a submodule locally, and checkout a random different commit in that submodule (thus making the parent, dirty) and then run "git submodule update" in the parent, it replaces the pointer with the latest one. It wouldn't rebase. It would be lost.
  • If the submodule pointer change is committed, then in pulling down updates from Gerrit remote, the merge/rebase logic is triggered which would naturally have a place in Git to resolve that. And indeed, this is somewhat true, in that it results in a conflict (which is better than silent loss). It's just that git apparently doesn't have a setting (yet) to rebase that in the same way it supports within a submodule.
Going forward

So with the above, under normal circumstances, if the submodule change is committed, a conflicting change results in a merge conflict to resolve (which is better than silent loss). And if it isn't committed, it is normally silently lost.

This is why one should be extremely careful when applying submodule changes whilst git status is dirty. It's not something I care personally for to be clean. Rather, it's been an indicator time and time again of something actually having gone wrong. E.g. a deployment that was aborted and forgotten about (parent fetch, but stale submodule), or someone applying a patch directly to an extension submodule that may've not merged in Gerrit as submodule update for the parent repo etc. These cannot be easily distinguished from a security patch.

This means right now the workflow I use for deployment is, in addition to all the normal steps:

  • (in MW dir) git fetch, git log, git rebase...
  • (in MW dir) git status, cd extensions/Foo; git status (in Ext dir); git log (in Ext dir) - decide what's the reason, does it look outdated, stale, different branch or head point entirely, or just a security patch on top of plain remote.
  • git submodule update.

Should we document this for SWAT and recommend for others? Or should we just ignore these issues and assume it won't cause problems? Or some other workflow?

(Re-opening as low-priority, non-blocking, task; for updates and visibility.)

It all makes sense...

Thanks, so if I understand correctly, this means even with all current prod settings, committing the submodule update would mean that while the automatic rebase logic within the submodule should be unaffected, the parent repo won't trigger it because it sees conflict with the submodule pointer first and thus refuses to even consider a rebase.

The makes sense from a Git-internal perspective, but is in my opinion the exact opposite of what I expect as a developer. Allow me to draw an analogy to another Git-internal:

  • If the submodule pointer change is not committed, then in general things like "git checkout" would override it blindly. I don't know if "git submodule update" is closer to a fetch/merge/rebase or a checkout, but treating the submodule pointer as a blob, I would mostly perceive it as a "checkout" (replacement) of that file. As such, doing it while dirty I intuitively expect would cause that local change to be lost.
    • Theory and intuition aside, this would normally be true in Git if it weren't for our settings. If I have a git repo and a submodule locally, and checkout a random different commit in that submodule (thus making the parent, dirty) and then run "git submodule update" in the parent, it replaces the pointer with the latest one. It wouldn't rebase. It would be lost.

Yep, checkout is the default behavior for submodule update.

  • If the submodule pointer change is committed, then in pulling down updates from Gerrit remote, the merge/rebase logic is triggered which would naturally have a place in Git to resolve that. And indeed, this is somewhat true, in that it results in a conflict (which is better than silent loss). It's just that git apparently doesn't have a setting (yet) to rebase that in the same way it supports within a submodule.
Going forward

So with the above, under normal circumstances, if the submodule change is committed, a conflicting change results in a merge conflict to resolve (which is better than silent loss). And if it isn't committed, it is normally silently lost.

Yes indeed.

This is why one should be extremely careful when applying submodule changes whilst git status is dirty. It's not something I care personally for to be clean. Rather, it's been an indicator time and time again of something actually having gone wrong. E.g. a deployment that was aborted and forgotten about (parent fetch, but stale submodule), or someone applying a patch directly to an extension submodule that may've not merged in Gerrit as submodule update for the parent repo etc. These cannot be easily distinguished from a security patch.

This means right now the workflow I use for deployment is, in addition to all the normal steps:

  • (in MW dir) git fetch, git log, git rebase...
  • (in MW dir) git status, cd extensions/Foo; git status (in Ext dir); git log (in Ext dir) - decide what's the reason, does it look outdated, stale, different branch or head point entirely, or just a security patch on top of plain remote.
  • git submodule update.

On thing I recommend for SWAT deployers (https://wikitech.wikimedia.org/wiki/SWAT_deploys/Deployers#git) is having status.submodulesummary set to true.

That way, from the top-level core checkout you can see what's been fetched down, and why your submodule is dirty.

That is, you see something like this for security patches (usually you can tell from the commit message):

$ (master * u-1) git status
On branch master
Your branch is behind 'origin/master' by 1 commit, and can be fast-forwarded.
  (use "git pull" to update your local branch)

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   submodule (new commits)

Submodules changed but not updated:

* submodule cfa4494...f992a06 (1):
  > security

And then when you rebase an upstream commit you see:

$ (master * u=) git rebase
First, rewinding head to replay your work on top of it...
Fast-forwarded master to refs/remotes/origin/master.
$ (master * u=) git status
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   submodule (new commits)

Submodules changed but not updated:

* submodule 6d7edc2...f992a06 (2):
  > security
  < Add baz

no changes added to commit (use "git add" and/or "git commit -a")

So the change is rebased, but the submodule isn't updated -- it's still one commit ahead (>security) and one commit behind (< Add baz) of the upstream submodule.

A submodule update gets you back to "normal"

$ (master * u=) git submodule update submodule/
First, rewinding head to replay your work on top of it...
Applying: security
Submodule path 'submodule': rebased into '6d7edc25930adefa7002a13eeed0f87cb85b6239'
$ (master * u=) git st
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

        modified:   submodule (new commits)

Submodules changed but not updated:

* submodule 6d7edc2...6ccb144 (1):
  > security

no changes added to commit (use "git add" and/or "git commit -a")

So with status.submodulesummary = true my workflow for mw-config and php branches is always:

  • git fetch
  • git log -p HEAD..@{u}
  • git rebase
  • git status (to see if there's a submodule update needed)
  • git submodule update (if needed)

Should we document this for SWAT and recommend for others? Or should we just ignore these issues and assume it won't cause problems? Or some other workflow?

Pardon the confusion, but are we now wanting to do submodule updates for security deploys? There are currently two for php-1.34.0-wmf.16:

	modified:   extensions/CheckUser (new commits)
	modified:   extensions/MobileFrontend (new commits)

Submodules changed but not updated:

* extensions/CheckUser 0bb5c31...4f11a8e (1):
  > SECURITY: xxx

* extensions/MobileFrontend 94d7a8a...504efdb (1):
  > SECURITY: yyy

So the best practice would be to run a git submodule update for both of these now? Never mind, as clarified on ops IRC channel and on ops-l on 2020-03-12.

Declining since there's no action to take. Documentation on the wiki is correct and IIRC points to this ticket for details :)