Page MenuHomePhabricator

Create Security Patches board/workflow
Open, MediumPublic

Description

We need a better way of tracking state and progress of security patches...

To write (backlog). Written but unreviewed (or reviewed with problems). Deployed. Released

Event Timeline

A summary of a discussion that happened over email. I hope this
captures the essence of the discussion without omitting or distorting
any salient details.

We had a situation where a security patch got dropped, and it took
some effort to find out what had happened, which is human error. We
only noticed it had happened at all, because there was log messages
that the patch had silenced.

This sparked a discussion of how we could improve the workflow for
handling security patches. The current process is quite manual and
relies one storing individual patch files in a shared directory on a
deployment server. If one of the files is deleted by mistake, it's
difficult to notice, or to find out what happened, afterwards.

As part of that discussion I made the point that we should make all
code that goes into production go via git, to get the tracking
behaviour a version control system provides. Security patches can't go
into a public git repository, usually, but we could have a second,
private repository for those.

The private repository would mirror the public one, but additionally
store security patches in a branch for them. When the public
repository is updated, the changes would be mirrored to the security
repository, and the security branch would be rebased to include the
public changes. The mirroring and rebasing could be automated.

Deployments to public would incorporate the security branch.

An alternative suggestion would be to change the scap tool to apply
patches automatically during branch cut. The patches could be tracked
in a patches-only repository rather than a plain shared directory.

sbassett triaged this task as Medium priority.Dec 6 2019, 4:12 PM
sbassett edited projects, added Security-Team; removed Security-team-backlog.

As part of that discussion I made the point that we should make all
code that goes into production go via git, to get the tracking
behaviour a version control system provides. Security patches can't go
into a public git repository, usually, but we could have a second,
private repository for those.

Nit: we do currently use git for this process, it's just that we use it in a very manual way with a private, on-deployment repo for the patches in /srv/patches (as you mentioned) and by running git apply/am to apply security patches directly on the deployment server. IMO, the main push here is in the way of increased automation and process improvement and the consistency and accountability that would provide.

The private repository would mirror the public one, but additionally
store security patches in a branch for them. When the public
repository is updated, the changes would be mirrored to the security
repository, and the security branch would be rebased to include the
public changes. The mirroring and rebasing could be automated.

IMO, the largest gains for this proposal would be seen from also incorporating a private front-end (likely another gerrit) so as to provide consistency with the current deployment process and also satisfy key requirements around anyone being a able to write and submit a security patch for review and be included in most of the process up to its deploy. And to also make things as seamless as possible with the existing deployment processes. I kind of accidentally conflated all of this in my ops-l response, so I just wanted to clarify that.

An alternative suggestion would be to change the scap tool to apply
patches automatically during branch cut. The patches could be tracked
in a patches-only repository rather than a plain shared directory.

It would be nice to have much of this built into scap though if that would be a blocker to getting started, then it should be deferred. Or obviously abandoned if nobody else thinks it's worthwhile :)

IMO, the largest gains for this proposal would be seen from also incorporating a private front-end (likely another gerrit) so as to provide consistency...

A second, totally separate Gerrit that is just sitting behind Apache auth, kind of like it's just slapped in front of Icinga, without having to trust Gerrit itself to handle private and things and having secret patches in the regular Gerrit seems preferrable to me and not super hard to do.

Nit: we do currently use git for this process, it's just that we use it in a very manual way with a private, on-deployment repo for the patches in /srv/patches (as you mentioned) and by running git apply/am to apply security patches directly on the deployment server. IMO, the main push here is in the way of increased automation and process improvement and the consistency and accountability that would provide.

Hmm. You're right. /srv/patches is a git repo. I had missed that. However, the manual git apply/am dance that the deployer needs to do (as described in https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Apply_security_patches) is, to me, way too manual and error prone.

I'd like to see the whole flow from someone pushing a security patch until there's a deployable tree be entirely automated.

It would be nice to have much of this built into scap though if that would be a blocker to getting started, then it should be deferred. Or obviously abandoned if nobody else thinks it's worthwhile :)

One problem with doing this in scap is that scap is run at deployment time. If we did an automatic rebase whenever the public repo or the security repo change, problems like merge conflicts would be found earlier and there would be more time to fix them.

However, the manual git apply/am dance that the deployer needs to do (as described in https://wikitech.wikimedia.org/wiki/Heterogeneous_deployment/Train_deploys#Apply_security_patches) is, to me, way too manual and error prone.

Agree 100%.

One problem with doing this in scap is that scap is run at deployment time. If we did an automatic rebase whenever the public repo or the security repo change, problems like merge conflicts would be found earlier and there would be more time to fix them.

Auto-rebasing the repos to keep things sync'd sounds sane and should probably happen regardless of any other details. IMO, a command like scap security sync would be very simple to run and more user-friendly to other deployers attempting security deploys. I understand this is a non-trivial wishlist item, but it would be incredibly convenient compared to any other, mostly manual process, even if said process were optimized and simplified from how it exists today.

Auto-rebasing the repos to keep things sync'd sounds sane and should probably happen regardless of any other details. IMO, a command like scap security sync would be very simple to run and more user-friendly to other deployers attempting security deploys. I understand this is a non-trivial wishlist item, but it would be incredibly convenient compared to any other, mostly manual process, even if said process were optimized and simplified from how it exists today.

The ability to rebase a change depends on keeping the change in git. We currently do not. Even though /srv/patches is a git repo it doesn't represent the state of the code in question, it just contains .patch files.

I like the idea of a security sync command but in order to implement such a thing we really need to have a copy of the repositories with patches applied, as proposed by @LarsWirzenius.

We already have a scap patch command to automatically apply patches. That isn't good enough, however, because scap patch often fails and when it does fail it is quite difficult to figure out what's happened. It could be one of the following:

  • The underlying code changed slightly and the patch is now in conflict with that change
    • Even if the underlying change was trivial, the proper resolution to a conflict is not always obvious. Having the deployer resolve such conflicts could result in an incorrect resolution because the deployer doesn't have enough context to understand the change and the security implications thereof.
    • As a historical note, in the past we've had some long-lived security patches that touched translations and/or release notes. When a security patch touched files such as those, which change often, it ended up causing conflicts every week and required deployers to spend a lot of time sorting out conflicting patches with each train. This was made even worse when the patch remained for a long time without getting included in a security release. AFAIK this hasn't been an issue recently but nonetheless it was a pain point for deployers and the current system is especially poor at dealing with the situation.
  • A newer version of the patch merged upstream as part of a security release but was not removed from /srv/patches
    • Sometimes it's difficult to tell the difference between this case and the previous one.
    • Making a mistake here could lead to a security patch being dropped when it should have been rebased.
    • IMO, this is where ~100% of mistakes are likely to be made.

I think the most important point I'd like to make, which may not be clear from my previous comment, is this:

We need the code authors and/or the security team to be the ones resolving any conflicts between security patches and the underlying code. The current tools don't make it easy and deployers don't have enough information to make a good decision even if the tools made it easier. In an ideal world we would be keeping the patches constantly rebased onto the latest code, with conflicts resolved by appropriate people in a timely manner. Then it would be trivial to apply the security patches to production prior to deployment and any conflict would be a deployment blocker which deployers would not even attempt to resolve manually.