Page MenuHomePhabricator

Proposal for refactoring of backport dependency handling
Open, Needs TriagePublic

Description

Background

To goal of this task is to act as a central place to track the fixes and changes we want to add to how the backport code manages change dependencies. An important aspect will be to add documentation to the codebase that captures the design and terminology employed - the hope is that that documentation will serve as both a reference and a starting point for any planned changes in the future.

My intention here is to start a conversation with @dancy and @jeena to refine all the items mentioned in the task and agree on a way forward. New subtasks can then be created to tackle the different parts.

Terminology

  • Operator: User running scap backport from the command line
  • Change: A Gerrit change
  • Root change: A change directly targeted for backport by an operator, i.e. passed as an argument to scap backport from the command line
  • Backport repositories: The set of repositories for which it is possible to backport changes. This includes all the MediaWiki code repos (core, extensions and skins) and the MW configuration repo. In other words, the repos checked out in the deployment server under /srv/mediawiki-staging and their respective submodules.
  • Production branch: The master branch of the MW configuration repo or any MW version in the wikiversions.json file in the deployment server. Restricted to backport repos
  • Production change: A change that targets a production branch
  • Deployable branch: A branch currently checked out in the deployment server. Restricted to backport repos. Production branches are a subset
  • Deployable change: A change that targets a deployable branch. Production changes are a subset
  • Dependency: A change targeted directly or transitively by another change's Depends-On clause
  • Sibling dependencies: The set of changes that share the Change-Id specified by a Depends-On clause
  • Relevant dependency: A dependency that has an effect on whether a root change can be backported or not
  • Dependency trail: The graph/tree of all relevant dependencies of a root change

Discrete issues

Allowed root changes

Only deployable changes can be passed as root changes. That is, changes from a backport repository that target a branch currently checked out in the deployment server. In any other case scap should bail out

Non-backport repositories

Changes from non-backport repos cannot be backported. When:

  • Passed as a root change: scap should fail and abort the operation
  • Present as a dependency: scap should emit a confirmation warning prompt with a clear explanation of the risk involved and that scap won't handle the dependency
Relevant dependencies
NOTE: See T365146 for a refined version of the criteria below

Given a root change or relevant dependency D targeting branch br, one of its dependencies K as specified by Depends-On is also relevant in the following cases.

  1. D and K belong to a MW code repo --> K will be relevant if it targets br, or if it targets master but there is no sibling dependency targeting br. This is important to prevent T345304
  2. K belongs to the MW configuration repo --> K will be relevant if it targets branch master, irrespective of what repo D belongs to
  3. D belongs to the MW configuration repo and K belongs to a MW code repo --> K will be relevant if it's a deployable change, or if it targets master but there are no siblings targeting deployable branches. Again, important to prevent T345304

Applying recursively the criteria above produces the dependency trail for a root change. Note that a transition is created when a relevant dependency hits the MW configuration repository. At that point we lose the information about what deployable branch we are tracking and the criterion for what constitues a relevant dependency becomes less strict.

If a change in the dependency trail specifies a Depends-On clause but no relevant dependencies can be determined for it, scap should prompt with a confirmation warning to let the operator know about the anomalous situation.

Backport vote

A relevant dependency votes for or against backporting its root change. If a single relevant dependency in the dependency trail votes against, then the root change cannot be backported.

Again, given a root change or relevant dependency D targeting branch br, its relevant dependency K will vote in the following ways.

  1. D and K belong to a MW code repo --> If K doesn't target master it will vote in favor if, and only if, it's been merged or passed as root change in the command line. If it targets master it will vote in favor if, and only if, it's been merged and its commit is in br (as determined by the "change/_/in" endpoint from Gerrit)
  2. K belongs to the MW configuration repo --> K will vote in favor if, and only if, it's been merged or passed as root change in the command line
  3. D belongs to the MW configuration repo and K belongs to a MW code repo --> If K doesn't target master it will vote in favor if, and only if, it's been merged or passed as root change in the command line. If it targets master it will vote in favor if, and only if, it's been merged and its commit is in all deployable branches (as determined by the "change/_/in" endpoint from Gerrit)
Documentation

All of the above needs to be captured and properly explained in documentation living next to the backport code in the scap repository.

Open questions

  • How much do we want to change/expand our current battery of backport tests?
  • What's the best way to document the design in the code? Is a text README with ASCII graphs enough? Or are there any other interesting tools we can leverage here?

Details

TitleReferenceAuthorSource BranchDest Branch
backport.py: Restore included_in branches checkrepos/releng/scap!270jhuneidiT360291-2master
Customize query in GitLab

Event Timeline

@jnuche Thank you for this beautifully written document!

Suggestions/comments:

  • Please define sibling dependency in the Terminology section.
  • Codewise, I want to have a single function that takes root change numbers as input and generates a machine-readable validation report. This can be used as the main entrypoint for testing, and will have future value for an alternate scap UI.
  • Testing: Must be tested using the regular test script. This will get tested in CI and will be a lot faster than test-scap-backport.

I'm not clear on what the proposed code changes are. I think what is mentioned in the "relevant dependencies" and "backport vote" sections are already implemented in the code (or in the case of change_in, have a MR to do so), except for maybe reducing the number of confirmation prompts. Can you clarify which parts are missing or whether it is that you would like to change the structure of the dependency checking process?

@jnuche Thank you for this beautifully written document!

Thanks!

  • Please define sibling dependency in the Terminology section.

Done

  • Codewise, I want to have a single function that takes root change numbers as input and generates a machine-readable validation report. This can be used as the main entrypoint for testing, and will have future value for an alternate scap UI.

That makes sense to me

  • Testing: Must be tested using the regular test script. This will get tested in CI and will be a lot faster than test-scap-backport.

When you say test-scap-backport you mean the integration tests that currently run in train-dev, right? Having tests that target specifically the backport dependency validation and that are part of the regular tests/CI is also something that I would like to have. And personally think it's worth investing the time in doing it

I'm not clear on what the proposed code changes are. I think what is mentioned in the "relevant dependencies" and "backport vote" sections are already implemented in the code (or in the case of change_in, have a MR to do so), except for maybe reducing the number of confirmation prompts. Can you clarify which parts are missing or whether it is that you would like to change the structure of the dependency checking process?

As a first step I want to make sure that we all agree on terminology and a set of acceptance criteria (do all my points above look correct? did I miss something?)

Then we can look at the current state of the code (including the MR you mentioned) and discuss any required code changes. If we verify that we are already checking the agreed upon A/C (and only those A/C) then that aspect of the changes is complete and doesn't require extra work :)

The other piece of work regarding the code would be a refactor to make the code match the agreed upon terminology. Quite often when we talk about backport issues it's hard to tell what we are referring to and we have to start remembering how things work from scratch. I think that having a design document and a code that matches it would really help us in the future to have a reference and a starting point for discussions.

  • Testing: Must be tested using the regular test script. This will get tested in CI and will be a lot faster than test-scap-backport.

When you say test-scap-backport you mean the integration tests that currently run in train-dev, right?

Yes.

Having tests that target specifically the backport dependency validation and that are part of the regular tests/CI is also something that I would like to have. And personally think it's worth investing the time in doing it

Agreed!

I'm not clear on what the proposed code changes are. I think what is mentioned in the "relevant dependencies" and "backport vote" sections are already implemented in the code (or in the case of change_in, have a MR to do so), except for maybe reducing the number of confirmation prompts. Can you clarify which parts are missing or whether it is that you would like to change the structure of the dependency checking process?

As a first step I want to make sure that we all agree on terminology and a set of acceptance criteria (do all my points above look correct? did I miss something?)

Then we can look at the current state of the code (including the MR you mentioned) and discuss any required code changes. If we verify that we are already checking the agreed upon A/C (and only those A/C) then that aspect of the changes is complete and doesn't require extra work :)

The other piece of work regarding the code would be a refactor to make the code match the agreed upon terminology. Quite often when we talk about backport issues it's hard to tell what we are referring to and we have to start remembering how things work from scratch. I think that having a design document and a code that matches it would really help us in the future to have a reference and a starting point for discussions.

Thanks for the clarification! That sounds good to me.

I'm wondering if we want to change the terminology of 'Production branch' to 'Active branch' to match the terminology that is all over scap (active wikiversions, etc.)?

I'm wondering if we want to change the terminology of 'Production branch' to 'Active branch' to match the terminology that is all over scap (active wikiversions, etc.)?

Personally I like that we can talk of deployable branch/change and production branch/change. It has a symmetry that helps me grasp intuitively the relationships between those things:

- Production branch: The master branch of the MW configuration repo or any MW version in the wikiversions.json file in the deployment server. Restricted to backport repos
- Production change: A change that targets a production branch
- Deployable branch: A branch currently checked out in the deployment server. Restricted to backport repos. Production branches are a subset
- Deployable change: A change that targets a deployable branch. Production changes are a subset

"Active branch" makes sense but "Active change" is not a term we normally use and it's a bit ambiguous. The other option is to use "Active branch" and "Production change", but that breaks the connection between those two terms. Soooo, dunno.