Define Metrics for Change Failure Percentage
Closed, Resolved · Public

Assigned To

Authored By

	Miriam
	Aug 24 2021, 11:28 AM

Description

Provide support to the Release Engineering team for the following KR:
Objective: Culture, Equity and Team Practices
Key Result 1: [...] For all supporting services within this slice developed at the Foundation, including MediaWiki, change failure percentage is reduced by 50% while keeping the deployment frequency steady.

Analyze data from all deployment trains since 2016, collected by @thcipriani. Initial code and data are available at this GitLab repo.
Discuss and prototype different candidate metrics for change failure percentage.

Event Timeline

Miriam created this task. Aug 24 2021, 11:28 AM

Restricted Application added a subscriber: Aklapper. Aug 24 2021, 11:28 AM

thcipriani awarded a token. Aug 24 2021, 1:38 PM

thcipriani edited projects, added Release-Engineering-Team (Radar); removed Release-Engineering-Team.

brennen added a project: User-brennen. Aug 25 2021, 5:39 PM

brennen moved this task from Backlog to Radar on the User-brennen board.

brennen subscribed.

• CMacholan subscribed. Sep 7 2021, 4:50 PM

Presented the results yesterday at Release Engineering's lunch and learn:

Slides: https://docs.google.com/presentation/d/12K7k0LkgxK8ovPM0zLaK7Q9az7jZca0tAV91uwSIib8/edit#slide=id.p
Filenames that cause blockers spreadsheet: https://docs.google.com/spreadsheets/d/15xO-y0SopWvpjmztwvrvZf9z2V8NJ3gtufkCzjDApjE/edit#gid=1646264321
Notebook with more insights: https://github.com/mirrys/release-engineering-data/blob/main/metrics_reng_data.ipynb

Todos after the meeting:

Rethink the final metric so that it includes all signals that cause overhead work for the team (bugs, rollbacks, blockers), with special focus on those occurring on the third day of deployment.
Analyze the filenames and their relation to train delays in more depth, to potentially come up with a list of "problematic filenames" that can flag potentially problematic patches.

Miriam moved this task from FY2021-22-Research-July-Sept to FY2021-22-Research-Oct-Dec on the Research board. Oct 15 2021, 2:47 PM

Miriam edited projects, added Research (FY2021-22-Research-Oct-Dec); removed Research (FY2021-22-Research-July-Sept).

How are filenames associated with bugs? For example mediawiki/extensions/GrowthExperiments/modules/homepage/addlink/RecommendedLinkToolbarDialog.js is listed as being associated with 9 bugs. I see on https://data.releng.team/train/bug_file?_sort=id&filename__exact=modules%2Fhomepage%2Faddlink%2FRecommendedLinkToolbarDialog.js that there are 5 entries. Four of those entries are a patch to master + a backport to a wmf branch (e.g. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/695039) ; should those be counted as distinct bugs in the tally? Or should a fix to master plus a backport count as a single bug?

In T289567#7714296, @kostajh wrote:

How are filenames associated with bugs? For example mediawiki/extensions/GrowthExperiments/modules/homepage/addlink/RecommendedLinkToolbarDialog.js is listed as being associated with 9 bugs. I see on https://data.releng.team/train/bug_file?_sort=id&filename__exact=modules%2Fhomepage%2Faddlink%2FRecommendedLinkToolbarDialog.js that there are 5 entries. Four of those entries are a patch to master + a backport to a wmf branch (e.g. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/695039) ; should those be counted as distinct bugs in the tally? Or should a fix to master plus a backport count as a single bug?

I'm not sure about the discrepancy between the 9 bugs and 5 bugs.

I can answer what a bug is for the purpose of train-stats.

The way I found "bugs" is for each train version:

1. Find all backports to that version.
2. Check whether the backport has a Bug trailer.
3. If the Phabricator task has the subtype "bug" or "error", then it's a bug.
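The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual train-stats code: the record layout (`change`, `message`), the `task_subtype` lookup table, and the sample data are all hypothetical, and the sketch assumes the trailer has the form `Bug: T<number>` on its own line.

```python
import re

# Assumed trailer format: "Bug: T12345" on its own line of the commit message.
BUG_TRAILER = re.compile(r"^Bug:[ \t]*(T\d+)[ \t]*$", re.MULTILINE)

# Task subtypes that count as a bug, per the description above.
BUG_SUBTYPES = {"bug", "error"}


def find_bug_backports(backports, task_subtype):
    """Return (change, task) pairs for backports whose Bug trailer
    points at a task with a bug-like subtype."""
    found = []
    for bp in backports:
        for task in BUG_TRAILER.findall(bp["message"]):
            if task_subtype.get(task) in BUG_SUBTYPES:
                found.append((bp["change"], task))
    return found


# Hypothetical backports to one train version.
backports = [
    {"change": 1, "message": "Fix dialog crash\n\nBug: T100\nChange-Id: Iaaa"},
    {"change": 2, "message": "Enable feature flag\n\nBug: T200\nChange-Id: Ibbb"},
    {"change": 3, "message": "Hotfix without a trailer\n\nChange-Id: Iccc"},
]

# Hypothetical subtype lookup; in practice this would come from Phabricator.
task_subtype = {"T100": "error", "T200": "task"}

bug_backports = find_bug_backports(backports, task_subtype)
```

With this sample data only change 1 is kept: change 2 references a task that is not a bug, and change 3 has no Bug trailer at all.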

This algorithm has two problems:

1. It double-counts backports to multiple branches of the same patch
2. It under-counts real bugs because: (a) some bugs don't have an associated task (b) some backports fail to mention the task (c) sometimes a bug or error report is not a subtype of "bug" or "error report"

The primary advantage is that it is certain that every backport it counts is a real bug and not a feature or feature flag or some other innocuous deployment. I'd be interested in thoughts about how to programmatically fix problem 1.
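One possible direction for the double-counting problem, offered only as a sketch: group the counted backports by a key that stays the same across branches before tallying. The example below assumes that a cherry-picked backport keeps the `Change-Id` trailer of the original patch, which is the usual Gerrit convention but is not guaranteed if the message was edited by hand. The record fields and sample values are hypothetical.

```python
def count_unique_bugs(bug_backports, key="change_id"):
    """Count bug backports once per distinct value of `key`,
    so the same patch on several branches is tallied only once."""
    return len({bp[key] for bp in bug_backports})


# Hypothetical records: the same fix backported to two wmf branches,
# plus an unrelated fix for a different task.
bug_backports = [
    {"branch": "wmf.5", "change_id": "Iaaa", "task": "T100"},
    {"branch": "wmf.6", "change_id": "Iaaa", "task": "T100"},
    {"branch": "wmf.6", "change_id": "Ibbb", "task": "T101"},
]

naive_count = len(bug_backports)
deduplicated_count = count_unique_bugs(bug_backports)
per_task_count = count_unique_bugs(bug_backports, key="task")
```

Keying on `task` instead of `change_id` would additionally merge several distinct patches that fix the same task into one bug; which of the two is the better definition is a judgment call for the metric.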

leila moved this task from FY2021-22-Research-Oct-Dec to FY2021-22-Research-April-June on the Research board. Apr 8 2022, 2:34 AM

leila edited projects, added Research (FY2021-22-Research-April-June); removed Research (FY2021-22-Research-Oct-Dec).

leila moved this task from FY2021-22-Research-April-June to In Progress on the Research board. Aug 26 2022, 7:27 PM

leila edited projects, added Research; removed Research (FY2021-22-Research-April-June).

Resolving this, as the collaboration concluded a while ago. Thank you all!
