Define Metrics for Change Failure Percentage
Closed, Resolved, Public

Description

Provide support to the Release Engineering team for the following KR:
Objective: Culture, Equity and Team Practices
Key Result 1: [...] For all supporting services within this slice developed at the Foundation, including MediaWiki, change failure percentage is reduced by 50% while keeping the deployment frequency steady.

  • Analyze data from all deployment trains since 2016, collected by @thcipriani. Initial code and data are available at this GitLab repo.
  • Discuss and prototype different candidate metrics for change failure percentage.

Event Timeline

brennen moved this task from Backlog to Radar on the User-brennen board.
brennen subscribed.

Presented the results yesterday at Release Engineering's lunch and learn:

Todos after the meeting:

  • Rethink the final metric to include all signals that cause overhead work for the team (bugs, rollbacks, blockers), with special focus on those occurring on the third day of deployment.
  • Analyze the filenames and their relation to train delays in more depth, to potentially compile a list of "problematic filenames" that could flag potentially problematic patches.

How are filenames associated with bugs? For example, mediawiki/extensions/GrowthExperiments/modules/homepage/addlink/RecommendedLinkToolbarDialog.js is listed as being associated with 9 bugs. I see on https://data.releng.team/train/bug_file?_sort=id&filename__exact=modules%2Fhomepage%2Faddlink%2FRecommendedLinkToolbarDialog.js that there are 5 entries. Four of those entries are a patch to master plus a backport to a wmf branch (e.g. https://gerrit.wikimedia.org/r/c/mediawiki/extensions/GrowthExperiments/+/695039); should those be counted as distinct bugs in the tally? Or should a fix to master plus a backport count as a single bug?

I'm not sure about the discrepancy between the 9 bugs and 5 bugs.

I can answer what a bug is for the purpose of train-stats.

For each train version, I found "bugs" as follows:

  1. Find all backports to that version
  2. Check if there is a Bug trailer in the backport
  3. If the Phabricator task is a subtype of "bug" or "error", then it's a bug
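
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the actual train-stats code: the `Backport` record, the `task_subtypes` lookup table, and the `Bug: T12345` trailer format are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Assumed trailer format: a footer line such as "Bug: T123456".
BUG_TRAILER = re.compile(r"^Bug:[ \t]*(T\d+)[ \t]*$", re.MULTILINE)
# Task subtypes that count as a bug (step 3).
BUG_SUBTYPES = {"bug", "error"}


@dataclass(frozen=True)
class Backport:
    """Hypothetical record for one backported patch."""
    version: str         # train version the patch was backported to
    commit_message: str  # full commit message, including trailers


def find_bugs(backports, task_subtypes, version):
    """Return (backport, task_id) pairs counted as bugs for one train version.

    task_subtypes is a hypothetical dict mapping a task ID to its subtype.
    """
    bugs = []
    for bp in backports:
        if bp.version != version:  # step 1: only backports to this version
            continue
        # step 2: look for Bug trailers in the commit message
        for task_id in BUG_TRAILER.findall(bp.commit_message):
            # step 3: keep the task only if its subtype marks it as a bug
            if task_subtypes.get(task_id) in BUG_SUBTYPES:
                bugs.append((bp, task_id))
    return bugs
```

A backport with no trailer, or one whose task has a different subtype, is skipped by this sketch.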

This algorithm has two problems:

  1. It double-counts backports to multiple branches of the same patch
  2. It under-counts real bugs because: (a) some bugs don't have an associated task; (b) some backports fail to mention the task; and (c) sometimes a bug or error report is not a subtype of "bug" or "error report"

The primary advantage is that it is certain that every backport it counts is a real bug and not a feature, a feature flag, or some other innocuous deployment. I'd be interested in thoughts about how to programmatically fix problem 1.
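
One possible approach to problem 1, sketched under assumptions rather than taken from train-stats: collapse backports that share a `Change-Id` trailer, assuming a cherry-pick to another branch keeps the `Change-Id` of the original patch. The function name and the input (a list of commit messages already classified as bugs) are hypothetical.

```python
import re

# Assumed trailer format: "Change-Id: I<hex digits>", kept by cherry-picks.
CHANGE_ID_TRAILER = re.compile(
    r"^Change-Id:[ \t]*(I[0-9a-f]+)[ \t]*$", re.MULTILINE
)


def count_distinct_bugs(commit_messages):
    """Count bug backports once per Change-Id instead of once per branch."""
    seen = set()
    for message in commit_messages:
        match = CHANGE_ID_TRAILER.search(message)
        # Fall back to the whole message when no Change-Id trailer is present,
        # so such a commit is still counted as its own bug.
        key = match.group(1) if match else message
        seen.add(key)
    return len(seen)
```

Keying on the task ID from the `Bug` trailer instead would also merge several distinct patches that fix the same task, which may or may not be the desired tally.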

Resolving this, as this collaboration concluded a while ago. Thank you all!