
Measure maintenance fraction for Reading Infrastructure
Closed, Resolved, Public

Description

  • Time period for reporting identified (at least 1 work cycle, like a sprint)

[NA] All resolved tasks in the time period flagged WorkType-Maintenance or WorkType-NewFunctionality (either using both tags, or using one tag and a default)
[NA] Phlogiston Report run (Joel)

Event Timeline

JAufrecht assigned this task to bd808.
JAufrecht raised the priority of this task from to Needs Triage.
JAufrecht updated the task description.
JAufrecht added subscribers: JAufrecht, Aklapper.

I assume this means that you want us to open phab tasks for everything we do as well? /me grumbles

I opened this task because we may pass it back and forth, discuss it, and track it. We have a request from managers to measure maintenance fraction; the extreme form would be all teams measuring maintenance fraction all the time. Where we are at today is running a pilot to ask a range of teams to measure at least a sample period. Other teams have responded in a range from "we're already measuring that" to "we'll measure it for a few weeks" to "we'll retroactively tag a few weeks and count that to produce a one-time number" to "we'll do some undocumented thing and produce a number". As much as possible, I'd like teams to measure a known period in Phab using standard tags, so that we are getting apples-to-apples numbers. If Reading Infrastructure isn't tracking anything in Phab, then that would be a significant additional burden. Do you have some other work list that you could categorize over a representative time period?

Tgr added a subscriber: Tgr. Oct 1 2015, 1:47 AM

I believe most of our maintenance work is on things that have a Phabricator ticket but are not assigned to Reading-Infrastructure or are assigned to multiple teams. So "undocumented thing" seems to be the most plausible approach there.

Also, those categories are somewhat vague. I spent a significant part of today reading and commenting on RfC proposals; is that maintenance or new functionality? (E.g. if it's an RfC about introducing dependency injection to MediaWiki?)

Also, most of the involved phab tickets do not use the sprint extension and so cannot easily be weighted by complexity or time needed. Do you intend to produce a ratio based on the number of tasks? Frankly, that seems useless. T114306 took about 15 minutes. T110283 will probably take two weeks.

bd808 added a comment. Oct 2 2015, 5:39 PM

As much as possible, I'd like teams to measure a known period in Phab using standard tags, so that we are getting apples-to-apples numbers.

This would require apples-to-apples Phab usage, which I think is my original point. We aren't doing Scrum, Kanban, Scrumbut, XP, waterfall, or any other described methodology to track and measure the increments of work we undertake. Our work is driven by issues in Phab, Gerrit, GitHub, on-wiki discussions, off-wiki discussions, IRC, ...

If I understand what a Phlogiston report is, it will show N tasks open at time X, M open at time Y, and a graph of some things that happened to them in between. I'm guessing that your tags are going to track which of those tasks were "new" work and which were "maintenance", and apply an educated guess on the percentage of time spent on each using the idea I've heard that with a large enough sample size the complexity of a given task can be ignored and the law of averages will kick in to make the resulting guesses close enough. Thus if you capture 100 tasks closed in the interval and 37 of them are tagged "WorkType-Maintenance", you will then say that the team being inspected spent 37% of its available work effort on maintenance tasks.
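The count-based estimate described above can be sketched in a few lines. This is an illustrative sketch only, not part of the actual Phlogiston tooling; the task list and helper function are hypothetical, though the tag names mirror those in the task description.

```python
# Illustrative sketch of the count-based maintenance-fraction estimate.
# Each task is represented only by its set of WorkType tags; per-task
# complexity is deliberately ignored, per the law-of-averages assumption.

def maintenance_fraction(closed_tasks):
    """Return the fraction of closed tasks tagged as maintenance."""
    if not closed_tasks:
        return 0.0
    maintenance = sum(
        1 for tags in closed_tasks if "WorkType-Maintenance" in tags
    )
    return maintenance / len(closed_tasks)

# 100 tasks closed in the interval, 37 tagged maintenance -> 37%
tasks = ([{"WorkType-Maintenance"}] * 37
         + [{"WorkType-NewFunctionality"}] * 63)
print(f"{maintenance_fraction(tasks):.0%}")  # 37%
```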

If Reading Infrastructure isn't tracking anything in Phab, then that would be a significant additional burden. Do you have some other work list that you could categorize over a representative time period?

Gerrit commits might come closer for us, but that will still miss all of the time that is spent in investigating and triaging bugs that we don't actually implement a fix for and other activities on production and beta cluster servers to keep services running for internal and external customers.

If you really want to know where time is spent in teams, you actually need to do time tracking with a reasonable granularity (say, 15-minute intervals) against a pre-defined set of activity categories for all contributors over an interval deemed large enough to be representative (probably not near a quarter boundary, where everything gets a bit wonky with more meetings). Knowing what our current budget situation is, the outcome of that is at best going to be worry about teams that are "in trouble" with no real way to help them, and at worst it will inspire witch hunts for "low performers" who are "wasting" Foundation resources.

I would personally be glad to do fine-grained tracking and reporting of my time if we can select a tool that has a reasonably low cognitive overhead to use and a classification system that makes it easy for me to bucket my activities in a meaningful way. If I did it for a week or two and found the process to be reasonably non-invasive and the output to be reasonably informative, I would be happy to ask my team to participate in the experiment. I probably have a lower aversion to this than most would, as I have previously worked in contract/consulting situations that required tracking time spent in 6-minute increments for months at a time.

I hear your concerns about where this could go. The goals of this phase of the Maintenance Fraction Measuring Project include, in my opinion, validating assumptions about how Maintenance Fraction varies across the Foundation's mix of teams (from product-oriented to service- to operational), testing team self-assumptions about where they are on the spectrum, and uncovering any methodological issues that we would need to address to make this more standard and broadly used. I'm hearing that your team doesn't have any single comprehensive itemized list of all work that you do -- no team really does, but your team especially does not, since you get inputs from a broad array of sources, don't write up Phabricator tickets for everything you address, and don't necessarily produce a patch for each thing you investigate. Given all that, I think doing time tracking for a 1-2 week period would certainly be adequate and helpful for our current research purposes. I don't have a tool to recommend; I personally like Toggl and have seen other people using it, but it's not FOSS and it's far from perfect.

bd808 added a comment. Oct 15 2015, 8:44 PM

A non-FLOSS tool is a non-starter, but I think I can probably find a useful tool. What are the time tracking categories that would honestly be useful?

  • "maintenance" (how is that defined?)
  • "new work" (how is that defined?)
  • meeting?
  • email/irc?
  • wmf staff support?
  • community support?

For the pilot, just Maintenance vs New Work. How that is measured is a subject with no consensus at all (see the link and the talk page), so part of the pilot is having different teams try to converge, or at least identify why they can't converge, on a shared definition. So read the discussion and then pick whatever definition you want so long as you document it. The other categories you list could be useful but nobody is asking for them so only collect that if it's trivial to do so or you have some use in mind.

JAufrecht triaged this task as Normal priority. Oct 15 2015, 9:16 PM
JAufrecht set Security to None.

I've set up a server running http://www.kimai.org/ in Labs. I'll play with it this week and see if I can make using it easy enough that the team could use it to track things for a 2-week period to get you a representative sample. We will need to discuss our definition of the terms as well, to deliver along with the data.

Could I get an account on it as well to experiment, without messing with your work?

*--Joel Aufrecht*
Team Practices Group
Wikimedia Foundation

bd808 added a comment. Nov 3 2015, 4:56 PM

Time tracking results for 2015-10-20 thru 2015-11-02:

Participants: 3
Total hours tracked: 223.75
Average hours tracked per participant per week: 37.25

Maintenance vs New Work

maintenance: 55.01%
new work: 44.99%

All tracked hours

maintenance: 47.82%
new work: 39.11%
meetings: 13.07%

Results were fairly consistent across the 3 participants with a slightly higher skew on maintenance and meeting time seen for the manager role.
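For reference, the two breakdowns above are mutually consistent: the Maintenance vs New Work split is the all-tracked-hours figures renormalized after excluding meetings. A quick check (using only the reported percentages; the raw per-entry hours were not published in this task):

```python
# Verify that the "Maintenance vs New Work" split follows from the
# all-tracked-hours percentages by dropping meetings and renormalizing.
# Values are the reported percentages from the comment above.
all_hours = {"maintenance": 47.82, "new work": 39.11, "meetings": 13.07}

work = {k: v for k, v in all_hours.items() if k != "meetings"}
total = sum(work.values())
split = {k: round(100 * v / total, 2) for k, v in work.items()}
print(split)  # {'maintenance': 55.01, 'new work': 44.99}
```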

Definitions
meeting: 1:1, team, departmental, and other standing and ad hoc meetings that could not be directly attributed to quarterly goals for the team or the individual.

new work: communication, planning, design, coding, code review, and support that could be directly associated with a quarterly goal for the team or the individual.

maintenance: Anything that is not meetings or new work. Typical activities included:

  • mailing list participation
  • Phabricator task triage
  • code review for WMF and community submitted contributions
  • technical support activities for WMF staff and community members
  • light system administration
  • design and coding for bugs and small feature requests from WMF and community
  • SWAT and other deployment tasks
bd808 reassigned this task from bd808 to JAufrecht. Nov 3 2015, 4:57 PM
bd808 added a subscriber: bd808.

Awesome. I summarized your results in https://www.mediawiki.org/wiki/Team_Practices_Group/Measuring_Types_of_Work; please check that the summary is accurate.

bd808 added a comment. Nov 3 2015, 5:57 PM

Awesome. I summarized your results in https://www.mediawiki.org/wiki/Team_Practices_Group/Measuring_Types_of_Work; please check that the summary is accurate.

One change made (45% -> 55%); otherwise LGTM.

JAufrecht closed this task as Resolved. Nov 5 2015, 10:10 PM
JAufrecht updated the task description.