Better code review processes for reviewers
Closed, ResolvedPublic

Tokens
"Barnstar" token, awarded by Qgil."Like" token, awarded by Tgr."Cookie" token, awarded by Miriya52."Love" token, awarded by daniel."Like" token, awarded by Volker_E."Orange Medal" token, awarded by Krinkle."Like" token, awarded by Bmueller."Like" token, awarded by Aklapper."Like" token, awarded by Glorian_Yapinus."Like" token, awarded by Johan.
Assigned To
Authored By
Aklapper, Oct 31 2016

Description

Type of activity: Pre-scheduled session (or Unconference session if not accepted)
Main topic: How to grow our technical community / How to manage our technical debt
Etherpad: https://etherpad.wikimedia.org/p/devsummit17-code-review-processes

The problem

Whether and when code review will happen is not predictable which can be demotivating to someone who contributed a code change proposal.
Our backlog of unreviewed/unmerged patches grows.

Concentrating on those performing reviews, potential problems might be:

  • Lack of time (or CR not being prioritized by managers)
  • Lack of maintainership / 'responsibility feeling'
  • Lack of skill or confidence to perform {effective / good} CR
  • Lack of interest in doing CR

Expected outcome

Discuss some of these items with an organizational and social focus (and mostly ignore technical aspects of CR tools) and how to improve the situation.
Exchange and document best practices, and discuss how to foster them in other development teams.

Current status of the discussion

tbd

Links


Quick copy & paste of Etherpad content:

Day & Time: Jan 10, 13:10

Room: Ventana

Facilitator(s): andre, kevin

Note-Taker(s): greg-g, qgil

Purpose: move towards agreement on improvements, approaches, way forward to most important aspects

Chronology:

(Andre and Kevin go through the slides.)

  • Benefits of code review:
    • improve quality
    • share knowledge
  • problem:
    • <insert slide of open changesets per month, going up>
    • time: unpredictable when a patch will be reviewed
    • acceptance: unpredictable what is being reviewed/looked for
    • amount: unpredictable what will be reviewed
    • Aspects:
      • Social, technical, organizational
        • Let's get the technical aspects (Gerrit vs Differential, etc) aside; let's focus on social aspects.
    • Roles
      • Author, Reviewer
  • Past
    • we've done a lot of events, had a lot of conversations, tried to improve this generally through many means
    • Problems with lack of maintainers, identified maintainers, idea of a code review committee...

Problems, breaking down. (full slide)

  • Resourcing
  • Skill/Knowledge
  • Culture
    • some teams better than others
    • do you look for things to review from others
  • Docs/Communication

Discussion topics:

  • collectively decide (define, agree on) what the most improtant problems are
  • collectively decide which action(s) to take for each problem
  • plan how to implement a course of action for each problem

Questions/Comments:

TimS: code review is a critical point where communicating is very important, with outside contributors or inside the organization. Be careful that you raise the minimum of quality issues, respect. Be nice to your colleagues. Reviewers tend to provide criticisms then never come back (to see the result of what the author has done to address them). Be clear about the steps missing to get the patch merged.

Aleksey: Issues about minor changes needing ot be made. Style issues also blocking the speed of review/merge.

James F: Context as a product owner. When looking at a patch for the first time it's the first time I've even known this was an issue to be addressed, so feedback can vary from small tweaks to "this isn't something we even want to do", and if contributors went through some up-front process we might avoid the latter. I find how radically different teams/repos are as being unhelpful given that I write patches against a few dozen repos. Remembering that if you don't respond to WIP changes within a month in one repo it gets abandonded but not in anothe repo

Matanya: You never know know are going to review code and when. I go to IRC and ask, but there is no official way to know who to poke and when. This doesn't scale, especially not a good experience for newcomers.

Chad: reiterate Tim's comments on being nice and nitpicking. Code review is a demotivating process when you get negative feedback. not nitpicking is super important. Code style can be automated, and should be automated. it saves everyone time and it removes the negative feeling because it is not a human who is disappointed but a machine doing a lint

Andre summarizes:

  • Being friendly
  • Not starting with nitpicking
  • Predictable approach
  • Architectural things first
  • Developers should contact the "responsible team"beforehand [for bigger changes]
  • Automatization of style etc.

Things we could do?

  • improve documentation? there's not much for authors
  • ... (missed this one)

Brian Wolff: start small before taking over the world. See what works in a small number of repos, see if it works, then expand.

Quim: My first manager at WMF used to say a patch is a gift. How you review the patch tells a lot about you. Sometimes it can represent actually money by way of time. The sheer number of hours that are in that backlog of unreviewed patches. we're throwing money away by not processing those patches.

Aleksey: re nitpicking, not even about -1, but even about just leaving a comment that something isn't correct. ignore it if it really doesn't matter. Does anyone have a code review checklist in your team? No? :(

Chase: I'm fine with nitpicking. I come from place where code review was a battle. Difference between "this won't work/this will break" and "I'd do this differently". Patches aren't always a gift. If it's about something that I don't care about I'm not going to get up 20 minutes earlier to properly review it. If I +1 a patch I am also putting my name there. Sharing responsibility with a patch. If you review my patch you are buying into this work together. Avoiding nitpicking is not a helpful narrative, but if you are asking someone +1 their code you are asking them to share responsibility.

Brian: part of the issue with nitpicking is that there can be a long time between updates on a patch back and forth so the delay over a minor thing is extending the time to merge

Tim: Sometimes we see serial nitpicking. Someone gets nitpicking, a new patch fixes, and then more nitpicking comes, from the same person or another.

Andre: more structured is wanted, less nitpicking (replacing with automated tools where appropriate). Start small with a few different projects.

Which teams want to see this problem and try to solve it?

Aleksey: checklist: we have one, I try to follow it.

Stephen: how code review works on Android. It's unusual in ways. We'er a smaller team. Don't get piles of volunteer contributions. The rest of the patches are from us paid devs. I think it is fair to say that our team we have two hats. One hat when reviewing other's code and another when we review code from a new comer. Professional developers should know conventions, and we have a social relationship. New contributor, we need to be more condescendent. Comments on a newcomer patch, they tend to be extremely enthusiastic; someone went through the process to submit it.
Different views of nitpicking: I like to get out this type of info out of reviews so I can learn "how professionals do it". i tried to make sure it is phrased in a positive way. What is a nitpicky comment? :)

Kevin: Tone is really important, nitpicky or not.

Kevin: Idea of a committee. Would people in this room participate?

Greg: What would the charter be?

Kevin: looking at these issues in turn (eg: checklists, getting agreements to review within a certain amount of time)

Quim: before discussing who would be interested, do people think this would be a solution to the problem? I think so, but maybe others not? The problem is vague. we all know our house is not so clean. What is every big team gets one person when there is a patch to review they are responsible for triaging. The success story from Android, we need more like that (stories of sucesses). Re committee: i think it will help.

Mark H: To add to that, what is the purpose of the cmmttee: enforcement or standards setting? We'd need to give them some power to slap people on the wrist if they don't follow the CoC or nitpick inappropriately. Some standardization and then the cmmtte to ensure that everyone is doing it.

Brian Woff: on metrics, if we had very good per team metrics, it would be good if they would then be posted in quarterly reviews.

Kevin: metrics can be tricky (greg: to not game them). How to make sure we don't have this coversation a year from now? Does it come from management? Hoping for a bit more of a community solution. It wouldn't just be the foundation trying to solve this problem but also the vol contribs who feel the pain. we aren't going to solve this in an hour once per year. A committee could be the way to do that.

Andre: takeaway: if we had a committee we'd have to sort out mandate, standards vs enforcement

Mark H: if volunteers how do they have any power? no budget, no ability to effect quarterly reviews

Kevin: we'd have to have Foundation participation as well. My hope would be that the committee would propose things than then would become policy, or focus peer pressure "if these teams are doing code review well, why can't other teams do the same?"

Mark H: James F, I have a good way of interacting with him. Received good feedback. From him and his team. I would suggest that people should look at his team. They have a good start. Build on that.

Seb35: I am wondering in the code review if it will be better if we can classify different trypes of code review. A specialist who could mark new patches with tags to have different types of patches and get easier reviewers for those types.

Andre: per team metrics, how teams use Gerrit? How they make sure that all the repositories that the team thinks they maintain are actually watched for new patches?

JamesF: There are 2 things we use. We create a custom dashboard in Gerrit for VisualEditor team. Gerrit allows complex queries for dashboards. [lengthy description, URLs / screenshots would be nice]. I look at that page regularly and make sure that patches is reviewed. I also subscribe to many .... and get about 1000 notifications every week, whcih might not work for others. I also have a low tech solution... [missed it].

Andre: within the foundation, is the knowledge of these custom dashboards widespread?

James: <insert description of how> yes, we should make it more widely known/used. All this doesn't help if your code happens to be in GitHub, or whatever. [There was a brief mention to Differential, the bright vague future].

Niklas L: we had a large problem of unreviewed patches (from volunteers), we tried 2 things: 1) separate tool that gets lists of open patches in our repos and says whether it needs review, who, and age. this helps to sort out which patches need review the most. Every week it is sent to the team list. 2) wrote a code review statement of intent including which repos we look at and how quickly re try to react to patches (greg: like an SLA). ... we also abandon patches after three months (?)

Andre: how do you make sure it happens (the SLA)?

Niklas: the weekly email and I hope it happens :)

Tim: what we can do to have more reviews and reduce backlog, what the committee (ArchCom?) could do. Brion and I used to review all MediaWiki codes in subversion.... How could you get someone like me to like to review. It's not about saying how this aligns to the Foundation goals. (((missed bits)))

Andre: action items?

Quim: background, the TCB/DevRel team around July we talked to Greg, Wes, the natural neighbors in Prod & Tech. we've been trying to drive it, we can organize events, raise awareness, do metrics, that's it. It was acknowledged. we can't do it anymore, it needs to be someone else. we're all still busy. Both wes and victoria and the rest of the managers are very receptive to do something. I want to propose something, whatever it is, something specific, go to management and wikitech-l, do it. SOmething that can be discussed and agreed within 3 weeks as a good enough change. Then we revist later and see if we have something better to show for ourselve

Greg: the committee idea was presented last year. It needs to be people with interest but also with... respect, the clout to make things to happen. We have this problem in the ArchCom, not approving RfCs but to get things changed at the WMF. But this is more specific. We migt have a chance to make a proposal, send it to wikitech-l, and get a committee based on the community respect, withiout having to rely on Vicotira/Wes. And start small, and keep going.

Chase: What if 2-3 months from now in one of the monthly metrics meetings or something like that, someone would present the problem, this is wht we think we should do, but it is not happening. Defining the problem is a reasonable step.

MarkH: committee that helps distribute grants, they have me have actual authority to look at grant apps and say "this is good" "this needs work", much like code review. We accept or reject. in the same way maybe there's a way to apply that model to the code review process. we want some people to come in and review this and if they review the code it's accepted. how to applymodels we already know work 9at least better than othing)

Quim: hmmm, I went blank :)

Greg: let's do the comittee to define problem, then what?

Kevin: depending on what the committee finds, maybe wes/victoria or archcommittee, or something else

Brian: probably need to convince committers on wikitech-l

Quim: ...

Action items:

  • Set up a working group (which might form a committee)
  • Post slide deck and link to it from these notes

Related Objects

There are a very large number of changes, so older changes are hidden. Show Older Changes
Qgil added a subscriber: Qgil.Nov 9 2016, 3:29 PM

@Aklapper are you volunteering to drive this session? If so, please assign this task to yourself.

Aklapper claimed this task.Nov 10 2016, 2:23 PM

I could but would love to co-run this session with someone.
@ksmith: You're in?

@Aklapper : I can definitely help with this, yes.

Siznax added a subscriber: Siznax.Nov 15 2016, 7:30 PM
Jonas added a subscriber: Jonas.Nov 16 2016, 8:22 AM
Johan awarded a token.Nov 17 2016, 7:21 AM
Aklapper updated the task description. (Show Details)Nov 21 2016, 11:42 AM
Aklapper triaged this task as Low priority.
Bmueller added a subscriber: Bmueller.
Joe added a subscriber: Joe.Nov 21 2016, 4:31 PM
Tgr added a subscriber: Tgr.Nov 21 2016, 8:57 PM

I think what I said in T114419#1723852 still holds: it would be nice to go into this with actual data instead of vague personal impressions.

Volker_E added a subscriber: Volker_E.
Qgil added a comment.Nov 23 2016, 5:40 PM

We have been trying to deal with this problem as a whole for year, and we haven't got... anywhere? (not fair, but the metrics about code review waiting times are not improving). Here is an idea to break this loop: focus on patches submitted by new developers and try to reach excellence.

The reasons are "economical". If a new contributor uploading their patch number 1-5 has to wait weeks for a review, chances are that we will lose that person. It is fair to assume that new contributors will lack the contacts and community knowledge to get through this hop. Meanwhile, the more veteran contributors will be just as frustrated, but they will have more chances to reach to reviewers or find the channels to complain. They will have a higher personal investment in Wikimedia, so it will be more difficult to lose them for one patch stuck or two.

Trying to reach excellence for code review of patches from new contributors would not be easy, but perfectly doable. To reach that state, we would improve the situation for the rest, and we would learn something practical on the way. Reached that goal, we could discuss a new, more ambitious goal.

Nemo_bis added a subscriber: Nemo_bis.EditedNov 23 2016, 7:32 PM

Are we able to identify whether in some meaningful subset of our repositories the situation got better and, if so, try to export any best practice?

If not, we might also need to acknowledge that this issue will never go away unless someone with extra money available (read: WMF) assigns some real resources to the matter, e.g. an experienced MediaWiki core developer to go through the backlog of unmerged patches across the mediawiki/* repositories.

Lucie added a subscriber: Lucie.Nov 25 2016, 3:47 PM
daniel added a subscriber: daniel.Nov 26 2016, 6:43 PM

One idea I have been thinking about:

One cause of the problem that the number of active developers with merge rights and the expertise to actually review changes to a specific component is relatively low, and such active developers are often staff, payed primarily for tasks other than doing code review.

One way to improve the situation is to spread the load: People with merge (+2) rights should mostly review things that already have a +1 (perhaps we can filter the gerrit dashbaord to support this). People who have +1 rights should be encouraged to do code review, to gain credits/karma. One incentive could be that patches by people who do a lot of reviews are themselves preferred for review by others. I don't think gerrit can help with that, but perhaps we could somewhere have a board with "patches by active reviews pending active review", or something similarly poetic.

The idea is to formalize, automate, and scale a social mechanism that is already at work: when I see a patch by someone who's name I recognize, I'm more likely to review their patch. If they do a lot of reviews, I'm more likely to know their name. Maybe I even remember that they do good reviews, and am even more inclined to look at the patch they made.

People who have +1 rights should be encouraged to do code review, to gain credits/karma.

I have heard that not everyone agrees on the meaning of "+1". That some people view a patch that has been +1'd as being worse than a patch with no reviews at all. So there might be a prerequisite of adjusting some social norms.

One incentive could be that patches by people who do a lot of reviews are themselves preferred for review by others.

That sounds awesome to me.

Thanks for the comment!

Qgil added a comment.EditedNov 30 2016, 9:33 AM

I have heard that not everyone agrees on the meaning of "+1". That some people view a patch that has been +1'd as being worse than a patch with no reviews at all. So there might be a prerequisite of adjusting some social norms.

I wonder how +1 can be worse than no reviews!? For what is worth, it is explained at https://www.mediawiki.org/wiki/Gerrit/Code_review#Giving_positive_feedback

Qgil added a comment.Dec 2 2016, 9:43 AM

At the last Summit we had a similar pre-scheduled session in the main room: T114419: Event on "Make code review not suck". What has changed during this year? Any improvements, regressions? How can we assure that we don't repeat the same type of discussion, ending up in the same place? What should we do differently next year to finally fix this important problem?

ksmith added a comment.Dec 6 2016, 1:12 AM

Great question, @Qgil. I just went back and watched the video of that session. Since then:

  • There have been experiments with "code review office hours", but at least so far, they have been very lightly attended.
  • There was some hope that Differential might help with some of the issues around signaling (e.g. +1/-1), but obviously we haven't switched to it yet (and won't do so in the near future).
  • I think I remember a push to fill in the code ownership tables more completely. But I don't remember seeing mention of new guidelines or policies about what it means to be an owner, and especially what responsibilities it entails.
  • There was talk of a small committee to work on the issue, but as far as I know, that hasn't happened. Some individuals have stepped up (such as with the office hours), but there hasn't been an organized effort.
  • The mw.o documentation about "how to do code review" has been improved substantially.
  • Andre has been sending out email requests for reviews of specific patches

What could we accomplish at this upcoming session? Some ideas:

  • Figure out what a committee might look like, and start to draft members
  • Check on the status of the owners/maintainers list, and try to figure out whether it has helped
  • Find out whether training in CR would be valuable, and if so, come up with a plan to offer it
  • Discuss how CR would be different in Differential vs. Gerrit
  • Try to figure out why CR office hours haven't been more successful
Qgil added a comment.Dec 6 2016, 9:19 AM

Thank you @ksmith! This is a very good summary.

Some ideas for more strategic or longer term topics:

  • Agree on the relevance of this problem for WMF Technology, Product, Community Engagement. Benchmark: should this be a problem tackled in the WMF Annual Plan FY2017-18?
  • Identify who owns this problem. Until recently, Community Engagement has been driving but we have said that this is how far we can be at the driver seat. @Aklapper finished the documentation and started a routine to watch patches waiting for reviewers. Someone in Technology or Product (or Team Practices?) needs to take the torch if we want to move forward.
  • Define a reasonable mid term scenario i.e. what can we achieve in 12-18 months?

I guess I am simply proposing to discuss the inclusion to this problem and the steps leading to a desired solution in the WMF Annual Plan 2017-18.

Are we able to identify whether in some meaningful subset of our repositories the situation got better and, if so, try to export any best practice?

Not yet. Once https://wikimedia.biterg.io/ isn't in beta state anymore I'd love to take a look at that again.

One way to improve the situation is to spread the load: People with merge (+2) rights should mostly review things that already have a +1 (perhaps we can filter the gerrit dashbaord to support this). People who have +1 rights should be encouraged to do code review, to gain credits/karma.

With regard to "getting things merged" instead of "getting things reviewed" this covers a very valid point. We don't only have stuck patches not receiving any reviews, but also stuck patches that have received reviews ("+1" in Gerrit language; with random interpretations of its meaning) and noone merging ("+2" in Gerrit language; arc land in Phabricator).

I don't want to spend time to discuss the meaning of +1 in Gerrit. Phabricator Differential requires adding a text comment when reviewing, instead of a meaningless "+1" without info if the reviewer tested the patch or just likes the idea.

Thank you for the summary, @ksmith!
Adding some links with more background information.

  • There have been experiments with "code review office hours", but at least so far, they have been very lightly attended.

https://www.mediawiki.org/wiki/Code_Review/Office_Hours
http://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-codereview/
T128371: Set up Code Review office hours

  • There was some hope that Differential might help with some of the issues around signaling (e.g. +1/-1), but obviously we haven't switched to it yet (and won't do so in the near future).

https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/Project/Differential_Migration

  • I think I remember a push to fill in the code ownership tables more completely. But I don't remember seeing mention of new guidelines or policies about what it means to be an owner, and especially what responsibilities it entails.

T128370: Update ownership list on [[mw:Developers/Maintainers]]
"What does it mean to be a maintainer" is a good question. I'd guess we expect code reviews from maintainers in some reasonable time frame.

  • There was talk of a small committee to work on the issue, but as far as I know, that hasn't happened. Some individuals have stepped up (such as with the office hours), but there hasn't been an organized effort.

T129842: Launch the Wikimedia "Code Review Special Interest Group".
T115852: Fix unclear maintenance responsibilities for some parts of MediaWiki core repository feels related and a WMF issue.

  • The mw.o documentation about "how to do code review" has been improved substantially.

T129067: Document a structured, standardized approach for reviewing code contributions; T129068: Improve code contribution guidelines for patch authors

  • Andre has been sending out email requests for reviews of specific patches

Last example: https://lists.wikimedia.org/pipermail/wikitech-l/2016-December/087114.html

In addition, we defined T102920: What to do with old open patches for unmaintained/inactive repositories when not even the uploader responds plus improved Phabricator Differential/Owners docs in T128372: Document use of Owners in Phabricator and advertise it.

mmodell added a subscriber: mmodell.Dec 7 2016, 9:39 PM

On the Differential front:

Differential has been seeing more use recently and we have reasonable prototype jenkins testing infrastructure for repositories with simple testing needs.

Qgil added a comment.Dec 20 2016, 10:55 AM

This session is planned to be scheduled on Tuesday, Jan 10 at 1:30 - 2:40pm (70 minutes) in the main room. This session will be video-streamed. Do you have any objection with this slot?

This is a very nice spot to gather attention and (hopefully) attract active participants. We need to prepare this session well! How can I help?

The time slot seems fine. Should we try to converge on topics/format here, or in email, or in a hangout?

Tgr awarded a token.Dec 23 2016, 12:40 AM
Aklapper added a comment.EditedJan 3 2017, 10:44 AM

We have 70 minutes.

I'd love to spend the initial max. 10min for introduction slides which

  1. define the problem (statistics on both time and amount; issue categories from blogpost; ∃ social vs technical vs organizational aspects),

  1. what we have tried since last year (see T114419 + ml),
  2. maybe what we have found out,
  3. what the goal(s) of this session is.

I want to concentrate on social and organizational aspects, instead of technical Gerrit vs Differential implementation details or the meaning of Gerrit's "+1" (+2's solve our problem; +1's do not and mean nothing).

After that, if we go for T149639#2760937, we need to "propose" and define topics for each group.

In the end, have potential improvements, approaches, solutions to the most important aspects, and defined who is responsible ("owner") to follow up after DevSummit that these get followed / implemented (whether that's WMF or not).

I'd like to highlight two previous comments again for everyone following this task and/or planning to attend the session:

What could we accomplish at this upcoming session? Some ideas:

  • Figure out what a committee might look like, and start to draft members
  • Check on the status of the owners/maintainers list, and try to figure out whether it has helped
  • Find out whether training in CR would be valuable, and if so, come up with a plan to offer it
  • Discuss how CR would be different in Differential vs. Gerrit
  • Try to figure out why CR office hours haven't been more successful

Some ideas for more strategic or longer term topics:

  • Agree on the relevance of this problem for WMF Technology, Product, Community Engagement. Benchmark: should this be a problem tackled in the WMF Annual Plan FY2017-18?
  • Identify who owns this problem. Until recently, Community Engagement has been driving but we have said that this is how far we can be at the driver seat. @Aklapper finished the documentation and started a routine to watch patches waiting for reviewers. Someone in Technology or Product (or Team Practices?) needs to take the torch if we want to move forward.
  • Define a reasonable mid term scenario i.e. what can we achieve in 12-18 months?
ksmith added a comment.Jan 4 2017, 6:46 PM

Maybe we can start at the highest level: Is the goal of this session to:

  • Raise awareness?
  • Collectively decide (define, agree on) what the problem(s) is/are?
  • Review the current situation and/or past efforts?
  • Brainstorm ideas?
  • Collectively decide which action(s) to take?
  • Collectively decide how to decide which action(s) to take (aka form a committee)?
  • Plan how to implement a course of action?
  • Have a cathartic but aimless conversation?
  • (something else)?

I'll note that decisions a) are often difficult in a short session, and b) would exclude those not present. That probably also applies to "plan how to..."

Thanks! Due to time constraints I'd pick these ones:

  • Collectively decide (define, agree on) what the most important problems are
  • Collectively decide which action(s) to take for each problem
  • Plan how to implement a course of action for each problem
ksmith added a comment.Jan 4 2017, 9:34 PM

@Aklapper: Ok. cool. I'll be surprised if we get through more than 1.5 of those in a single session.

Do we want to come with a draft/proposed/tentative list of problems and/or actions, or build the list(s) from scratch during the session?

Aklapper raised the priority of this task from Low to High.Jan 4 2017, 10:09 PM
cscott added a subscriber: cscott.Jan 5 2017, 12:13 AM

It would be great if we could come up with a metric which could be used to determine if we are in fact making progress or backsliding.

Some of the proposed metrics are misleading, perhaps because gerrit only maintains limited metadata. For example, some folks use gerrit as a repository for rough sketches of patches or work in progress. Some patches are commented on but never revised. It's often mentioned that we should pay more attention to contributions from first-time or infrequent contributors, but that information isn't explicitly marked. Gerrit's interface makes "abandoned" patches disappear, so folks are reluctant to apply that label if there's any chance they want to dust off a patch later.

To the extent that gerrit is used as a code repository -- a namespaced branch system, where various forms of work-in-progress can be kept -- it is entirely expected that the "backlog" of unmerged patches in gerrit grows over time.

Can we come up with a "conservative" metric, that we are sure eliminates all the possible confounding factors (commits by staff, work in progress, patches contributed by community members who then disappear, etc)? If so, we can realistically set a goal to drive this number to zero and keep it there.

Analytics-Tech-community-metrics is where such work on statistics happens (for instance T39463: Key performance indicator: Gerrit review queue), so you should read those discussions and make specific proposals there. The specific matter you mention has been addressed by excluding changesets whose message contains "WIP". The other big factor skewing statistics is/was the significant number of self-merges, cf. T59038#641249.

ksmith added a comment.Jan 5 2017, 6:17 PM

@cscott : Thanks for the ideas on metrics. I'm wary of trying to boil a complex story down into a single number. It would be hard to pick one that is meaningful. Perhaps some small set of numbers would be better.

Even so, there are aspects of code reviews that can't (I think) be captured in a number: Do first-time submitters get the feedback they need? Are people +2'ing things they shouldn't, or failing to +2 things they should? Are people using code reviews to help avoid increasing tech debt?

cscott added a comment.EditedJan 5 2017, 10:24 PM

@ksmith Right. At the same time, I am wary about drawing the wrong conclusions: I think removing just "WIP" as @Nemo_bis mentions is not sufficient to weed out all the other uses of "gerrit as code repository", so I am not surprised that such metrics grow over time (just as it is not surprising that our codebase grows over time). A small set of metrics might indeed be nice. For example, a "lower bound", which aggressively removed staff commits, WIP, anything labelled with a branch, etc; as well as an "upper bound" closer to our current statistics. Then the gap between our "lower bound" and our "upper bound" provides an indication of our uncertainty, so we aren't misled into believing in the precision of a single number and chasing it too far.

Some of the things you don't think can be captured in a number do in fact seem like they can be quantified. For example, "do first time submitters get the timely feedback they need" -- well, you can certainly quantify "timeliness" of feedback, and you can also look to see if the submitter manages to submit a revised patch after feedback/have the patch merged (or be explicitly abandoned by the submitter), which would be a strong indication that the submitter got actionable feedback.

"Are the +2'ing stuff they shouldn't" -- can be captured by looking at reverts of merged patches. My intuition is that this is rare, but we should have a watchdog metric in case it ticks up. Per-user stats which are off-trend for the community as a whole might prompt users to be more cautious.

"Are they not +2'ing stuff they should" -- this can be quantified with timeliness. For a patch which was eventually committed (and not later reverted), how long did it sit in its "final" form before being eventually merged? What's the time delay between the first "review action" taken on the final form of the patch (a comment left, or a +1) and the eventual +2? Again, probably some of this stuff is healthy -- folks offering +1s on their particular areas of expertise to build confidence in an eventual +2 -- but any sharp uptick might mean that folks could safely be bolder. Per-user stats that are off-trend for our community as a whole might be used to nudge particular users to be bolder.

Aklapper updated the task description. (Show Details)Jan 6 2017, 10:49 AM

To the owner of this session: Here is the link to the session guidelines page: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Session_Guidelines. We encourage you to recruit Note-taker(s) 2(min) and 3(max), Remote Moderator, and Advocate (optional) on the spot before the beginning of your session. Instructions about each role player's task are outlined in the guidelines. The physical version of the role cards will be made available in all the session rooms. Good luck prepping, see you at the summit! :)

Note-taker(s) of this session: Follow the instructions here: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Session_Guidelines#NOTE-TAKER.28S.29 After the session, DO NOT FORGET to copy the relevant notes and summary into a new wiki page following the template here: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Your_Session and also link this from the All Session Notes page: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/All_Session_Notes. The EtherPad links are also now linked from the Schedule page (https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Schedule) for you!

Aklapper updated the task description. (Show Details)Jan 10 2017, 6:42 PM
Aklapper updated the task description. (Show Details)Jan 17 2017, 3:20 PM
greg added a subscriber: greg.Jan 20 2017, 7:20 PM

@Aklapper do you have your slides handy?

greg added a comment.Jan 20 2017, 7:20 PM

@Aklapper do you have your slides handy?

bah, there it is in the description...

greg added a comment.Jan 20 2017, 7:49 PM

Summary wiki page with follow-up actions: https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Better_code_review_processes_for_reviewers

Notably:

  • Set up a working group (which might form a committee)
    • purpose: problem definition / scoping.
    • timeframe: 2-3 months? present finding and recommendations for next steps (probably something like "form a committee to work on the problem")

What I don't have is... who. :)

@Qgil seemed to have ideas about the working group, so I assume either he or @Aklapper will take that action item. I don't have the capacity to take it on right now.

Qgil added a comment.Jan 23 2017, 1:41 PM

@Qgil seemed to have ideas about the working group

Discussion happening at T129842: Launch the Wikimedia "Code Review Special Interest Group".

greg closed this task as Resolved.Jan 23 2017, 11:48 PM

@Qgil seemed to have ideas about the working group

Discussion happening at T129842: Launch the Wikimedia "Code Review Special Interest Group".

Let's close this event specific task then now that the discussion is back on the general task. Immediate follow-ups (notes/etc) from the event are done.

Qgil awarded a token.Jan 27 2017, 3:02 PM

Could you please upload these slides to Commons? Thanks!