Recently I've been finding the code review system frustrating and demotivating. In some cases, this has probably reduced the amount I've contributed to MediaWiki. Primarily I'm frustrated by:
- Lack of standardization over what is looked for in code review
- Long delays (measured in months) before getting any sort of review
- Seeming randomness in the code review process (One day my code is reviewed in an hour, next patch takes 3 months. Different patches seem to have different standards applied to them, in a fashion that seems totally arbitrary)
In more detail:
Sometimes it seems like what will get you a -1, is rather random, and varies a lot depending on who is reviewing your code. To use unit tests as an example, Sometimes a small change in legacy code, where it is rather difficult to add unit tests due to code structure, and which seems unlikely to have future regressions, will be -1'd unless it has unit tests to the max. Other times entirely new code which would be trivial to add unit tests to, get merged without any at all.
I would like code review to be more deterministic. I would like -1's to be something that would be easy to predict before hand, and when they happen, be more like d'oh moments, rather then feeling random. This is particularly true when combined with long delays with code review, which I'll get to next:
Sometimes it takes months (in the plural) before convincing anyone to look at your patch. This can be sped up by nagging people, but I do not want to spend all my time begging people for code review over and over. Other times it can be unclear who to add to a patch. Having code review take such a long time can be rather demotivating, as what's the point if nobody looks at your patch for six months, followed by a -1 over something silly, followed by another long waiting period. Additionally, once so much time has passed, usually I don't care about the bug the patch is trying to fix anywhere near as much anymore. If the patch is non-trivial, I've probably totally forgotten by this point what the patch was even trying to do. When someone -1's a patch a submitted a couple days ago, I'm likely to fix it. When someone -1's a patch I submitted several months ago, I'm more likely to say screw it, why did I care about this issue in the first place.
In many ways, code review epitomizes the saying that when everyone is responsible for something, nobody is responsible. (Sumana would probably say tyranny of structurelessness - http://www.jofreeman.com/joreen/tyranny.htm )
Code review tends to be fastest in very active areas of MediaWiki, or for people who work in teams (and review each other's code). Social connections can also play a key role, as often code review is done in the form of asking for favours. As such WMF employees are at a significant advantage here as usually WMF teams tend to work in one area of the code base in-force, and are almost always in teams (However, many WMF employees have complained that the CR situation is very frustrating when they have to touch code that's not directly their team's).
Expected outcome at summit
But, its easier to complain then to come up with solutions. With that said, I imagine this as primarily a brain-storming session. People have been talking about the Code review problem for ages - I believe we need radical and out of the box social changes to truly fix it.
Thus the goals of this meeting are:
- Is this just me? Do other people feel the same way? Should I just get over it? How does this issue affect different groups to different degrees?
- What are the root causes of frustrations
- And especially, try to distinguish the root causes from mere symptoms.
- Come up with plans to fix the code review system once and for all.
- Ideally for various ideas, we would suss out which root causes we expect the idea would fix
Work out a draft policy for how code review should be conducted (Like code conventions document, but for quality instead of spacing). What issues should trigger a -1. Are there things that the author should be encouraged to do.While I think this is something fairly uncontroversial that could help the situation, I would rather keep this meeting focused on brainstorming. If we do do a CR guideline, it should be separate from this meeting. Given how late in the game this is, perhaps it should be during an unconference spot.
- Agree on a final solution
- We have a number of ideas on how to fix the problem
- While we may not have decided which idea to implement, people come away from the meeting feeling that at least one of the ideas would fix things, and that there is hope for a brighter tomorrow.
- A summary of the ideas is presented to wikitech-l for futher evaluation, in the hopes that the wikitech-l discussion will narrow down the choices to the best ones.
Proposed solution to reduce delays
This is a bit radical, but I'm presenting it here as a concrete proposal to at the very least move discussion forward. And at least have something to talk about other then, yeah the status-quo is kind of sucky
The main problem is that nobody specifically is responsible for code review, so things fall through the cracks, and its hard to resuce things from the cracks, when there's nobody specifically responsible.
So lets change that.
I propose that every single patch (to MW core) has a single person responsible for it. Which person that is should be deterministic and known at all times. That person is responsible for either merging that patch, rejecting the patch outright, -1'ing the patch for something fixable, or finding somebody else to assign the patch for review (If they're not comfortable reviewing it). Of course people should be able to reject patches on the basis of this is three thousand lines without a code comment, or you've submitted 10 patches already where you tested none of them and there are fatal errors in all of them, I'm not reviewing anymore of your patches. Furthermore, the review-owner person should do one of these actions in no more then 1 week time since the patch is submitted. People who don't review things in a timely manner are removed as a reviewer.
Which person is responsible for a patch could be determined by having a list of paths in MW-core (for bonus points, also wmf-deployed extensions), where each path is owned by 1 or more people. Whichever file the patch touches the most lines of, whomever "owns" that file, is responsible for the patch (If there's multiple people, determine by doing it mod the patch number or something)
Having someone "own" a particular file doesn't mean only they can review stuff for it, anyone should still be able to review stuff, it just means that they have to review it.
Additionally, there would be a gerrit-wrangler, similar to Andre, but for gerrit, to ensure patches are getting reviewed, and nothing falls through cracks.
Cons to this plan: Might require significant resource commitment by WMF, which might be a hard sell. Difficult to enforce that volunteers must follow some deadline.
Related tickets, background information
- T101686: Goal: Reduce code review queues and waiting times (There's some good identification of problems on this one)
- T78768: Goal: Plan to prioritize code review of patches submitted by volunteers
- T101099 : Developer-Relations roadmap. dev-relations team wants to work on this issue. They have some interesting discussions about it over there
- T113378 Lessons learned from the first Gerrit Cleanup Day - Some interesting stuff here. And well clean-up day was a nice effort, imho, it is at best a band-aid solution, and really not even that, unless we start having monthly clean-up days. Its not like there is just a big back-log that needs to be cleaned and we'll be fine going forward, Code review is an on-going problem.
- http://korma.wmflabs.org/browser/gerrit_review_queue.html : Some stats
- T114320: Code-review migration to Differential status/discussion
Developer-Relations Does have some plans to try to alleviate the issue. I haven't really engaged much with them on this issue before filing this (Which WMDS page suggests I maybe should have), but writing this session proposal was kind of a spur of the moment thing I decided to, when I work up this morning and realized the deadline to write this was basically today.
Slightly edited Etherpad session notes:
- Working area (Collaboration): https://phabricator.wikimedia.org/T119030
- Topic (Make code review not suck): https://phabricator.wikimedia.org/T114419
- Related link: https://www.mediawiki.org/wiki/User:AKlapper_%28WMF%29/Code_Review
- Meeting type: Problem-solving
- What are the areas where people are frustrated. What are the root causes
- What are potential social changes we could do to reduce these frustrations
- In particular, I want to emphasize that I want to focus on social proble ms first, not technical solutions
- This meeting is meant to brainstorm root causes and solutions. We are not agreeing on a final solution
The ideal outcome is a summary of the ideas generated is presented to wikitech-l for futher evaluation, in the hopes that the wikitech-l discussion will narrow down the choices to the best ones.
- 10 minutes: session introduction
- 25 minutes: Discussion of root problems of frustration. Or if there even is frustration
- 30 minutes: Discussion of potential solutions to problems
- 15 minutes: Conclusion. How do people feel about what's been discussed. Reflections on stuff discussed. Are there solutions that specificly excite people. Next steps.
Brian introduced the session. (note: the scribe isn't responsible for the summary)
- What do other people find frustrating about code review?
- How does a reviewer find reviewable patches?
- How does a contributor find the person responsible for the code? Especially for code where no one is "responsible"? What about patches that add/change responsibilities?
- Are we too conservative about what we accept? Why do things get stalled?
- Is operations/puppet more lively than the MediaWiki codebase? (one person claimed code Review makes development un-fun)
- How do you learn how to take responsibility for sections of code? Is CR a good learning opportunity for developers?
- How do you budget time to do reviews for what you are responsible for?
- Is code review necessary? How do we catch security bugs without CR? How should we treat people who make mistakes? Does our current process ensure secure code? How do we focus review on the important stuff, rather than nitpicking? Are there restrictions that force us to have CR?
- Is a governance model necessary for CR?
- What should we do differently?
- Should we require a maintainer be identified for each patch? Should we assign patch review to specific individuals on a per patch basis? Should we ask for people to agree to be assigned maintainers? Do we need a more structured approach? What is the reward for being a maintainer? Can we make reviewers more prominent? Rewarding metrics? What if the assigned reviewer is too busy?
- how can we make non-blocking comments on a patch?
- Training sessions where main contributors explain the broader background and their plans for areas of code?
- ANSWERED: Can we have office hours for code review? Is there value in synchronous review? +2 Yes!
- What about post-commit discussion/improvement? Do github-style pull requests and feature branches work? Does Phab's model work as a middle ground between Github and Gerrit?
- Are +1s useful at all? Is it a signal for if the reviewer read/understood the code? Is a +1 useful without a comment? If someone doesn't have +2, is it still useful? Does a +1 signal risk?
- Does Gerrit embody/imply the policy we want? +2/+1/0/-1/-2 work?
- How can we invite outside experts into CR in a patch? (e.g. localization?)
- Is automerge a good idea? Does auto-merge demotivate reviewers?
- Should we migrate to Phabricator?
- Should we assign owners to areas of code?
- Can our CR process scale to early Linux level? In CR, is being a jerk better than being silent? Is our code modular enough for this?
- Have we got/do we need Code review guidelines?
- Should we have a CR committee? Does a CR committee have sufficient ROI?
- Are there good examples of WMF teams we can emulate?
- @greg: set up CR office hours (start the practice)
- @Legoktm: update the ownership list on https://www.mediawiki.org/wiki/Developers/Maintainers
- @Bawolff: follow up on questions/notes from this meeting
[bawolff] Talk about problems people have, what we can do differently. Focus on social conventions, less on technology. What do we ideally want, brianstorm ideas
[bawolff] CR is random, from time to the depth of reviews, like playing roulette. Would like it to be consistent.
[legoktm] Most frustrating is that when I have free time, it's hard to find reviewable patches. And when I need review myself, I ping people on IRC, because they don't respond to just Gerrit requests.
[cindy] For me, hard to figure out who should review the code, unsure who the one person is to review. "when everyone is responsible, no one is responsible", differing opionions on how to implement features, no one to arbitrate that
[bawolff] CR responsibility is not shared enough
(etherpad) [legoktm points to http://koti.kapsi.fi/~federico/crstats/core.txt / http://koti.kapsi.fi/~federico/crstats/extensions.txt ]
[ariel] bunch of MW code that is no longer owned because there is no MW Core team, no one is responsible (literally)
[bawolff] that's bad
[ori] I'm not a good code reviewer, ready to help with debugging in prod/handling fallout, wants to greenlight a patch even if they are good patches from trusted contributors; no socially acceptable way to take risks with patches.
[ori] Compare ori's history in ops/puppet vs mediawiki/core, way more changes in puppet even though more interested in MW. In core CR can be "a drag", people don't take chances or deviate from patterns, doesn't let you build momentum or energy. (ops/puppet allows self-merging). Be more open to risk.
- (etherpad) It's easy to size up the cost and risk that comes with modifying code, less easy to size up the cost and risk that comes with erecting barriers to contributions, letting code rot, etc. Our practices are geared toward preventing the former, but do nothing to prevent the latter.
[bawolff] summarizes as CR saps creativity
[andre_] reviewers vs. contributors. Should we have a more structured process when it comes to reviewing code: blog post by Sarah Sharp? need more skillfull/confident reviewers? reduce areas where no one is responsible. Go through rotting patches and assigning someone who is responsible? (Long version: https://www.mediawiki.org/wiki/User:AKlapper_%28WMF%29/Code_Review )
[isarra] I have +2 in mediawiki/skins/*, but don't know how to review things
[bawolff] there are no instructions for doing CR except for "don't screw up", no guidelines for what's okay or not okay. what's acceptable/minor/etc. You don't want to be the one who +2's something that wasn't supposed to be +2'd.
[isarra] If people don't have time to review themselves, then definitely don't have time to teach others
[marxarelli] Maintains mediawiki-vagrant, have event in calendar to just review code. considered doing office hours to do 1:1 reviews with people? (notes that vagrant reviews can be slow, initial provisions can take 15min). Dedicated office hours type time
[csteipp] CR is necessary. From security perspective, non-zero number of security issues caught in CR. Standardize CR, make it better. We're too hard on people when they make mistakes.
[bawolff] CR is the job that is thankless, no one gets appreciation for doing a lot of CR, compared to writing patches/fixing bugs. CR'ers take on the risk for a patch, but no reward
[adamw] CR is one of favorite parts of this job. Hallway conversations...? Actual training would be effective, people interested in certain areas of code could teach others about that, problems they see, etc.
[bawolff] we don't have anything like that
[jrobson] I enjoy code review too. Feel good factor about it, asked volunteer why they contribute to MF, its because [jon] reviews their code.
[bawolff] chances of someone sticking around reduces significantly if no review by 72 hours. Wikipedia is based on instant edits, etc.
[jrobson] repos of code that are unmaintained. do we have guidelines for what is mergable? We should have CI set up for pointing out trivial errors. A fear that as soon as you merge code it runs on the train, .. possible alternative is development branches.
[mah] need governance model, http://mwstake.org/mwstake/wiki/Blog_Post:37. quote. if you don't review people's code, it isn't being used and we're failing. Need a MW governance model.
[bawolff] don't think we need governance model for this, that's more for high level decisionmaking
[robla] we have architecture docs, kind of drafty, but we can improve those
[ori] I want to challenge chris steipp on the security angle. You said that code review has found a low but non-zero number of defects. If we had 2 reviewers, we'd find even more security risks. If we upped it to three, we'd find even more risks. On the other side, there's tons of PHP code we have that has no maintainers, which is a huge security risk, and draw off contributors
[csteipp] It is the only practical way we have of identifying security issues before releasing to production. I think the current trade-off between risks and obstacles is reasonable.
[bawolff] fully standardize what we have to check for, to make reviews actually catch security issues.
[k4-713] can't get rid of CR because of PCI (Payment Card Industry) rules. if thast happened, we'd have to stop using MW, or get someon to do reviews for us.
[bawolff] I think we have to keep CR, but we need to modify how we do it, possibly radical, more structure will help.
[janZ] difference between "moving fast/breaking things" and letting other people clean up afterwards, vs taking a risk together with a reviewer
What we should do differently
[legoktm] Brian proposed a solution on the phab task: assigning a reviewer person to every proposed patch. I think that's the key. Someone has to be in charge of the review process. big problem for maintainer-less areas.
[bawolff] Even elsewhere, patches can fall through the cracks. Have to find a person responsible for it.
[maxsem] How are you proposing to enforce this?
[bawolff] Good question. The first step is to divide up MediaWiki core into sections, and allow interested people to sign up for a section. This comes with responsibilties, and you get removed if you don't meet them. Hope enough people sign up.
[maxsem] Result will be zero signups.
[ori] No effective-way of making non-blocking comments, that get author's attention but don't block the merge. Have to give a -1, otherwise people don't see the comments.
- [dlynch] (etherpad) Differential helps with this.
[andre_] more structured review approach for reviewers. actively check who is using +1/-1, encourage them to receive +2 rights. Assign reviews that nobody selects. Need a review culture for WMF teams. Improve documentation for reviewers and contributors for what is acceptable.
- MModell (etherpad): Phabricator will solve this.
[mah] Follow up for what Max said, it sounds like a great idea but people won't do it. What is the reward for doing it for free?
[k4-713] Have we tried live review/office hours session type thing? People who wrote code/reviewed code on IRC together. [Audience: no]. Had a lot of success with live feedback.
[MatmaRex] What k4 said happens, but in an ad-hoc pattern.
[millimetric] We use cloud nine(?) get code in shape and send it to gerrit
[adamw] Merge patches if they're passable, leave comments and rely on authors goodwill to fix those issues. There is a big visibility problem with these post-merge comments, though. How does github-style feature branch development work? i.e. move forward with patch and squash later on.
[bawolff] Often times the problem is that no one looks at the patch/comments/reviews/says anything, etc.
[bd808] re: adam's q about github-style development. composer-merge-plugin is developed primarily on github, rationale was to reach normal PHP devs outside of MW community - and it worked, more contributions from outside. Key differences that are crappy about Github: no patch chains (dependent changes), cannot amend other people's patches - no collaboration
[ostriches] Phab gets it right, straddles the difference between GH/Gerrit - less strict than gerrit, but you can do deps and stuff.
[ori] Proposal: Get rid of +1's. Bugzilla quip where MZMcBride says "+1 means reviewer has working mouse" Legitimate only for controversial patches, to indicate agreement. (https://tools.wmflabs.org/bash/quip/AU7VVae36snAnmqnK_xL )
[bawolff] patch with +1 means nothing, usually means less likely to be mergable
[marxarelli] Phab improves this. Gerrit is only binary. Responsiblity of merging on reviewer.
[bawolff] I don't want to limit this to current technology options.
[mooeypoo] +1 useful for when there's a single "expert" person, whom we want to give their opinion, but who don't necessarily understand the whole codebase. e.g. moriel is a RTL expert. Comments without +1 are often not noticed.
[bawolff] siebrand +1's i18n changes, acting as "expert".
[ostriches] re ori - that's configurable, if we agree on this, it can be done. I generally agree with ori, but see mooey's point. Another problem is that +2 implies merging, we backed ourselves into a corner here. Phab gets this right by encouraging actual comments, not numbers.
- (etherpad) general etiquette: leave nitpicking comments vs. quickly patching them. Both work but as a reviewer which is expected? AW: IMO it's most polite to write a followon patch. The original author can apply the patch or not, and doesn't have to worry about merge conflicts.
[apergos] I use +1 and +2 like this: +1 is "I reviewed, haven't tested"; +2 is "Merging this, taking the responsibility". +1 makes sense in this way.
[ostirches] Unlike -2 and +2, a -1 and +1 will disappeara on new non-trivial patchset.
[legoktm] If you leave a number without commenting, you should have your rights taken away ;) Also, we need to give reviewers more recognition - e.g. put them on Special:Version, like the list of authors. The reviewer's name is also hidden in commit notes, which tools like Phab don't show by default. Brian's Phab proposal would be a good starting step. We have an initial list of "owners" in Reviewer-bot config.
- (AW in etherpad:) Reviewers can already add themselves as automatically watching the repos they can review.
- (etherpad comment from Brad): If only the reviewer bot could work on inheritance and not just filenames, I could review more extensions too... [fancy!]
[bawolff] In BZ days, we had a monthly report email listing the most active bug closers. That motivated people (or at least, me). Nemo's crstats is also nice. http://koti.kapsi.fi/~federico/crstats/core.txt
- (etherpad) number of commits vs. number of reviews
[twentyafterfour] +1 to ostriches ;) Biggest CR impediment for me is the automatic merging; I didn't know that +2 implies merge implies deploying (in some repos).
[bawolff] mediawiki-config repo is another conversation.
[Isarra] people have to +1 things because they don't have +2. (In particular volunteers.)
[bawolff] If somebody's only action on gerrit is +1'ing a patch, that's often not a good sign.
[ori] +1 means that someone likes the patch, but thought it was too risky to just merge.
- (etherpad comment): for me (Tgr) it tends to mean "willing to merge but don't want to prevent others from reviewing". A "scheduled +2" option would be nice.
- (etherpad) in repos with fewer maintainers (e.g. pywikibot) +1 means: ok for the author to self-merge
[janZ] I have used +1 with the comment "I have reviewed this, please merge" because I don't have permissions. A +1 with a comment that says what was reviewed is useful.
[mmodell] take ownership https://phabricator.wikimedia.org/owners/
Conclusion / recap
[robla] I am only putting questions in the summary (above), because otherwise could "stop" the conversation. Have to look for easy/hard questions later. If the answer is obvios, maybe we could write it down.
[twentyafterfour] Obvious to me: Move to Phabricator. Phab has Owners tool. Call to action: Set up an owners package for stuff you care about.
- (etherpad comment): Phabricator still does not know about half the repos on gerrit
- (etherpad): The "moving to differential" session was yesterday, and talked about our path to getting everything set up in Phabricator. (Yes, these sessions should probably have been reversed in order.)
[bawolff] We could've done that in SVN day and yet we didn't...
[robla] Not so obvious :)
[legoktm] Code review guidelines ... ? (missed it)
[everyone discusses 'obvious' answers, see above]
[legoktm] the Owners thing, the problem is social, not technical. List of maintainers at mw.org need updating.
[greg] We need actual guideline/policy for this.
[DanA] Look at Linux, it's reviewed on a mailing list and it works? We have smaller scale too. Maybe we have to look elsewhere.
[bawolff] I don't know much about Linux code, but I think they have components.
- (etherpad comment): Re "one jerk reviews all the code", we had that (although I don't think Brion or Tim are jerks). It didn't scale.
[bawolff] Indifference is even more sapping that bad review / people being assholes, it meant nobody even cares.
[ariel] Big prestige of being a contributor to Linux kernel, people more likely to suffer through it.
[David Lynch] Owners list is both social and technical. On Phab you get directly added as reviewers, good reason to keep up to date.
[ostriches] CR through email didn't scale for us, a looong time ago. It comes down to lack of ownership, not a technical issue. Need to remove inactive owners, etc. (cf. https://www.mediawiki.org/wiki/Gerrit/Project_ownership#Requesting_repository_ownership ?) Need a way to cease maintainership too.
[Cindy] My own patch is languishing. Need owners(?), but we also need people responsible for new code. Architectural issue, do you add stuff to an existing one or a new class?
- Reiteration of "How does a contributor find the person responsible for the code? Especially for code where no one is "responsible"?
- [legoktm] unsure, the patch has had multiple reviews. https://gerrit.wikimedia.org/r/#/c/250039/
- [cindy] multiple contradictory reviews pointing in different directions - who makes the final decision?
[bawolff] what is an acceptable way to do things? Standardize CR process, not have it dependent upon which reviewer you get. Need a catch-all owner.
- CODE Review WRANGLER
[greg-g] freaking out about no action items
[robla] most important thing is to improve the CR process, not make sure the meeting was useful. Need to prioritize list of questions.
[ori] CR has to do with rules about how we interact with software development, hard for people who are not invested with special power to change anythin - kind of meta. We should agree that CR is a severe issue that we should appoint some small committe to make binding recommendations on how to change it.
[bawolff] just to double-check, who thinks that there is a problem with CR?
<room raises hands in agreement>
[DanA] does addressing this issue have sufficient ROI?
[kaldari] Talk with teams that do a very good job at CR, and have volunteers who actually contribute
[Catrope] It comes down to ownership. VE didn't have CR backlog because there is clear responsibility.
[bawolff] Owners don't have to be WMF teams. They can be anyone.
[Catrope] Maintainership is better term than ownership
[robla] we can look into other orgs taking on ownership, i.e. use other people's code rather than reinventing the wheel and writing our own.
[bawolff] Yay I will fix everything.
[bawolff] specific code review guidelines would be interesting to explore tomorrow at unconference!