Page MenuHomePhabricator

Define and enforce a code ownership policy across all Wikimedia deployed code
Open, Needs TriagePublic

Description

NOTE: The purpose of this ticket is to capture a problem, as the reporter of this problem Jdlrobson does not plan on shepherding this to completion. This originally was sent to wikitech-l (subject "Invite to explore code ownership in MediaWiki core" )

What is the problem or opportunity?

The most recent developer satisfaction survey supports the claim that volunteers find it hard to get code reviews. The hypothesis is that by having clearer ownership around code, there should be clearer
accountability and it should be clearer what code lacks ownership that probably should.

Within the WMF, it's often unclear who to talk to about potential changes in different pieces of software, particularly when it comes to core code, which is maintained by multiple teams depending on the folder or code in question. The closest we have is https://www.mediawiki.org/wiki/Developers/Maintainers and https://www.mediawiki.org/wiki/Git/Reviewers however these pages are not always up to date, and SLAs if defined are rarely met.

On top of this much of the code, we deploy lacks maintenance within the WMF. For example, FlaggedRevs has been looking for maintainers for four years and it has been futile. If all code is required to have a WMF maintainer that would force important conversations and direct strategies.

A lack of ownership leads to unnecessarily long lead times on important initiatives, for example:

  • "2021-11-04 large file upload timeouts; Impact: For 9 months, editors were unable to upload large files (e.g. to Commons). . . . remained untriaged and uninvestigated for months. This may be related to the affected components having no active code steward."

Even with owners, what it means to be an owner currently is not formal, and does not come with any expectations nor any resourcing or time allowances.

Clear ownership boundaries would also give leadership a clear indication of which team needs support or staffing to be able to better support the code owned by them

What does the future look like if it's achieved?

For every deployed piece of code, it will be trivial to identify which team maintains the code which will help in directing train blockers, and teams that should be involved in projects early on.

Teams that are maintaining code that doesn't fit in with what they work on will negotiate with other teams to exchange projects.

  • When issues arise, owners will feel more empowered to make decisions when disputes arise.
  • Volunteer patches will know exactly who needs to review their code.
  • Management will be able to see the impact of code under code stewardship review on delivering products.
  • It will be trivial to identify code without owners
  • A clearer definition of what is meant by "code ownership" and maintenance and expectations around it. For example, if a volunteer developer wants to make a patch to code inside a codebase they are not an owner of, can another volunteer developer +2 it, or are they (socially) required to get approval from the owners?

What happens if we do nothing

A few developers tried to push this initiative in https://gerrit.wikimedia.org/r/c/mediawiki/core/+/745946 however the problem is more complicated than adding a few person / teams to a file. The closest we have today are:

Without this:

  • Time will be wasted in identifying teams and code reviewers
  • Projects will be derailed by not being able to identify risks and blockers early on (e.g. we're touching code that X team has prohibited changes on)
  • For code without clear ownership, different teams that disagree on solutions/approaches/direction on code that has no owner will require management intervention and costly processes to resolve such matters.

Notes

Gitlab

Given the eventual move to Gitlab it makes sense to follow the format described in
ttps://docs.gitlab.com/ee/user/project/code_owners.html

The assumption here is that @wmf-web would be a Gitlab user group.

Code ownership Gerrit plugin

If there is any interest in testing codeowner before this repository migrate to gitlab, Gerrit has a code owner plugin and is under active development at Google.

A demo given during the Gerrit summit in 2021: https://www.youtube.com/watch?v=p6kbDEipjCY

The slides are short but have a few references https://docs.google.com/presentation/d/15qfneQSY-TOrQ9sCSLfIZuR34IV7p2K9nxvTeaVBQ4w/edit#slide=id.gdd7e448507_0_0

A past presentation https://docs.google.com/presentation/d/1DupBnGr3apIx-jzxi9cHzSgkI-2c1ouGu1teQ4khSfc/edit with lot of material.

Data

I don't think anyone is doing any research on that front. We surely have all the review data in the git reposistories ( under refs/changes/*/*/meta ) we also have them represented in a Kibana dashboard https://wikimedia.biterg.io/app/kibana#/dashboard/Gerrit-Timing might be worth exploring.

Code stewardship Review

This effort would overlap with Code stewardship review effort by formalizing who maintains what.

@Ladsgroup wrote:

I think this is attacking the right problem from the wrong angle. We do have issues with maintainers and ownership but we do have a page written down in https://www.mediawiki.org/wiki/Developers/Maintainers The problem lies in the process. Same goes with a dozen other extensions that have been causing major issues including but not limited to DPL, ShortUrl (don't be confused with UrlShortener), CentralAuth, Logging in mw, ALL OF MULTIMEDIA STACK (see https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/WMPZHMXSLQJ6GONAVTFLDFFMPNJDVORS/) etc. I don't see this as a way that would help here. It would just add another source of truth breaking SSOT. This is something that management needs to work on, Code Stewardship program needs to actually decide on its backlog of cases. This wouldn't solve anything here.

Internal slack thread

Some notes that came out of a Slack thread (https://wikimedia.slack.com/archives/C01R06P8D1B/p1638994488254300):

  • https://www.mediawiki.org/wiki/Developers/Maintainers is seen as the canonical location for maintainers
  • During train deploys, if issues are encountered,https://www.mediawiki.org/wiki/Developers/Maintainers it used as a first stop, though in practice this just tells the deployer they are going to struggle to find someone to get attention for a bug
  • In the grand ideal scheme of things, Github, Gitlab, Gerrit etc settling on some common format recognized by everyone would likely set a new norm to define who owns which part of the code and be more helpful as a source of truth.
  • One problem with the maintainers list on mediawiki.org is that dissolved teams do not actively get removed, so it's often unclear if something is unowned or (marginally) owned by the successor team. Other than that it's a reasonably accurate resource, an easier to find and update than a file in version control.
  • ,https://www.mediawiki.org/wiki/Developers/Maintainers lists staff members who have left long ago.
  • Developers are using git log -S, git blame or git log --show-notes=review some/dir | grep 'Code-Review+2' | <count, sort etc> to identify possible reviews but that can sometimes be a red herring. For example https://gerrit.wikimedia.org/r/c/mediawiki/core/+/721115 suggests @Jdlrobson is a parser maintainer which is not true. This also places an undue burden on the individual developer but could be a useful tool for identifying candidates for ownership. It usually also identifies "whoever stepped up to fix the unmaintained thing last time around". We end up punishing the most responsible.
  • https://www.mediawiki.org/wiki/Developers/Maintainers doesn't really work for core, where there are multiple folders and it's not clear what part of code belongs to what component.
  • We currently allow code to run in production with no Code Steward/Owner. Should that be allowed?
  • During our annual planning process, we focus almost exclusively on "new" work rather than trying to reduce the backlog of unowned code in production. Identifying ownership might help with that to raise awareness.
  • A code ownership working group might help. We might exchange with other organizations such as GitHub, GitLab or Google. There was a Google presentation about code ownership with some slides about the functional part https://docs.google.com/presentation/d/1DupBnGr3apIx-jzxi9cHzSgkI-2c1ouGu1teQ4khSfc/edit#slide=id.gac148b3544_0_1711 .
  • Code ownership might not be a problem one can solve in isolation. One reason why this happens is that service boundaries line up roughly with team boundaries, so you're most often making contributions to "your own" codebase. In which case you're an owner or your teammates are.
  • Assigning owners to code is worth the consistency it provides even if some assignments seem arbitrary. However, it might not be the complete solution in itself.
  • Code ownership can help building better unit test and integration test coverage
  • Assigning Code Owners/Stewards needs to be an explicit action that is supported by the organization’s planning and prioritization efforts.
  • we still need to: a) have process that ensures that owners can respond in a timely fashion to requests and b) trust that there are mechanisms in place that will catch a problem if the owners don't
  • Ownership could mean multiple things - emergency responsibilites or commitment to reviewing patches. Means different things to different people
  • Pretty much anyone being allowed to approve (as we currently do) is actually not a scalable approach, even if it worked for a long time. Unclear ownership leads to very long discussions on whose angle, priorities, and use case on a task/patch is more valid
  • Legacy code exists. There's no such thing as software that can be left ownerless especially if people rely on it .
  • bunch of recent research papers (e.g. Automatically Recommending Peer Reviewers in MCR) that argue common reviewers rather than common authors are more natural "owners" and generally get reviews done quicker.

See Also (other related tasks):

Event Timeline

I'd be interested in hearing from the QTE team whose process this proposal is replacing. Also, my understanding of the TDMP is that the problem statement (what the issues are) should be introduced and agreed first before particular solutions are examined, which this is pushing for.

@Jdforrester-WMF JR did chip into the Slack thread with a quite clear stance (I've anonymized the notes here).
@Jrbranaa perhaps you could summarize the stance of the QTE so it's available here too for Phabricator users without Slack access?

IMO it would be more useful to discuss separately the various aspects of what we tend to bundle in the hazy concept of "ownership":

  • Should we assign each deployed software component to a team for at least firefighting-level support?
  • Should we assign each deployed software component to a team for bug triage? (Since the large file upload issue was brought up - the bottleneck there was IMO triage, and understanding that something highly impactful is broken, in a codebase where low-impact breakage is reported all the time.) Alternatively, can we improve triage coverage in some other way (e.g. clearly defined escalation channels for trusted community members)?
  • What is the level of support we want to offer for volunteers who contribute patches? (Currently this is officially zero. Many teams and individuals do better out of personal motivation, but certainly aren't required to.)
  • Should all deployed code be under active code quality maintenance? Or, phrased less evasively, should we undeploy the majority of our codebase, as there is no way we have the resources to bring it under active maintenance?
  • Should we replace the current policy of "every sufficiently reliable user can merge everywhere" with "everyone can only merge on their own codebase" or something like that?

I think without the protective shield of haziness, some of these would quickly turn out to be infeasible, and the rest don't have much interdependence on each other.

Might be a potential dup of T293365: Code governance policy; pretty much Code-Stewardship-Reviews again - a problem WMF has failed to solve for years.