
Determine doc grouping criteria and start a list of collections
Closed, Resolved (Public)


Also, update existing lists or integrate the info that the team started as part of KR1 in FY 2021-22.
"List" in this task title doesn't mean a spreadsheet. Part of this overall objective is to determine better mechanisms for making this type of information easy to understand and maintain, and useful for encouraging doc contributions and maintenance. See



Event Timeline

TBurmeister changed the task status from Open to In Progress. Sep 20 2022, 8:37 PM
TBurmeister claimed this task.
TBurmeister triaged this task as Medium priority.
TBurmeister updated Other Assignee, added: KBach.

My research, conclusions / recommendations, and related artifacts from Q1 are at These are my thoughts only -- KBach and I still need to compare our findings, reconcile any disagreements, and create a shorter executive summary.

TBurmeister closed this task as Resolved. Oct 19 2022, 3:56 PM (edited)


  • Doc grouping criteria are many, and vary by context. Docs are often grouped together based on who maintains them, the subject(s) they cover, or (often less intentionally) by the structure of pages or other metadata they share.
  • Grouping docs together into collections is only useful in the following ways:
    • Limiting the scope of doc improvements or maintenance
    • Compensating for MediaWiki's limited support for structured information
    • Making it possible to assess the state of a set of documents rather than page-by-page


  • We should not attempt to create a list of collections.
  • Collections do not make it easier to maintain documentation, but they may help us scope our work and focus on areas that need documentation improvements the most. They can also be helpful in presenting an overview of the knowledge landscape to readers.
  • Collections have different organizing themes, but most of them are organized around a specific technology, platform, process, or team. In practice, it's hard to decide whether a collection should be organized around a specific technology (like "Jenkins"), the system or process that technology is part of ("Continuous integration", "MediaWiki testing"), or the team that owns or maintains those technologies and processes ("Release engineering"). It's possible to define a set of high-level categories that cover the range of Wikimedia technical documentation, but the utility of doing so is dubious. The categories would be so broad that their main use would be to provide a landing page that guides readers to more specific collections, and those collections would usually belong in more than one category.
  • Focusing on improving stewardship of the large mass of collections and their constituent docs is more useful than focusing on organizing collections into higher-level categories.
  • Individual documents can be part of more than one collection. Few, if any, collections can ever be clearly defined. We have limited mechanisms on-wiki to represent these complex many-to-many relationships between documents and collections, and between collections and high-level categories. This is part of why we end up with long lists of see-also links and nav boxes that bring together scattered docs.
  • Therefore: the best way to improve stewardship of "collections" is to work with teams and project owners to help them identify the docs that they should consider "in scope" for their work. User:KBach-WMF/Sandbox/PywikibotCollection is one example of this sort of sense-making, doc-gathering effort guided by tech writer expertise. The collection assessment guide that came out of this quarter's work includes some techniques for how to identify docs that are part of a collection.
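
The overlap described above can be sketched as a small data structure. This is only an illustration, not something the team built: the doc titles ("Doc A" and so on) are hypothetical, and the collection names are borrowed from the examples earlier in this comment. It shows why a single flat list of collections falls short when one doc can sit in several collections at once.

```python
from collections import defaultdict

# Hypothetical sample data: each doc lists every collection it belongs to.
doc_to_collections = {
    "Doc A": ["Jenkins", "Continuous integration"],
    "Doc B": ["Continuous integration"],
    "Doc C": ["Jenkins", "Release engineering"],
}


def invert(mapping):
    """Build the reverse index: collection name -> sorted list of docs."""
    reverse = defaultdict(list)
    for doc, collections in mapping.items():
        for collection in collections:
            reverse[collection].append(doc)
    return {name: sorted(docs) for name, docs in reverse.items()}


collection_to_docs = invert(doc_to_collections)

# Docs that belong to more than one collection cannot be filed in one place.
shared_docs = sorted(
    doc for doc, collections in doc_to_collections.items() if len(collections) > 1
)
```

Keeping both directions of the index makes it possible to ask "which docs are in scope for this team?" and "which collections does this doc touch?" without choosing one grouping as the canonical one.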


  • Consider page hierarchy, category metadata, and linked pages to be the main indicators of collection membership. When it's possible to identify a collection landing page, that page should link to code repos (and docs within them), where applicable.
  • Put docs that relate to specific technical components in the repos where the code lives. Docs that live with code should contain pointers to on-wiki collection landing pages, which provide the higher-level context for the code, along with connections to other relevant content.
  • Technical writers should engage directly with developers and subject matter experts to help identify the boundaries of their collections. They should also provide guidance on applying page structure and document metadata to relate docs to each other and to other collections in meaningful ways.
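
As a rough sketch of the first recommendation above, the three membership indicators (page hierarchy, category metadata, linked pages) could be checked like this. Everything here is assumed for illustration: the `Page` class, the `membership_signals` function, and the sample titles are hypothetical, and the page data is hard-coded rather than fetched from a wiki.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    categories: set = field(default_factory=set)
    links: set = field(default_factory=set)  # titles this page links to


def membership_signals(page, landing, collection_category):
    """Return which of the three indicators tie `page` to the collection."""
    signals = []
    # Page hierarchy: the page is a subpage of the landing page.
    if page.title.startswith(landing.title + "/"):
        signals.append("hierarchy")
    # Category metadata: the page carries the collection's category.
    if collection_category in page.categories:
        signals.append("category")
    # Linked pages: the landing page and the page link to each other
    # in at least one direction.
    if page.title in landing.links or landing.title in page.links:
        signals.append("links")
    return signals


# Hypothetical sample data.
landing = Page(
    "Example collection",
    categories={"Example"},
    links={"Example collection/Setup", "Other guide"},
)
setup = Page("Example collection/Setup", categories={"Example"})
other = Page("Other guide")
unrelated = Page("Unrelated page")

setup_signals = membership_signals(setup, landing, "Example")
other_signals = membership_signals(other, landing, "Example")
unrelated_signals = membership_signals(unrelated, landing, "Example")
```

A page that matches several indicators is a strong candidate for the collection. A page that matches only one (such as a single inbound link) is better treated as a prompt for a conversation with the collection's maintainers than as a definite member.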