Technical explorations to support Article Guidance
Open, In Progress, High | Public | 8 Estimated Story Points

Description

As part of T396029, a new workflow is proposed to provide contextual guidance for specific types of articles that communities can define and adjust. This concept introduces new aspects that require technical exploration to identify viable approaches to implement them.

Key concepts to explore are listed below:

  • Type of article. We need to consider which is the best way to represent article types in the system (Wikidata item, a Wikipedia category, Article Topic taxonomy, etc.). For each option, we want to identify pros and cons and determine which is the best approach to use in outlines.
  • Outlines. Outlines can be defined as structured data. Technical explorations are needed to determine the best way to represent them (e.g., a JSON object), where to store them, and the implications of that format and location for use and manipulation.
  • Community configuration. Exploring the possibilities of the Community configuration platform may help to expose relevant configuration elements (outlines, supported types of topics, etc.) to the community for their adjustment.
  • Visual Editor integrations. In the initial intervention, users will reach Mobile Visual Editor with some basic support: initially preloaded contents and the inclusion of an edit tag when contents are published. We may want to explore the options to support these.

Proof of concept

Article Guidance Proof of concept.png (3×8 px, 657 KB)

A proof-of-concept with key steps of the workflow can be useful to validate the technical exploration. The following steps are proposed:

  1. Capturing the title. A special page where users can type the title for the new article they want to create.
  2. Identifying the topic and type of article. Based on the title provided, Wikidata items are surfaced to determine whether the user is trying to create an article on any of those. If the topic is on Wikidata, the user can select it and the system will use the information to determine the related type of article.
  3. Choosing the type of article (when topic does not exist). For topics that do not exist, a list is provided with the supported article types.
  4. Guidance message from the community. A message defined by the community for this particular type of article.
  5. Guidance contents from the community. Contents defined by the community for this particular type of article.
  6. Publish tag (optional). Contents created through this workflow should include an "article-guidance" edit tag that allows further analysis. Depending on the technical complexity and cross-team dependencies, this functionality can be considered as part of a separate ticket.
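
Step 2 above could be sketched as follows. This is a minimal illustration, assuming the Wikidata `wbsearchentities` API module is used to surface candidate items for a typed title; the task does not state which API the proof of concept relies on. The helper names and the sample response are hypothetical, and no network request is made here.

```python
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def search_params(title, language="en", limit=5):
    """Build query parameters for a wbsearchentities request."""
    return {
        "action": "wbsearchentities",
        "search": title,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }


def candidates(response):
    """Reduce a wbsearchentities response to (id, label, description) tuples."""
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in response.get("search", [])
    ]


# Illustrative response fragment; the description text is a placeholder.
sample = {
    "search": [
        {
            "id": "Q160112",
            "label": "Museo del Prado",
            "description": "placeholder description",
        }
    ]
}

print(search_params("Museo del Prado"))
print(candidates(sample))
```

An empty candidate list would correspond to step 3, where the user picks the type of article manually.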

This is just a proof-of-concept. Some considerations:

  • Simplification. This represents a simplified set of steps that leaves aside many considerations (e.g., the article already existing, empty states, etc.). However, it brings together some of the key steps involving the new concepts we want to explore how to support.
  • Community customization. Some steps make reference to community-defined contents. The intention for the proof of concept is to develop the support in a way that communities could provide such information (e.g., stored in a JSON file on a wiki). The actual information on the proof-of-concept can be just example placeholder information. The key point is to make it possible to update such information externally (i.e., not being hardcoded into the codebase).
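
To illustrate the "not hardcoded" point, here is a minimal sketch of reading community-provided information from JSON text, assuming it has already been fetched from a wiki page. The schema (a `types` object keyed by Wikidata item ID, each entry with a `message`) is hypothetical and only serves as placeholder data.

```python
import json


def parse_guidance_config(raw):
    """Parse community-edited JSON; return {} when it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    types = data.get("types", {}) if isinstance(data, dict) else {}
    if not isinstance(types, dict):
        return {}
    # Keep only entries that carry the field the workflow needs.
    return {
        qid: entry
        for qid, entry in types.items()
        if isinstance(entry, dict) and "message" in entry
    }


# Placeholder content standing in for a JSON page maintained on-wiki.
raw_config = """
{
  "types": {
    "Q33506": {"message": "You are about to create an article about a museum."}
  }
}
"""

config = parse_guidance_config(raw_config)
print(config["Q33506"]["message"])
```

Returning an empty mapping on malformed input is one possible design choice, so that a bad community edit degrades to "no guidance" rather than an error.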

Event Timeline

Pginer-WMF renamed this task from Technical explorations to support Outlines to Technical explorations to support Article Guidance. Dec 16 2025, 1:51 PM
Pginer-WMF updated the task description.
SBisson changed the task status from Open to In Progress. Jan 16 2026, 8:03 PM
SBisson claimed this task.
SBisson triaged this task as High priority.
SBisson moved this task from Incoming to In-progress on the LPL Hypothesis board.
NOTE: Notes from a meeting with @ngkountas and @eamedina
  • Outlines are wikitext pages with sections and instructions. Specialized templates or parser functions may be created to alter instruction and guidance appearance and handling.
/wiki/Wikipedia:Outline/Animal
--

{{outline_for:Q12345}}

<!-- Lead section: 3-5 sentences. common name, scientific name, ...-->

{{infobox_animal}}

== Taxonomy ==
{{P55}}
{{P99}}

{{Paragraph:The {{P1}} is a land animal belonging to the {{P5}} family of land animals.}}

== Origin ==
{{P88}}

==References==
{{reflist}}

[[Category:Animal]]
[[Category:Land animal]]
  • An outline is linked to a type of article via a Wikidata entity. For instance, an outline for Museum is linked to https://www.wikidata.org/wiki/Q33506. This means that an article whose item has instance_of Museum, or instance_of one of its subtypes (e.g., National Museum), will use this outline.
  • Outlines can be hierarchical.
Artist
  _|__
 |    |
Poet Musician

An outline may exist for Artist, and a specialized version may be added for Poet, but Musician may not have one, so it would default to Artist. When multiple outlines exist in the instance_of chain of an article, they do not combine; only the most specific outline is used.
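
The fallback rule for the Artist / Poet / Musician example can be sketched as a breadth-first walk up the type hierarchy. This is a minimal illustration under stated assumptions: the parent map and the set of available outlines are hypothetical in-memory data, labels stand in for Wikidata item IDs, and "most specific" is taken to mean "fewest steps up the hierarchy".

```python
from collections import deque


def resolve_outline(type_ids, parents, available):
    """Return the nearest type that has an outline, or None if there is none."""
    queue = deque(type_ids)
    seen = set(type_ids)
    while queue:
        current = queue.popleft()
        if current in available:
            return current
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None


# Hypothetical data mirroring the hierarchy in the notes.
PARENTS = {"Poet": ["Artist"], "Musician": ["Artist"]}
AVAILABLE = {"Artist", "Poet"}

print(resolve_outline(["Poet"], PARENTS, AVAILABLE))      # Poet
print(resolve_outline(["Musician"], PARENTS, AVAILABLE))  # Artist
print(resolve_outline(["Museum"], PARENTS, AVAILABLE))    # None
```

When two outlines sit at the same distance, this sketch simply returns the one reached first; a real implementation would need an explicit tie-breaking rule.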

  • When manually selecting the topic for a new article, topics are presented based on existing outlines on the current wiki and can be presented in a hierarchical way.
  • A hook may be used to clean up the outline instructions/comments that are left behind when publishing the new article.
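
The clean-up mentioned in the last point could look roughly like the following. It only sketches the text transformation, assuming the instructions are plain HTML comments as in the Animal example; the actual hook would live in the MediaWiki extension, and the function name here is hypothetical.

```python
import re

# Non-greedy match so that each comment is removed individually.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_guidance_comments(wikitext):
    """Remove HTML comments and tidy the blank lines they leave behind."""
    without_comments = COMMENT.sub("", wikitext)
    collapsed = re.sub(r"\n{3,}", "\n\n", without_comments)
    return collapsed.lstrip("\n")


draft = (
    "<!-- lead section guidance -->\n\n"
    "{{Infobox museum}}\n\n"
    "== History ==\n"
    "<!-- when was it founded? -->\n\n"
    "== References ==\n"
)

print(strip_guidance_comments(draft))
```

A regex pass like this does not distinguish guidance comments from comments the author added on purpose, nor does it respect `<nowiki>` blocks, so a dedicated marker for guidance text may be preferable.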

Thanks for sharing the initial thinking on how to approach this technically, @SBisson. Overall, it makes sense to me, and I like how existing elements (wiki pages and Wikidata items) are re-used. Some thoughts and questions to understand the implications below:

Metadata beyond contents. The proposed workflow includes steps that involve showing a community-defined message before reaching Visual Editor, checking whether references are in a community-provided list of recommended/discouraged sources for the type of article, and showing community-defined alternatives when a community-defined/selected condition is met that suggests a high notability risk. Based on the proposed approach, all of those should be expressed with some kind of special templates or parser functions, right? As a next step, it would be good to illustrate how some of those may look in order to identify options (e.g., multiple tags vs. a single one for most metadata).

Performance assumptions. How fast can we assume that information for a given Wikidata item can be extracted? In particular, finding the corresponding outline given a Wikidata item, listing all supported outlines, or extracting the list of sections from an outline. This will help to identify whether it is feasible, for example, to show the "type of article" as the user searches for one (e.g., in a type-ahead search scenario), or whether we need to reserve that information for less real-time situations (e.g., once an element is selected).

Transparency of the hierarchy. The hierarchy approach seems really cool to navigate the different levels of granularity of our knowledge. This also means that, given a Wikidata item, there are three possible levels of outline support: direct support (an outline exists for that particular item), broader support (a more general outline exists for a related item up in the hierarchy) and no support (no outlines in the family). When listing the supported types of articles, would we be able to determine which level of support we want to list? I can imagine that for some scenarios we may want to list the specific outlines, and for others consider all topics supported to any extent. I'm also not familiar with the depths of Wikidata, but I wonder whether it supports multiple inheritance, and what the possible implications may be if it does.

Metadata beyond contents. The proposed workflow includes steps that involve showing a community-defined message before reaching Visual Editor, checking whether references are in a community-provided list of recommended/discouraged sources for the type of article, and showing community-defined alternatives when a community-defined/selected condition is met that suggests a high notability risk. Based on the proposed approach, all of those should be expressed with some kind of special templates or parser functions, right? As a next step, it would be good to illustrate how some of those may look in order to identify options (e.g., multiple tags vs. a single one for most metadata).

Those pre-editor messages can be specified in various ways. Here are some examples:

In the outline, in custom elements:

/wiki/Wikipedia:Outlines/Q33506
<noinclude>
<preedit-guidance>
  You are about to create an article about a museum. Make sure to read...
</preedit-guidance>

<inedit-guidance>
  This is the guidance that is shown in the VE "notice panel"
</inedit-guidance>
</noinclude>

<!-- beginning of the actual article outline here -->

<!-- lead section guidance -->

{{Infobox museum}}

== History ==

== Collections ==

== References ==
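
For this first option, the guidance messages would have to be pulled out of the outline page before the rest is used as article content. A minimal sketch of that extraction, assuming the custom element names from the example above and a simple regex rather than a real wikitext parser (the function name is hypothetical):

```python
import re

GUIDANCE_TAGS = ("preedit-guidance", "inedit-guidance")


def extract_guidance(wikitext):
    """Return a dict mapping each guidance tag found to its trimmed text."""
    found = {}
    for name in GUIDANCE_TAGS:
        match = re.search(rf"<{name}>(.*?)</{name}>", wikitext, re.DOTALL)
        if match:
            found[name] = match.group(1).strip()
    return found


outline_page = """<noinclude>
<preedit-guidance>
  You are about to create an article about a museum.
</preedit-guidance>
<inedit-guidance>
  Shown in the VE notice panel.
</inedit-guidance>
</noinclude>
{{Infobox museum}}
"""

print(extract_guidance(outline_page))
```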

They can also be specified in distinct subpages following a naming convention:

/wiki/Wikipedia:Outlines/Q33506/Outline
<!-- lead section guidance -->

{{Infobox museum}}

== History ==

== Collections ==

== References ==

/wiki/Wikipedia:Outlines/Q33506/Preedit
You are about to create an article about a museum. Make sure to read...

/wiki/Wikipedia:Outlines/Q33506/Inedit
This is the guidance that is shown in the VE "notice panel"

This second option integrates a little better with the existing preload and editintro features that we want to use to present information in the editor.
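
As a rough sketch of how the subpage convention could map onto those URL parameters: the helper below builds an edit URL for a new article, assuming `preload` points at the `/Outline` subpage and `editintro` at the `/Inedit` subpage. That mapping, the function name, and the example wiki URL are assumptions for illustration, not the extension's actual behaviour.

```python
from urllib.parse import urlencode

OUTLINE_PREFIX = "Wikipedia:Outlines"


def new_article_url(index_url, title, type_qid):
    """Build an edit URL that preloads the outline and shows in-editor guidance."""
    params = {
        "title": title,
        "action": "edit",
        "preload": f"{OUTLINE_PREFIX}/{type_qid}/Outline",
        "editintro": f"{OUTLINE_PREFIX}/{type_qid}/Inedit",
    }
    return f"{index_url}?{urlencode(params)}"


url = new_article_url("https://en.wikipedia.org/w/index.php", "Example museum", "Q33506")
print(url)
```

With the single-page option, the same parameters could not be pointed at the page directly, since the guidance elements and the outline share one page.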

In all cases, I want to make sure maintaining the outlines and related artifacts is as natural as possible to experienced editors and admins. We can provide software support to facilitate maintenance and avoid misconfiguration but I think it's best if the bulk of the work involves tools they are already comfortable with like wikitext, templates, magic words, etc.

Performance assumptions. How fast can we assume that information for a given Wikidata item can be extracted? In particular,

finding the corresponding outline given a Wikidata item

If the Wikidata item is a specific article and we want to find out whether an outline is available for this type of article, we can query Wikidata for its ancestry and check if we have an outline locally for any of the ancestors.

Let's consider Museo Del Prado (Q160112) for example. The ancestors query (https://w.wiki/HZF4) returns 206 results, but they get very generic and philosophical after the first 10 or so. It's unlikely that an outline for "quaternary sector of the economy" will help write this article.
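
The query behind the short link is not reproduced here. As a hedged sketch, an ancestry lookup of this kind is commonly written with the `wdt:P31/wdt:P279*` property path (instance of, then any number of subclass of steps), and the results can then be intersected with the outlines available locally. The function names and the sample result below are hypothetical; no request is sent.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def ancestors_query(qid):
    """SPARQL for every type reachable via instance_of then subclass_of*."""
    return (
        "SELECT DISTINCT ?type WHERE { "
        f"wd:{qid} wdt:P31/wdt:P279* ?type . "
        "}"
    )


def ancestors_url(qid):
    """URL for a GET request to the query service asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": ancestors_query(qid), "format": "json"})


def parse_type_ids(result):
    """Extract item IDs from a SPARQL JSON result with a ?type variable."""
    return [
        binding["type"]["value"].rsplit("/", 1)[-1]
        for binding in result["results"]["bindings"]
    ]


def supported_types(type_ids, local_outlines):
    """Keep only the ancestors for which an outline exists on this wiki."""
    return [t for t in type_ids if t in local_outlines]


# Illustrative result fragment in the standard SPARQL JSON shape.
sample_result = {
    "results": {
        "bindings": [
            {"type": {"type": "uri", "value": "http://www.wikidata.org/entity/Q12345"}},
            {"type": {"type": "uri", "value": "http://www.wikidata.org/entity/Q33506"}},
        ]
    }
}

print(ancestors_query("Q160112"))
print(supported_types(parse_type_ids(sample_result), {"Q33506"}))
```

A flat result like this does not say how far each ancestor is from the item, so picking the most specific outline would need either a depth-aware query or a separate walk up the hierarchy.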

list all supported outlines

That can likely be done with the regular MediaWiki API by listing pages with the prefix "/wiki/Wikipedia:Outlines/", plus a caching layer containing the Wikidata labels in the current language for those entities.
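
A minimal sketch of that listing, assuming the Action API `list=allpages` module with `apprefix` and the project namespace (ID 4), and assuming the Wikidata item ID is the first path segment after "Outlines/" as in the examples in this thread. The sample response is illustrative and the label cache is left out.

```python
import re

QID_IN_TITLE = re.compile(r"Outlines/(Q\d+)")


def list_outlines_params(prefix="Outlines/", namespace=4, limit=500):
    """Query parameters for listing pages whose title starts with the prefix."""
    return {
        "action": "query",
        "list": "allpages",
        "apprefix": prefix,
        "apnamespace": namespace,
        "aplimit": limit,
        "format": "json",
    }


def outline_type_ids(response):
    """Collect the distinct item IDs that have at least one outline page."""
    ids = []
    for page in response["query"]["allpages"]:
        match = QID_IN_TITLE.search(page["title"])
        if match and match.group(1) not in ids:
            ids.append(match.group(1))
    return ids


# Illustrative response fragment using the subpage naming convention.
sample_response = {
    "query": {
        "allpages": [
            {"ns": 4, "title": "Wikipedia:Outlines/Q33506/Inedit"},
            {"ns": 4, "title": "Wikipedia:Outlines/Q33506/Outline"},
            {"ns": 4, "title": "Wikipedia:Outlines/Q33506/Preedit"},
        ]
    }
}

print(outline_type_ids(sample_response))
```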

extracting the list of sections from an outline.

When starting a new article, preload=Wikipedia:Outlines/Q160112/Outline in the URL will automatically include the content of this page as the initial content for the new article. This is already in place for VE and Wikitext editor.

This will help to identify whether it is feasible, for example, to show the "type of article" as the user searches for one (e.g., in a type-ahead search scenario), or whether we need to reserve that information for less real-time situations (e.g., once an element is selected).

Transparency of the hierarchy. The hierarchy approach seems really cool to navigate the different levels of granularity of our knowledge. This also means that, given a Wikidata item, there are three possible levels of outline support: direct support (an outline exists for that particular item), broader support (a more general outline exists for a related item up in the hierarchy) and no support (no outlines in the family).

Conceptually I think an outline is at least one level more general than the articles it applies to. In other words, the outline associated with the "Museum" Wikidata item helps write "British Museum" and "Museo Del Prado" but not the Museum article itself. The Museum article could be outlined by "Cultural institution" or "Tourist attraction". Along the same lines, the "British Museum" article can be outlined by "Tourist attraction" if the more specific "Museum" or "History Museum" outlines are not currently defined.

When listing the supported types of articles, would we be able to determine which level of support we want to list? I can imagine that for some scenarios we may want to list the specific outlines, and for others consider all topics supported to any extent.

This is interesting. If I understand what you are suggesting, when we have the outline for Museum, we could claim to have some support for all subclasses like "Art museum" and "History museum".
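
The three levels of support discussed here (direct, broader, none) could be computed for a given type roughly as follows. This is only a sketch: the hierarchy is a hypothetical in-memory map, and labels stand in for Wikidata item IDs.

```python
def support_level(type_id, parents, available):
    """Classify outline support for a type as 'direct', 'broader' or 'none'."""
    if type_id in available:
        return "direct"
    seen = {type_id}
    stack = list(parents.get(type_id, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in available:
            return "broader"
        stack.extend(parents.get(current, []))
    return "none"


# Hypothetical hierarchy using the types mentioned in this thread.
PARENT_TYPES = {
    "Art museum": ["Museum"],
    "History museum": ["Museum"],
    "Museum": ["Cultural institution"],
}
WITH_OUTLINE = {"Museum"}

for name in ("Museum", "Art museum", "Cultural institution"):
    print(name, "->", support_level(name, PARENT_TYPES, WITH_OUTLINE))
```

Listing "all topics supported to any extent" would then mean keeping every type whose level is not "none", while listing only specific outlines would keep the "direct" ones.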

I'm also not familiar with the depths of Wikidata, but I wonder whether it supports multiple inheritance, and what the possible implications may be if it does.

They do support multiple inheritance. Museo Del Prado has 5 instance_of statements: "art museum", "national museum", "organization", "tourist attraction" and "tourist destination". However, looking at the entire ancestry with the WDQS query mentioned above, I see very few candidate types that can provide a reasonable full outline for the article, probably just Museum and Art museum. Still, this opens the door to the concept of partial outlines, where different parent items provide different sections, just like different ancestors contribute different genetic traits. For example, still using the Prado, the Museum outline may provide most of the sections, but Organization may add sections about current management and governance, and "Tourist attraction" may contribute opening hours, fees, and location. Obviously not in scope, just food for thought.

Change #1240008 had a related patch set uploaded (by Sbisson; author: Sbisson):

[mediawiki/extensions/ArticleGuidance@master] Article Guidance basic flow

https://gerrit.wikimedia.org/r/1240008