RFC: Page components / content widgets
Open, Medium, Public

Description

We are looking for ways of improving the handling of common content structures like infoboxes, data tables, citations or navboxes. We'd like to

  • make their rendering more adaptable to different devices and use cases,
  • improve the editing experience, and
  • support the integration of different data sources.

Wikia and others have already done significant work on infoboxes (see links below). The details of those implementations are not the focus of this task; the primary focus here is identifying the minimal supporting infrastructure we need to enable such implementations.

Page components

Page components are not much more than bits of well-formed HTML along with some metadata. Well-formed HTML along with attribute markers lets us cleanly swap out a component's rendering. Metadata documents each component's dependencies, so that we can propagate changes efficiently and reliably. Other metadata like page properties and render dependencies need to be aggregated when composing a larger page, so that ResourceLoader modules for example can be loaded for the entire page. Candidates for per-component metadata are:

  • ResourceLoader modules needed to render the component.
  • Resources used to render this component, for dependency tracking.
    • templates and Scribunto modules
    • images
    • wiki pages (for link rendering)
    • external data sources
  • Page metadata like
    • categories
    • external links
    • magic word flags
  • Caching / storage limitations, for dynamic content
  • Other page properties

For wiki content processed in the PHP parser, basically all of this information apart from external data sources is available in the ParserOutput object. Some of this information is already exposed in the expandtemplates end point (notably missing is the list of sub-templates used), and even more is available in the parse end point.

Currently this metadata is mostly implementation-defined, and not consistently exposed in the Action API. The proposal is to document standard metadata and its semantics, and make sure that this is consistently exposed through APIs in a way that lets clients aggregate information in a generic manner (as sets, for example), without having to know about each possible bit of metadata explicitly.
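A minimal sketch of what generic, set-based aggregation could look like, assuming each component's metadata is a flat mapping from a key to a list of string values. The key names and module names below are hypothetical and only for illustration; this is not an existing API or schema.

```python
from typing import Dict, Iterable, List, Set

# Assumed shape (not a documented schema): metadata key -> list of values.
ComponentMetadata = Dict[str, List[str]]


def aggregate_metadata(components: Iterable[ComponentMetadata]) -> Dict[str, Set[str]]:
    """Union the values of every key across components.

    The function never names a specific key, so new kinds of metadata
    are aggregated without any change to the client.
    """
    page: Dict[str, Set[str]] = {}
    for metadata in components:
        for key, values in metadata.items():
            page.setdefault(key, set()).update(values)
    return page


# Hypothetical metadata for two components on one page.
infobox = {
    "modules": ["ext.infobox.styles"],
    "templates": ["Template:Infobox town"],
    "categories": ["Towns"],
}
navbox = {
    "modules": ["ext.navbox.styles", "ext.infobox.styles"],
    "templates": ["Template:Navbox"],
}

page_metadata = aggregate_metadata([infobox, navbox])
```

Because the aggregate is a plain union, removing a component only requires re-running the aggregation over the remaining components.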

Benefits of doing this include:

  • Finer-grained dependency tracking, which in turn can make updates like refreshLinks a lot more efficient.
  • More accurate page metadata tracking in VisualEditor when inserting / removing components.
  • Efficient updating of ResourceLoader modules and other dependencies when re-rendering individual components for a different context.
  • Opening up the possibility of implementing page components in separate services.
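
The finer-grained dependency tracking mentioned in the first benefit could be sketched as a reverse index from resources to the components that use them. This is only an illustration: the (page title, component id) addressing and the example titles are assumptions, not part of the proposal.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Assumed addressing scheme: (page title, component id).
ComponentRef = Tuple[str, str]


def build_reverse_index(
    deps: Dict[ComponentRef, Iterable[str]]
) -> Dict[str, Set[ComponentRef]]:
    """Map each resource to the set of components that depend on it."""
    index: Dict[str, Set[ComponentRef]] = defaultdict(set)
    for component, resources in deps.items():
        for resource in resources:
            index[resource].add(component)
    return index


# Hypothetical per-component dependency lists.
deps = {
    ("Paris", "infobox-1"): ["Template:Infobox town", "File:Paris.jpg"],
    ("Paris", "navbox-1"): ["Template:Navbox"],
    ("Lyon", "infobox-1"): ["Template:Infobox town"],
}

index = build_reverse_index(deps)
# Only these components need re-rendering when the template changes.
stale = index["Template:Infobox town"]
```

With per-component tracking, a change to one template invalidates only the affected components rather than every page that transcludes it.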

Questions

  • Can we restrict page components to a single DOM node?
  • Can we come up with a sensible aggregation of metadata that satisfies different use cases well?
    • Idea: Two blobs (or one with two sub-objects), one for view-relevant data (modules, categories, magic word flags?), one for more verbose data like dependencies.
      • could also consider exposing modules / view metadata in HTTP header
  • Component addressing and generic parameter encoding
    • Can we generalize the process of figuring out how to render / re-render a component?

Current work and background reading

Event Timeline

A step towards easier and faster matching of high-level components in page content could be to mark up elements like infoboxes or navboxes uniformly.

One idea to do this would be to leverage a custom HTML5 element like <info-box> to wrap the transclusion content. Strawman syntax:

<info-box name="town" typeof="mw:Transclusion" data-mw="....">
  <div> Infobox content </div>
</info-box>

The <info-box> wrapper can be matched and styled like any other element. There is no default styling attached to it, so it should not affect the layout by default.

The tag syntax can also be matched efficiently at the string level, which makes it possible to efficiently rewrite content at the edge or in a service worker.
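As a rough sketch of string-level matching, the following pulls <info-box> elements out of a page without building a DOM. It rests on two assumptions that would need checking in practice: <info-box> elements are never nested, and attribute values never contain a literal ">". The function name is hypothetical.

```python
import re
from typing import List, Tuple

# Non-greedy match from an opening <info-box ...> tag to the next closing tag.
# Assumes no nesting and no literal ">" inside attribute values.
INFO_BOX_RE = re.compile(r"<info-box\b[^>]*>.*?</info-box>", re.DOTALL)


def split_info_boxes(html: str) -> Tuple[str, List[str]]:
    """Return the page with info boxes removed, plus the removed components."""
    components = INFO_BOX_RE.findall(html)
    remainder = INFO_BOX_RE.sub("", html)
    return remainder, components


page = (
    "<p>Intro</p>"
    '<info-box name="town"><div> Infobox content </div></info-box>'
    "<p>Body</p>"
)
remainder, components = split_info_boxes(page)
```

The same pattern could be applied at the edge or in a service worker; a streaming implementation would scan for the tag boundaries instead of using a regular expression over the whole document.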

Implementation

To figure out the role of a component from a given template, we'd need to maintain a mapping. A possible place to maintain this could be templatedata. There might also be some room for heuristics, like categorizing all templates starting with infobox_ as an infobox.
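
A minimal sketch of such a lookup, with an explicit mapping consulted first and a name-prefix heuristic as a fallback. The dictionary stands in for wherever the mapping ends up being maintained; its contents, the "navbox_" prefix, and the function name are assumptions for illustration.

```python
from typing import Dict, Optional

# Stand-in for an explicitly maintained mapping (contents are illustrative).
EXPLICIT_ROLES: Dict[str, str] = {"Template:Taxobox": "infobox"}

# Name-prefix heuristics; only "infobox_" comes from the discussion above.
PREFIX_ROLES: Dict[str, str] = {"infobox_": "infobox", "navbox_": "navbox"}


def component_role(template_title: str) -> Optional[str]:
    """Return the component role for a template, or None if unknown."""
    if template_title in EXPLICIT_ROLES:
        return EXPLICIT_ROLES[template_title]
    # Drop the namespace and normalise spaces to underscores.
    name = template_title.split(":", 1)[-1].lower().replace(" ", "_")
    for prefix, role in PREFIX_ROLES.items():
        if name.startswith(prefix):
            return role
    return None
```

Checking the explicit mapping first lets editors override the heuristic for templates whose names do not follow the convention.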

Alternatively, rendering <div data-template-name='infobox' data-params='{param:222}'></div> would allow us more control: either use the API to generate the HTML in a separate deferred request, or completely rewrite the rendering engine. On mobile, for instance, we would like to collapse infoboxes...

Short summary of my thoughts from an irc conversation:

This goes back to the 'wikitext+templates comprise a DSL' way of looking at it. Infoboxes and navboxes are just two pieces of that DSL, used on certain kinds of pages. Editors care about what pieces of the page represent what kind of content and how they can be represented in the language they use to author it. Specific sets of templates (along with their names) represent an abstraction about content on a page, and they are enforced by editors and editorial policies. There are expectations about how / where they are used, formatting, etc.

Reading/editing clients might use this information for doing special things with them.

Mobile clients care about infoboxes or navboxes because that is what they have identified as important right now, but if they could know about sports tables or math formulae or whatever else, they might be interested in those too. So, I see infoboxes and navboxes as special cases of the general problem.

The primary thing to figure out is how do we get that information about infoboxes/navboxes/whatever-else from editors to parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

> Alternatively, rendering <div data-template-name='infobox' data-params='{param:222}'></div> would allow us more control: either use the API to generate the HTML in a separate deferred request, or completely rewrite the rendering engine. On mobile, for instance, we would like to collapse infoboxes...

Yeah, the expansion can be stripped out where it makes sense. Parsoid would likely continue to emit most / all content in expanded form, but it would be easy and efficient to provide an alternate end point that offers the content with expansions stripped, or components moved out & served separately altogether. On the client (thinking ahead to T111588 and T106099), we can set up a registry of tag names to handlers, some of which would use the parameters provided in the element to request a server-side render, while others would just render things client-side.
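
The registry idea can be sketched as follows. On the client this would be JavaScript; the pattern is shown here in Python only to illustrate the dispatch. The <nav-box> tag, the handler names, and the returned strings are hypothetical, and the "server-side" handler merely returns a marker where a real one would issue a render request.

```python
from typing import Callable, Dict

Handler = Callable[[Dict[str, str]], str]

# Registry of tag names to handlers.
HANDLERS: Dict[str, Handler] = {}


def register(tag_name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for a component tag name."""
    def decorator(handler: Handler) -> Handler:
        HANDLERS[tag_name] = handler
        return handler
    return decorator


@register("info-box")
def render_info_box(attrs: Dict[str, str]) -> str:
    # Placeholder for a server-side render using the element's parameters.
    return "server-render:" + attrs.get("name", "")


@register("nav-box")
def render_nav_box(attrs: Dict[str, str]) -> str:
    # Rendered locally, without a round trip.
    return '<nav data-name="' + attrs.get("name", "") + '"></nav>'


def render_component(tag_name: str, attrs: Dict[str, str]) -> str:
    """Dispatch to the registered handler; unknown tags render as empty."""
    handler = HANDLERS.get(tag_name)
    return handler(attrs) if handler is not None else ""
```

Keeping the choice between server-side and client-side rendering inside each handler means callers only need the tag name and the element's attributes.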

> The primary thing to figure out is how do we get that information about infoboxes/navboxes/whatever-else from editors to parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

I agree that maintaining a mapping of (say) templates to components is more complex than finding some markup for it. However, I still think that defining a decent DOM representation is important for usability and performance when working with such components. When working on a DOM, we should be able to match an infobox or navbox in a single DOM selector. When working at the bytestream level for high-volume applications with latency demands, having a safe way to extract page components at the string level is very valuable, too.

Congratulations! This is one of the 52 proposals that made it through the first deadline of the Wikimedia-Developer-Summit-2016 selection process. Please pay attention to the next one: > By 6 Nov 2015, all Summit proposals must have active discussions and a Summit plan documented in the description. Proposals not reaching this critical mass can continue at their own path out of the Summit.

> The primary thing to figure out is how do we get that information about infoboxes/navboxes/whatever-else from editors to parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

> I agree that maintaining a mapping of (say) templates to components is more complex than finding some markup for it. However, I still think that defining a decent DOM representation is important for usability and performance when working with such components. When working on a DOM, we should be able to match an infobox or navbox in a single DOM selector. When working at the bytestream level for high-volume applications with latency demands, having a safe way to extract page components at the string level is very valuable, too.

To add to this, there are pages like https://en.wikipedia.org/wiki/Indian_cuisine where navboxes are positioned closer to where an infobox usually lives. There are also articles like this one that have navboxes both in the infobox position, and at the end of the article. This positioning seems sensible in a desktop context. We could make their expansion / following their custom placement optional with placeholder tags (as discussed in T105845#1650013) and serve them separately, but I think removing any trace of their original position isn't really an option at this point.

Navcolumns are often used as a table of contents over articles, and their placement reflects that. https://en.wikipedia.org/wiki/Science is a good example: the top-level article has a navcolumn, the direct children show the navcolumn with the respective section open, and the grandchildren use topic-specific navboxes instead. It's a very helpful pattern IMO when you are searching for something and going top-down in some subject. Chances are you know the top topic name but not the subtopic name, so you will need to navigate from Science to Materials science much more often than in the opposite direction, or from one grandchild to another. (That last one is what navboxes are for: you have finished reading or scrolling through the article and are looking for related topics.) Navcolumns deserve their own component, distinct from navboxes, IMO.

There are also articles that have multiple infoboxes. E.g. w:en:Mini has 9 distinct infoboxes. I'm not sure how widespread this pattern is, but I believe vehicle and media articles often use it (when a subtopic isn't sufficiently notable, or sufficiently detailed, to be split into a separate page/stub, but does benefit from an infobox in its section).

There's also the pattern of a "single" infobox composed of multiple parts, e.g. the modular w:en:Template:Infobox animanga as used in e.g. w:en:Mushishi. A glance through the interwiki links shows this pattern is used in many other languages. (Second example: w:en:Template:Infobox_ship_begin as used in e.g. USS Bang.)

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!

RobLa-WMF mentioned this in Unknown Object (Event). May 11 2016, 12:09 AM

The idea's quite interesting but fell out of discussion some time ago. Shall we remove it from the TechCom RFC list, or is there a party interested in taking it back on?

Krinkle renamed this task from Page components / content widgets to RFC: Page components / content widgets. Jan 31 2018, 12:25 AM

Dropping this off the RFC board, since it's not actionable. Adding to TechCom radar, since this seems relevant to platform evolution.