Page components / content widgets
Open, Normal, Public

Description

We are looking for ways of improving the handling of common content structures like infoboxes, data tables, citations or navboxes. We'd like to

  • make their rendering more adaptable to different devices and use cases,
  • improve the editing experience, and
  • support the integration of different data sources.

Wikia and others have already done significant work on infoboxes (see links below). The details and considerations around those implementations are not the focus of this task, as the primary focus here is on identifying a minimal supporting infrastructure that we need to enable such implementations.

Page components

Page components are not much more than bits of well-formed HTML along with some metadata. Well-formed HTML along with attribute markers lets us cleanly swap out a component's rendering. Metadata documents each component's dependencies, so that we can propagate changes efficiently and reliably. Other metadata like page properties and render dependencies need to be aggregated when composing a larger page, so that ResourceLoader modules for example can be loaded for the entire page. Candidates for per-component metadata are:

  • ResourceLoader modules needed to render the component.
  • Resources used to render this component, for dependency tracking.
    • templates and Scribunto modules
    • images
    • wiki pages (for link rendering)
    • external data sources
  • Page metadata like
    • categories
    • external links
    • magic word flags
  • Caching / storage limitations, for dynamic content
  • Other page properties
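One way to make the candidate list above concrete is a typed record per component. The following TypeScript is only a sketch: the interface, its field names (`dependencies.templates`, `cacheTtlSeconds`, ...) and the `dependencyKeys` helper are hypothetical, not an existing MediaWiki or Parsoid format. `dependencyKeys` flattens the dependency lists into prefixed strings, which makes them easy to index for change propagation.

```typescript
// Hypothetical shape for per-component metadata; field names are illustrative only.
interface ComponentMetadata {
  // ResourceLoader modules needed to render the component.
  modules: string[];
  // Resources used to render the component, for dependency tracking.
  dependencies: {
    templates: string[]; // templates and Scribunto modules
    images: string[];
    pages: string[]; // wiki pages, for link rendering
    externalSources: string[];
  };
  // Page metadata contributed by this component.
  pageMeta: {
    categories: string[];
    externalLinks: string[];
    magicWordFlags: string[];
  };
  // Caching limit for dynamic content; undefined means "no limit of its own".
  cacheTtlSeconds?: number;
  // Other page properties.
  properties: Record<string, string>;
}

// Flatten all dependencies into "kind:name" keys, e.g. "transclusion:Infobox town".
function dependencyKeys(meta: ComponentMetadata): string[] {
  const d = meta.dependencies;
  return [
    ...d.templates.map((t) => `transclusion:${t}`),
    ...d.images.map((i) => `image:${i}`),
    ...d.pages.map((p) => `page:${p}`),
    ...d.externalSources.map((s) => `external:${s}`),
  ];
}

const infobox: ComponentMetadata = {
  modules: ["ext.example.infobox"], // hypothetical module name
  dependencies: {
    templates: ["Infobox town", "Module:Coordinates"],
    images: ["Example.jpg"],
    pages: ["Springfield"],
    externalSources: [],
  },
  pageMeta: { categories: ["Towns"], externalLinks: [], magicWordFlags: [] },
  properties: {},
};

console.log(dependencyKeys(infobox));
```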

For wiki content processed in the PHP parser, basically all of this information apart from external data sources is available in the ParserOutput object. Some of this information is already exposed in the expandtemplates end point (notably missing is the list of sub-templates used), and even more is available in the parse end point.
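For comparison, this is roughly how a client could ask the Action API's `expandtemplates` module for part of that metadata today. The `prop` values used here (`wikitext`, `categories`, `modules`) are ones the module documents, but it is worth checking `api.php?action=help&modules=expandtemplates` on the target wiki; the helper name is hypothetical, and the sketch only builds the request URL without sending it.

```typescript
// Build an Action API URL for action=expandtemplates (sketch; no request is sent).
function expandTemplatesUrl(apiBase: string, title: string, wikitext: string): string {
  const params: Record<string, string> = {
    action: "expandtemplates",
    format: "json",
    formatversion: "2",
    title: title,
    text: wikitext,
    // Expanded wikitext plus some of the ParserOutput metadata.
    prop: "wikitext|categories|modules",
  };
  const query = Object.keys(params)
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join("&");
  return `${apiBase}?${query}`;
}

const url = expandTemplatesUrl(
  "https://en.wikipedia.org/w/api.php",
  "Springfield",
  "{{Infobox settlement|name=Springfield}}"
);
console.log(url);
```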

Currently this metadata is mostly implementation-defined, and not consistently exposed in the Action API. The proposal is to document standard metadata and its semantics, and make sure that this is consistently exposed through APIs in a way that lets clients aggregate information in a generic manner (as sets, for example), without having to know about each possible bit of metadata explicitly.
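A minimal sketch of the generic aggregation idea, assuming each component exposes its metadata as a flat map from a key (such as `modules` or `categories`) to a list of string values. The aggregator unions every key it sees as a set, so new kinds of metadata need no special handling. The function name and the flat map format are assumptions, not an existing API.

```typescript
type FlatMetadata = Record<string, string[]>;

// Union per-component metadata into page-level metadata, treating every key as a set.
// Keys are not known in advance: anything a component reports is carried through.
function aggregateMetadata(components: FlatMetadata[]): FlatMetadata {
  const sets: Record<string, Set<string>> = {};
  for (const meta of components) {
    for (const key of Object.keys(meta)) {
      if (!sets[key]) sets[key] = new Set<string>();
      for (const value of meta[key]) sets[key].add(value);
    }
  }
  const result: FlatMetadata = {};
  for (const key of Object.keys(sets)) {
    // Sort so the aggregate is deterministic regardless of component order.
    result[key] = Array.from(sets[key]).sort();
  }
  return result;
}

const page = aggregateMetadata([
  { modules: ["ext.infobox"], categories: ["Towns"] },
  { modules: ["ext.navbox", "ext.infobox"], categories: ["Towns", "Ohio"] },
  { magicWordFlags: ["NOTOC"] },
]);
console.log(page);
```

Not every kind of metadata is a set (a cache lifetime, for example, aggregates as a minimum), so a real scheme would probably need a small number of aggregation rules keyed by type rather than by name.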

Benefits of doing this include:

  • Finer-grained dependency tracking, which in turn can make updates like refreshLinks a lot more efficient.
  • More accurate page metadata tracking in VisualEditor when inserting / removing components.
  • Efficient updating of ResourceLoader modules and other dependencies when re-rendering individual components for a different context.
  • Opening up the possibility of implementing page components in separate services.
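The first benefit, finer-grained dependency tracking, amounts to keeping a reverse index from each resource to the components rendered with it. A minimal sketch, assuming components have stable IDs and report dependency keys such as `transclusion:Infobox town` (both hypothetical conventions): when a template changes, only the components listed for it need re-rendering, rather than the whole page.

```typescript
// Reverse index: dependency key -> IDs of components rendered with it.
class DependencyIndex {
  private byResource = new Map<string, Set<string>>();

  // Record (or re-record) the dependencies reported by one component render.
  record(componentId: string, dependencyKeys: string[]): void {
    this.forget(componentId);
    for (const key of dependencyKeys) {
      let ids = this.byResource.get(key);
      if (!ids) {
        ids = new Set<string>();
        this.byResource.set(key, ids);
      }
      ids.add(componentId);
    }
  }

  // Drop a component from the index, e.g. before re-recording it.
  forget(componentId: string): void {
    this.byResource.forEach((ids, key) => {
      ids.delete(componentId);
      if (ids.size === 0) this.byResource.delete(key);
    });
  }

  // Components that must be re-rendered after the given resource changed.
  affectedBy(dependencyKey: string): string[] {
    const ids = this.byResource.get(dependencyKey);
    return ids ? Array.from(ids).sort() : [];
  }
}

const index = new DependencyIndex();
index.record("page1#infobox", ["transclusion:Infobox town", "image:Example.jpg"]);
index.record("page1#navbox", ["transclusion:Navbox Ohio"]);
index.record("page2#infobox", ["transclusion:Infobox town"]);
console.log(index.affectedBy("transclusion:Infobox town")); // [ 'page1#infobox', 'page2#infobox' ]
```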

Questions

  • Can we restrict page components to a single DOM node?
  • Can we come up with a sensible aggregation of metadata that satisfies different use cases well?
    • Idea: Two blobs (or one with two sub-objects), one for view-relevant data (modules, categories, magic word flags?), one for more verbose data like dependencies.
      • could also consider exposing modules / view metadata in HTTP header
  • Component addressing and generic parameter encoding
    • Can we generalize the process of figuring out how to render / re-render a component?
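The two-blob idea could look like the following sketch. Which keys count as view-relevant is exactly the open question, so the `VIEW_KEYS` set and the `X-Page-Modules` header name are placeholders, not settled choices.

```typescript
type FlatMetadata = Record<string, string[]>;

// Placeholder choice of view-relevant keys; deciding this split is the open question.
const VIEW_KEYS = new Set(["modules", "categories", "magicWordFlags"]);

interface SplitMetadata {
  view: FlatMetadata; // small: needed to display the content
  verbose: FlatMetadata; // larger: e.g. dependencies for change propagation
}

// Partition aggregated metadata into a view blob and a verbose blob.
function splitMetadata(meta: FlatMetadata): SplitMetadata {
  const view: FlatMetadata = {};
  const verbose: FlatMetadata = {};
  for (const key of Object.keys(meta)) {
    if (VIEW_KEYS.has(key)) {
      view[key] = meta[key];
    } else {
      verbose[key] = meta[key];
    }
  }
  return { view, verbose };
}

// Hypothetical header exposing the module list, so a client or edge cache
// could start loading modules before reading the body.
function moduleHeader(view: FlatMetadata): [string, string] | null {
  const modules = view["modules"];
  return modules && modules.length > 0 ? ["X-Page-Modules", modules.join(",")] : null;
}

const { view, verbose } = splitMetadata({
  modules: ["ext.infobox", "ext.navbox"],
  categories: ["Towns"],
  templates: ["Infobox town"],
});
console.log(view, verbose, moduleHeader(view));
```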

Current work and background reading

GWicke updated the task description. Aug 7 2015, 11:23 PM
GWicke updated the task description. Aug 7 2015, 11:30 PM
GWicke added a subscriber: ssastry.
GWicke updated the task description. Aug 7 2015, 11:42 PM
GWicke updated the task description. Aug 7 2015, 11:57 PM
GWicke updated the task description.
GWicke updated the task description. Aug 7 2015, 11:59 PM
GWicke updated the task description. Aug 8 2015, 12:07 AM

Created a page (technically a task, since pastes do not allow linking) to collect some use cases: T109012

Tgr added a comment. Aug 13 2015, 10:53 PM

It would be interesting to have a more detailed list of problems with the status quo, and then see which proposals might solve which problems. The ones I could gather:

  • templates do not separate business logic and presentation
  • they do not allow complex styling (e.g. media queries) and scripting (e.g. pull in ResourceLoader module X on pages where this template is used)
  • they (and also images) cannot have different presentation per skin or device (apart from global CSS rules, which do not scale)
  • they do not support true WYSIWYG editing (e.g. click on the infobox field, change the text)
  • they do not support complex interactions (e.g. edit the parameters of a chess diagram template by dragging the pieces around)
  • they can be unbalanced or otherwise impossible to represent as a DOM node (fragment?)
  • they can have side effects (language conversion flags are a particularly horrible example of this)
  • it's hard to edit the template invocation wikitext in some corner cases (e.g. parameters containing =)
  • it's hard to edit the template itself
  • it's hard to explain to the user making an edit that some parameters come from external sources (Wikidata) and changing them would affect other pages
  • it's hard to access the template parameters as structured data (doing what DBPedia does should be easy! And it's not even "template parameters", input can come from Wikidata or have a default value)
  • it's hard to localize parser cache invalidations (if the template changes, instead of reparsing the whole page, reparse the template invocation and replace the right DOM node in the article)
  • hard to reason about markup (forgot what this one was about)
  • we want to make content portable (across Wikimedia projects, maybe across all MediaWiki installations) without forcing users to copy hundreds of templates and/or template helper gadgets

Is that mostly complete?

@Tgr, I think that's a pretty exhaustive list.

Some small suggestions:

it's hard to access the template parameters as structured data

This has actually improved a lot with Parsoid providing the readily-parsed parameters in metadata.
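To illustrate what "readily-parsed parameters in metadata" means: Parsoid marks a transclusion with `typeof="mw:Transclusion"` and stores its target and parameters as JSON in the `data-mw` attribute. The sketch below reads that JSON once it has been taken out of the (HTML-decoded) attribute. The `parts` / `template` / `target` / `params` / `wt` layout follows Parsoid's MediaWiki DOM spec, but the exact field names should be verified against the spec version in use; the helper names are hypothetical.

```typescript
// Minimal types for the parts of Parsoid's data-mw that this sketch reads.
interface TemplatePart {
  template: {
    target: { wt: string; href?: string };
    params: Record<string, { wt: string }>;
  };
}
interface DataMw {
  // A multi-part transclusion mixes template parts with plain wikitext strings.
  parts: Array<TemplatePart | string>;
}

interface TemplateCall {
  name: string;
  params: Record<string, string>;
}

// Extract template names and raw wikitext parameter values from a data-mw JSON string.
function templateCalls(dataMwJson: string): TemplateCall[] {
  const dataMw = JSON.parse(dataMwJson) as DataMw;
  const calls: TemplateCall[] = [];
  for (const part of dataMw.parts) {
    // Skip literal wikitext and non-template parts (e.g. template arguments).
    if (typeof part === "string" || !part.template) continue;
    const params: Record<string, string> = {};
    for (const key of Object.keys(part.template.params)) {
      params[key] = part.template.params[key].wt;
    }
    calls.push({ name: part.template.target.wt.trim(), params });
  }
  return calls;
}

console.log(
  templateCalls(
    '{"parts":[{"template":{"target":{"wt":"Infobox town\\n","href":"./Template:Infobox_town"},' +
      '"params":{"name":{"wt":"Springfield"}},"i":0}}]}'
  )
);
```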

it's hard to localize parser cache invalidations

This also affects Parsoid performance significantly, and limits our ability to embed dynamic content within a page.
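One reason whole-page invalidation limits dynamic content: if a page is cached as one unit, its lifetime can be no longer than that of its shortest-lived component, which is how the PHP parser already treats `ParserOutput` cache expiry (it keeps the minimum of all requested values). The sketch below, with hypothetical names, contrasts that with caching components separately, where only the dynamic component itself has to expire early.

```typescript
interface CachedComponent {
  id: string;
  ttlSeconds: number; // Infinity for fully static content
}

// Whole-page caching: one dynamic component caps the lifetime of the entire page.
function wholePageTtl(components: CachedComponent[]): number {
  return components.reduce((min, c) => Math.min(min, c.ttlSeconds), Infinity);
}

// Per-component caching: only components expiring within the window need
// re-rendering; the rest of the page stays cached.
function expiringWithin(components: CachedComponent[], windowSeconds: number): string[] {
  return components.filter((c) => c.ttlSeconds <= windowSeconds).map((c) => c.id);
}

const parts: CachedComponent[] = [
  { id: "lead", ttlSeconds: Infinity },
  { id: "infobox", ttlSeconds: 86400 },
  { id: "live-score", ttlSeconds: 60 },
];
console.log(wholePageTtl(parts)); // 60
console.log(expiringWithin(parts, 300)); // [ 'live-score' ]
```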

The lack of template balancing also affects WYSIWYG in VE by previewing unbalanced templates in an apparently-balanced way, but then breaking unexpectedly after saving.
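A crude way to flag the unbalanced case is to check whether a template's HTML output closes every element it opens, and nothing more. The sketch below uses a tag stack over a regular-expression tokenizer. It is not an HTML5 parser (it ignores comments, implied end tags such as an unclosed `<p>`, and `>` inside attribute values), so it only illustrates the idea and is not the check Parsoid performs.

```typescript
// Elements that never have a closing tag in HTML.
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input",
  "link", "meta", "source", "track", "wbr",
]);

// Returns true if every opened element is closed, in order, within the fragment.
function isBalanced(html: string): boolean {
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*?(\/?)>/g;
  const stack: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    const [, closing, rawName, selfClosing] = match;
    const name = rawName.toLowerCase();
    if (closing) {
      // A close tag must match the most recently opened element.
      if (stack.pop() !== name) return false;
    } else if (!selfClosing && !VOID_ELEMENTS.has(name)) {
      stack.push(name);
    }
  }
  return stack.length === 0;
}

console.log(isBalanced("<div><p>text</p></div>")); // true
console.log(isBalanced("<table><tr><td>cell")); // false
```

The second call mirrors the common pattern of a "table start" template paired with a separate "table end" template: each expansion is unbalanced on its own even though the page as a whole is well-formed.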

GWicke added a comment. Edited Aug 14 2015, 1:22 AM

A recording of today's session on the topic is now available at https://www.youtube.com/watch?v=7r2hzs9lwNw.

We also have some (not very complete) notes at https://etherpad.wikimedia.org/p/Templates,_Page_Components_and_editing.

GWicke updated the task description. Aug 19 2015, 12:16 AM
GWicke updated the task description. Aug 19 2015, 12:34 AM
GWicke updated the task description. Aug 19 2015, 11:46 PM
GWicke updated the task description. Sep 12 2015, 12:34 AM
GWicke added a comment. Edited Sep 17 2015, 5:33 PM

A step towards easier and faster matching of high-level components in page content could be to mark up elements like infoboxes or navboxes uniformly.

One idea would be to leverage a custom HTML5 element such as <info-box> to wrap the transclusion content. Strawman syntax:

<info-box name="town" typeof="mw:Transclusion" data-mw="....">
  <div> Infobox content </div>
</info-box>

The <info-box> wrapper can be matched and styled like any other element. There is no default styling attached to it, so it should not affect the layout by default.

The tag syntax can also be matched cheaply at the string level, which makes it possible to rewrite content efficiently at the edge or in a service worker.
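Because the wrapper is a plain tag with a fixed name (HTML custom element names must contain a hyphen, which `info-box` does), locating components does not require building a DOM. The sketch below scans an HTML string for `<info-box ...>` ... `</info-box>` spans, counting depth in case wrappers nest, and returns their offsets so that an edge or service-worker script could cut them out or replace them. It assumes the strawman markup above and that the tag name does not appear inside comments or attribute values; the function and type names are hypothetical.

```typescript
interface ComponentSpan {
  start: number; // offset of the opening "<info-box"
  end: number; // offset just past the matching "</info-box>"
  name: string | null; // value of the name attribute, if present
}

// Find top-level <info-box> elements in an HTML string without parsing the DOM.
function findComponents(html: string): ComponentSpan[] {
  // Matches <info-box ...> and </info-box>, but not e.g. <info-boxes>.
  const tagPattern = /<(\/?)info-box(?=[\s\/>])[^>]*>/g;
  const spans: ComponentSpan[] = [];
  let depth = 0;
  let start = -1;
  let name: string | null = null;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    if (match[1] === "") {
      // Opening tag: only the outermost one starts a new span.
      if (depth === 0) {
        start = match.index;
        const nameMatch = /\sname="([^"]*)"/.exec(match[0]);
        name = nameMatch ? nameMatch[1] : null;
      }
      depth++;
    } else if (depth > 0) {
      // Closing tag; stray closes at depth 0 are ignored.
      depth--;
      if (depth === 0) {
        spans.push({ start, end: match.index + match[0].length, name });
      }
    }
  }
  return spans;
}

const html =
  '<p>Intro</p><info-box name="town" typeof="mw:Transclusion"><div>Infobox content</div></info-box><p>Body</p>';
for (const span of findComponents(html)) {
  console.log(span.name, html.slice(span.start, span.end));
}
```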

Implementation

To figure out the role of a component from a given template, we'd need to maintain a mapping. TemplateData could be one place to maintain it. There might also be some room for heuristics, such as classifying all templates whose names start with infobox_ as infoboxes.
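A sketch of such a mapping, combining an explicit table with the name-prefix heuristic as a fallback. Whether TemplateData would carry the explicit entries is an assumption; the role names, the table contents and the prefix list are all hypothetical. Titles are normalized before lookup the way MediaWiki treats them on most wikis (underscores equal spaces, first letter case-insensitive), and are assumed to come without the `Template:` namespace prefix.

```typescript
type ComponentRole = "infobox" | "navbox" | "other";

// Normalize a template title: underscores and spaces are equivalent, runs of
// whitespace collapse, and the first letter is case-insensitive (as on most wikis).
function normalizeTitle(title: string): string {
  const t = title.replace(/_/g, " ").replace(/\s+/g, " ").trim();
  return t.charAt(0).toUpperCase() + t.slice(1);
}

// Hypothetical explicit mapping, e.g. maintained alongside TemplateData.
const EXPLICIT_ROLES: Record<string, ComponentRole> = {
  "Taxobox": "infobox",
  "Automatic taxobox": "infobox",
};

// Hypothetical prefix heuristic, used when no explicit entry exists.
const PREFIX_ROLES: Array<[string, ComponentRole]> = [
  ["infobox", "infobox"],
  ["navbox", "navbox"],
];

function roleForTemplate(title: string): ComponentRole {
  const name = normalizeTitle(title);
  const explicit = EXPLICIT_ROLES[name];
  if (explicit) return explicit;
  const lower = name.toLowerCase();
  for (const [prefix, role] of PREFIX_ROLES) {
    // Match "Infobox", "Infobox town", "Infobox_town", but not "Infoboxes".
    if (lower === prefix || lower.startsWith(prefix + " ")) return role;
  }
  return "other";
}

console.log(roleForTemplate("Infobox_town"), roleForTemplate("Taxobox"), roleForTemplate("Citation needed"));
```

Checking the explicit table first lets editors override the heuristic for templates whose names are misleading.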

Alternatively, rendering <div data-template-name='infobox' data-params='{param:222}'></div> would give us more control: we could either use the API to generate the HTML in a separate, deferred request or completely rewrite the rendering engine. On mobile, for instance, we would like to collapse infoboxes...

Short summary of my thoughts from an IRC conversation:

This goes back to the 'wikitext+templates comprise a DSL' way of looking at it. Infobox and navbox are just two pieces of that DSL used on certain kinds of pages. Editors care about which pieces of the page represent which kind of content and how they can be represented in the language they use to author it. Specific sets of templates (along with their names) represent an abstraction of the content on a page, and they are enforced by editors and editorial policies. There are expectations about how and where they are used, their formatting, etc.

Reading/editing clients might use this information for doing special things with them.

Mobile clients care about infoboxes or navboxes because that is what they have identified as important right now... but if they could know about sports tables or math formulae or whatever else, they might be interested in those too. So I see infoboxes and navboxes as special cases of the general problem.

The primary thing to figure out is how we get that information about infoboxes/navboxes/whatever-else from editors to Parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

GWicke added a comment. Edited Sep 17 2015, 6:45 PM

Alternatively, rendering <div data-template-name='infobox' data-params='{param:222}'></div> would give us more control: we could either use the API to generate the HTML in a separate, deferred request or completely rewrite the rendering engine. On mobile, for instance, we would like to collapse infoboxes...

Yeah, the expansion can be stripped out where it makes sense. Parsoid would likely continue to emit most / all content in expanded form, but it would be easy and efficient to provide an alternate end point that offers the content with expansions stripped, or components moved out & served separately altogether. On the client (thinking ahead to T111588 and T106099), we can set up a registry of tag names to handlers, some of which would use the parameters provided in the element to request a server-side render, while others would just render things client-side.
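The client-side registry described here could be as simple as a map from tag name to handler. The sketch below is hypothetical throughout (the `nav-box` tag, the `/component/render` endpoint and its parameter format are placeholders): each handler either renders HTML on the client or returns a request for a server-side render, and the caller decides when to issue that request, for example lazily on mobile. Tags without a handler keep the expanded HTML already present in the page.

```typescript
type ComponentParams = Record<string, string>;

// A handler either renders on the client or asks for a server-side render.
type RenderResult =
  | { kind: "html"; html: string }
  | { kind: "request"; url: string };

type Handler = (params: ComponentParams) => RenderResult;

const registry = new Map<string, Handler>();

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;").replace(/"/g, "&quot;");
}

// Hypothetical: infoboxes are re-rendered server-side for the current device.
registry.set("info-box", (params) => ({
  kind: "request",
  url: "/component/render?type=infobox&params=" + encodeURIComponent(JSON.stringify(params)),
}));

// Hypothetical: navboxes are collapsed to a simple client-side placeholder.
registry.set("nav-box", (params) => ({
  kind: "html",
  html: `<details><summary>${escapeHtml(params["title"] ?? "Navigation")}</summary></details>`,
}));

// Returns null when no handler is registered, meaning the expanded HTML
// already in the page should be kept as-is.
function renderComponent(tagName: string, params: ComponentParams): RenderResult | null {
  const handler = registry.get(tagName);
  return handler ? handler(params) : null;
}

console.log(renderComponent("info-box", { name: "town" }));
console.log(renderComponent("nav-box", { title: "Ohio" }));
```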

GWicke added a comment. Edited Sep 28 2015, 10:12 PM

The primary thing to figure out is how we get that information about infoboxes/navboxes/whatever-else from editors to Parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

I agree that maintaining a mapping of (say) templates to components is more complex than finding some markup for it. However, I still think that defining a decent DOM representation is important for usability and performance when working with such components. When working on a DOM, we should be able to match an infobox or navbox in a single DOM selector. When working at the bytestream level for high-volume applications with latency demands, having a safe way to extract page components at the string level is very valuable, too.

GWicke updated the task description. Sep 28 2015, 10:19 PM
Qgil added a subscriber: Qgil. Oct 3 2015, 8:57 PM

Congratulations! This is one of the 52 proposals that made it through the first deadline of the Wikimedia-Developer-Summit-2016 selection process. Please pay attention to the next one: By 6 Nov 2015, all Summit proposals must have active discussions and a Summit plan documented in the description. Proposals not reaching this critical mass can continue at their own path out of the Summit.

Jhernandez added a subscriber: Jhernandez.

The primary thing to figure out is how we get that information about infoboxes/navboxes/whatever-else from editors to Parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

I agree that maintaining a mapping of (say) templates to components is more complex than finding some markup for it. However, I still think that defining a decent DOM representation is important for usability and performance when working with such components. When working on a DOM, we should be able to match an infobox or navbox in a single DOM selector. When working at the bytestream level for high-volume applications with latency demands, having a safe way to extract page components at the string level is very valuable, too.

To add to this, there are pages like https://en.wikipedia.org/wiki/Indian_cuisine where navboxes are positioned closer to where an infobox usually lives. There are also articles like this one that have navboxes both in the infobox position and at the end of the article. This positioning seems sensible in a desktop context. We could make their expansion / following their custom placement optional with placeholder tags (as discussed in T105845#1650013) and serve them separately, but I think removing any trace of their original position isn't really an option at this point.

Tgr added a comment. Oct 5 2015, 11:24 PM

Navcolumns are often used as a table of contents over articles, and their placement reflects that. https://en.wikipedia.org/wiki/Science is a good example - the top-level article has a navcolumn, the direct children show the navcolumn with the respective section open, and the grandchildren use topic-specific navboxes instead. It's a very helpful pattern IMO when you are searching for something and going top-down in some subject - chances are you know the top topic name but not the subtopic name, so you will need to navigate from Science to Material science much more often than in the opposite direction, or from one grandchild to another. (That last one is what navboxes are for - you have finished reading or scrolling through the article and are looking for related topics.) Navcolumns deserve their own component, distinct from navboxes, IMO.

There are also the articles which have multiple infoboxes. E.g. w:en:Mini has 9 distinct infoboxes. I'm not sure how widespread this pattern is, but I believe vehicle and media articles often use it? (when a subtopic isn't sufficiently notable, or sufficiently detailed, to be split into a separate page/stub, but does benefit from an infobox in its section).

There's also the pattern of a "single" infobox composed of multiple parts, e.g. the modular w:en:Template:Infobox animanga as used in e.g. w:en:Mushishi. A glance through the interwiki links shows that this pattern is used in many other languages. (Second example: w:en:Template:Infobox_ship_begin as used in e.g. USS Bang.)

Kelson added a subscriber: Kelson. Oct 6 2015, 8:16 PM
Restricted Application added a subscriber: StudiesWorld. Nov 7 2015, 9:47 PM
GWicke updated the task description. Nov 12 2015, 11:41 PM
-jem- added a subscriber: -jem-. Nov 21 2015, 9:29 PM

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!

Zppix moved this task from Unsorted to Working on on the Contributors-Team board. Apr 26 2016, 2:34 PM
RobLa-WMF mentioned this in Unknown Object (Event). May 11 2016, 12:09 AM
RobLa-WMF triaged this task as Normal priority. May 11 2016, 8:21 PM
Qgil removed a subscriber: Qgil. Feb 7 2017, 1:16 PM
Reasno added a subscriber: Reasno. Jun 22 2017, 10:21 AM
leila added a subscriber: leila. Aug 22 2017, 4:28 AM
GWicke removed GWicke as the assignee of this task. Oct 11 2017, 10:33 PM