RFC: Page components / content widgets
Open, Medium, Public

Assigned To

None

Authored By

	• GWicke
	Jul 15 2015, 12:36 AM

Description

We are looking for ways of improving the handling of common content structures like infoboxes, data tables, citations or navboxes. We'd like to

make their rendering more adaptable to different devices and use cases,
improve the editing experience, and
support the integration of different data sources.

Wikia and others have already done significant work on infoboxes (see links below). The details and considerations around those implementations are not the focus of this task, as the primary focus here is on identifying a minimal supporting infrastructure that we need to enable such implementations.

Page components

Page components are not much more than bits of well-formed HTML along with some metadata. Well-formed HTML along with attribute markers lets us cleanly swap out a component's rendering. Metadata documents each component's dependencies, so that we can propagate changes efficiently and reliably. Other metadata like page properties and render dependencies need to be aggregated when composing a larger page, so that ResourceLoader modules for example can be loaded for the entire page. Candidates for per-component metadata are:

ResourceLoader modules needed to render the component.
Resources used to render this component, for dependency tracking.
- templates and Scribunto modules
- images
- wiki pages (for link rendering)
- external data sources
Page metadata like
- categories
- external links
- magic word flags
Caching / storage limitations, for dynamic content
Other page properties

For wiki content processed in the PHP parser, basically all of this information apart from external data sources is available in the ParserOutput object. Some of this information is already exposed in the expandtemplates end point (notably missing is the list of sub-templates used), and even more is available in the parse end point.

Currently this metadata is mostly implementation-defined, and not consistently exposed in the Action API. The proposal is to document standard metadata and its semantics, and make sure that this is consistently exposed through APIs in a way that lets clients aggregate information in a generic manner (as sets, for example), without having to know about each possible bit of metadata explicitly.
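A minimal sketch of the generic, set-based aggregation described above, written in Python purely for illustration. The metadata keys, module names, and the `aggregate_metadata` helper are assumptions made up for this example; they are not an existing MediaWiki or Parsoid API.

```python
from typing import Dict, Iterable, Set

# Hypothetical shape: every metadata key maps to a set of values.
ComponentMetadata = Dict[str, Set[str]]


def aggregate_metadata(components: Iterable[ComponentMetadata]) -> ComponentMetadata:
    """Union each metadata key across components without knowing the keys in advance."""
    page: ComponentMetadata = {}
    for metadata in components:
        for key, values in metadata.items():
            page.setdefault(key, set()).update(values)
    return page


# Illustrative per-component metadata (all names are invented).
infobox = {
    "modules": {"ext.example.infobox"},
    "templates": {"Template:Infobox town"},
    "categories": {"Towns"},
}
citation = {
    "modules": {"ext.example.cite"},
    "templates": {"Template:Cite web"},
    "externallinks": {"https://example.org/source"},
}

page_metadata = aggregate_metadata([infobox, citation])
```

Because every value is a set, a client can merge metadata keys it has never seen before, which is the property the proposal asks for.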

Benefits of doing this include:

Finer-grained dependency tracking, which in turn can make updates like refreshLinks a lot more efficient.
More accurate page metadata tracking in VisualEditor when inserting / removing components.
Efficient updating of ResourceLoader modules and other dependencies when re-rendering individual components for a different context.
Opening up the possibility of implementing page components in separate services.

Questions

Can we restrict page components to a single DOM node?
Can we come up with a sensible aggregation of metadata that satisfies different use cases well?
- Idea: Two blobs (or one with two sub-objects), one for view-relevant data (modules, categories, magic word flags?), one for more verbose data like dependencies.
  - could also consider exposing modules / view metadata in HTTP header
Component addressing and generic parameter encoding
- Can we generalize the process of figuring out how to render / re-render a component?
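One possible reading of the "two blobs" idea, sketched in Python. The choice of which keys count as view-relevant (`VIEW_KEYS`) and the blob names `view` and `dependencies` are assumptions for illustration only; the RFC leaves both open.

```python
import json
from typing import Dict, List, Set

# Assumption: these keys are needed to display a page; everything else is
# treated as more verbose dependency data.
VIEW_KEYS = frozenset({"modules", "categories", "flags"})


def split_metadata(metadata: Dict[str, Set[str]]) -> Dict[str, Dict[str, List[str]]]:
    """Split aggregated metadata into a view blob and a dependencies blob."""
    blobs: Dict[str, Dict[str, List[str]]] = {"view": {}, "dependencies": {}}
    for key, values in metadata.items():
        blob = "view" if key in VIEW_KEYS else "dependencies"
        blobs[blob][key] = sorted(values)  # sorted lists serialize deterministically
    return blobs


page_metadata = {
    "modules": {"ext.example.infobox", "ext.example.cite"},
    "categories": {"Towns"},
    "templates": {"Template:Infobox town"},
    "images": {"File:Example.jpg"},
}

blobs = split_metadata(page_metadata)
print(json.dumps(blobs, indent=2, sort_keys=True))
```

A consumer that only renders the page could then fetch the `view` blob alone, while change-propagation jobs would read `dependencies`.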

Current work and background reading

Declarative infoboxes at Wikia: The Wikia folks are gradually replacing infobox templates with widgets, by replacing top-level infobox templates with an <infobox> tag extension wrapping an XML infobox definition. They are doing this in cooperation with the community, and provide migration tools based on heuristics on template parameters and typical values (for example, parameters whose value normally starts with Image: are rendered as images). The primary focus is on moving towards a declarative infobox widget definition, as a first step towards inline editing and flexible styling across different devices.
- Announcement thread
- Interactive edit / preview tool built by @Inez.
Wikidata-generated infoboxes by @Jdlrobson, edit interface
Capiunto: using Scribunto to render infoboxes from Wikidata
https://www.mediawiki.org/wiki/Parsoid/Content_widgets: Older notes from the Parsoid team
https://www.mediawiki.org/wiki/Parsoid/DOM_notes: Parsoid notes on self-contained templates and content model constraints
T103630: Semantic content blocks
T103624: Semantic media roles
T118517: [RFC] Use <figure> for media and T118520: Use <figure-inline> instead of <span> for inline figures.
Templates are dead! Long live templates! -- presentation by @cscott at Wikimania 2015
T57524: Enforce proper nesting of most templates, and encapsulate compound content blocks
@ssastry commenting on encoding syntactical and content model constraints in templates

Related Objects

Status	Assigned	Task
Open	None	T105845 RFC: Page components / content widgets
Resolved	Arlolra	T69540 Produce/preserve the metadata about additional ResourceLoader modules required by extension tags
Resolved	• marcoil	T73490 Parsoid should set the prop parameter when calling API action=expandtemplates
Resolved	• marcoil	T86902 Improve Parsoid's loading of CSS modules using ResourceLoader

Event Timeline

• GWicke updated the task description. Aug 19 2015, 12:34 AM

• GWicke mentioned this in T102476: RFC: Requirements for change propagation. Aug 19 2015, 12:45 AM

• GWicke updated the task description. Aug 19 2015, 11:46 PM

Quiddity subscribed. Aug 26 2015, 9:17 PM

• GWicke mentioned this in T111588: RFC: API-driven web front-end. Sep 5 2015, 12:09 AM

• GWicke updated the task description. Sep 12 2015, 12:34 AM

A step towards easier and faster matching of high-level components in page content could be to mark up elements like infoboxes or navboxes uniformly.

One idea to do this would be to use a custom HTML5 element such as <info-box> to wrap the transclusion content. Strawman syntax:

<info-box name="town" typeof="mw:Transclusion" data-mw="....">
  <div> Infobox content </div>
</info-box>

The <info-box> wrapper can be matched and styled like any other element. There is no default styling attached to it, so it should not affect the layout by default.

The tag syntax can also be matched efficiently at the string level, which makes it possible to efficiently rewrite content at the edge or in a service worker.
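As a rough illustration of string-level matching, here is a Python sketch that extracts or strips `<info-box>` wrappers with a regular expression, without building a DOM. It assumes that `<info-box>` elements are never nested and that attribute values contain no literal `>`; neither assumption is guaranteed by the strawman syntax above, and the function names are invented for this example.

```python
import re
from typing import List

# Matches one <info-box ...>...</info-box> element, non-greedily.
# Assumes no nested <info-box> and no literal ">" inside attribute values.
INFO_BOX_RE = re.compile(r"<info-box\b[^>]*>.*?</info-box>", re.DOTALL)


def extract_info_boxes(page_html: str) -> List[str]:
    """Return every <info-box> element found in the page, as strings."""
    return INFO_BOX_RE.findall(page_html)


def strip_info_boxes(page_html: str) -> str:
    """Return the page with every <info-box> element removed."""
    return INFO_BOX_RE.sub("", page_html)


page = (
    "<p>Intro</p>"
    '<info-box name="town" typeof="mw:Transclusion">'
    "<div> Infobox content </div>"
    "</info-box>"
    "<p>Body</p>"
)

boxes = extract_info_boxes(page)
stripped = strip_info_boxes(page)
```

The same pattern could in principle run at the edge or in a service worker, which is where avoiding a full HTML parse matters most.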

Implementation

To figure out the role of a component from a given template, we'd need to maintain a mapping. A possible place to maintain this could be templatedata. There might also be some room for heuristics, like categorizing all templates starting with infobox_ as an infobox.
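A sketch of how an explicit mapping could be combined with the prefix heuristic, in Python. The `EXPLICIT_ROLES` table stands in for whatever would actually be stored (for example in templatedata); its entries, the `navbox_` prefix, and the `component_role` helper are hypothetical.

```python
from typing import Optional

# Hypothetical curated mapping from template name to component role.
EXPLICIT_ROLES = {
    "Taxobox": "infobox",
    "Sidebar": "navcolumn",
}

# Heuristic fallback: (normalized name prefix, role).
PREFIX_ROLES = (
    ("infobox_", "infobox"),
    ("navbox_", "navbox"),
)


def component_role(template_name: str) -> Optional[str]:
    """Return the component role for a template, or None if it is unknown."""
    if template_name in EXPLICIT_ROLES:
        return EXPLICIT_ROLES[template_name]
    # Treat spaces and underscores alike, and ignore case, before matching.
    normalized = template_name.strip().lower().replace(" ", "_")
    for prefix, role in PREFIX_ROLES:
        if normalized.startswith(prefix):
            return role
    return None


roles = {name: component_role(name) for name in ("Infobox town", "Taxobox", "Cite web")}
```

The explicit mapping is consulted first so that editors can always override a wrong guess made by the heuristic.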

Alternatively, rendering <div data-template-name='infobox' data-params='{param:222}'></div> would give us more control: we could either use the API to generate the HTML in a separate deferred request or completely rewrite the rendering engine. On mobile, for instance, we would like to collapse infoboxes...

Short summary of my thoughts from an irc conversation:

This goes back to the 'wikitext+templates comprise a DSL' way of looking at it. Infoboxes and navboxes are just two pieces of that DSL used on certain kinds of pages. Editors care about which pieces of the page represent which kind of content and how they can be represented in the language they use to author it. Specific sets of templates (along with their names) represent an abstraction about content on a page, and they are enforced by editors and editorial policies. There are expectations about how and where they are used, formatting, etc.

Reading/editing clients might use this information for doing special things with them.

Mobile clients care about infoboxes or navboxes because that is what they have identified as important right now, but if they could know about sports tables or math formulae or whatever else, they might be interested in those too. So, I see infoboxes and navboxes as special cases of the general problem.

The primary thing to figure out is how we get that information about infoboxes/navboxes/whatever-else from editors to Parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

> Alternatively, rendering <div data-template-name='infobox' data-params='{param:222}'></div> would give us more control: we could either use the API to generate the HTML in a separate deferred request or completely rewrite the rendering engine. On mobile, for instance, we would like to collapse infoboxes...

Yeah, the expansion can be stripped out where it makes sense. Parsoid would likely continue to emit most / all content in expanded form, but it would be easy and efficient to provide an alternate end point that offers the content with expansions stripped, or components moved out & served separately altogether. On the client (thinking ahead to T111588 and T106099), we can set up a registry of tag names to handlers, some of which would use the parameters provided in the element to request a server-side render, while others would just render things client-side.
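The registry of tag names to handlers could look roughly like the following. This is a language-neutral sketch written in Python; a real client would more likely be JavaScript. The tag name `nav-box`, the handler names, and the placeholder markup are assumptions, and the "server-side" handler only builds a placeholder instead of issuing a real request.

```python
import html
from typing import Callable, Dict, Optional

Handler = Callable[[Dict[str, str]], str]

# Registry mapping component tag names to render handlers.
HANDLERS: Dict[str, Handler] = {}


def register(tag_name: str) -> Callable[[Handler], Handler]:
    """Decorator that registers a handler for a component tag name."""
    def decorator(handler: Handler) -> Handler:
        HANDLERS[tag_name] = handler
        return handler
    return decorator


@register("info-box")
def request_server_render(attrs: Dict[str, str]) -> str:
    # Stand-in for a server-side render: a real client would send the
    # element's parameters to a rendering service. Here we only emit a
    # placeholder that records which component is pending.
    name = html.escape(attrs.get("name", ""), quote=True)
    return f'<div class="pending-render" data-component="{name}"></div>'


@register("nav-box")
def render_client_side(attrs: Dict[str, str]) -> str:
    # Rendered locally from the parameters carried on the element.
    title = html.escape(attrs.get("title", ""))
    return f"<nav>{title}</nav>"


def render_component(tag_name: str, attrs: Dict[str, str]) -> Optional[str]:
    """Dispatch to the registered handler; None means no handler is known."""
    handler = HANDLERS.get(tag_name)
    return handler(attrs) if handler else None


pending = render_component("info-box", {"name": "town"})
local = render_component("nav-box", {"title": "Science & art"})
```

Returning None for unknown tags lets the caller fall back to the already expanded content, matching the idea that Parsoid keeps emitting expanded output by default.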

JanZerebecki subscribed. Sep 19 2015, 5:10 PM

• GWicke mentioned this in T55784: [EPIC] Use Parsoid HTML for all page views. Sep 28 2015, 8:54 PM

> The primary thing to figure out is how we get that information about infoboxes/navboxes/whatever-else from editors to Parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

I agree that maintaining a mapping of (say) templates to components is more complex than finding some markup for it. However, I still think that defining a decent DOM representation is important for usability and performance when working with such components. When working on a DOM, we should be able to match an infobox or navbox in a single DOM selector. When working at the bytestream level for high-volume applications with latency demands, having a safe way to extract page components at the string level is very valuable, too.

• GWicke updated the task description. Sep 28 2015, 10:19 PM

• GWicke added a parent task: T111588: RFC: API-driven web front-end. Sep 29 2015, 7:48 PM

• GWicke added a project: Wikimedia-Developer-Summit-2016. Sep 30 2015, 8:59 PM

• Spage mentioned this in T91162: RFC: Shadow namespaces. Sep 30 2015, 10:39 PM

MrStradivarius subscribed. Oct 1 2015, 5:28 PM

• GWicke mentioned this in T114402: Implement something similar to the RESTBase 'section' API to provide wikitext structure information. Oct 2 2015, 12:38 AM

• GWicke mentioned this in T114542: Next Generation Content Loading and Routing, in Practice. Oct 3 2015, 7:39 PM

Congratulations! This is one of the 52 proposals that made it through the first deadline of the Wikimedia-Developer-Summit-2016 selection process. Please pay attention to the next one:

> By 6 Nov 2015, all Summit proposals must have active discussions and a Summit plan documented in the description. Proposals not reaching this critical mass can continue at their own path out of the Summit.

• GWicke mentioned this in T114596: [RFC] Method for bare page retrieval (e.g. render only / no skin). Oct 3 2015, 10:21 PM

• Jhernandez awarded a token. Oct 5 2015, 11:36 AM

• Jhernandez subscribed.

In T105845#1683163, @GWicke wrote:

>> The primary thing to figure out is how we get that information about infoboxes/navboxes/whatever-else from editors to Parsoid/editing-clients/etc. How it is represented in (Parsoid) HTML is the simpler problem.

> I agree that maintaining a mapping of (say) templates to components is more complex than finding some markup for it. However, I still think that defining a decent DOM representation is important for usability and performance when working with such components. When working on a DOM, we should be able to match an infobox or navbox in a single DOM selector. When working at the bytestream level for high-volume applications with latency demands, having a safe way to extract page components at the string level is very valuable, too.

To add to this, there are pages like https://en.wikipedia.org/wiki/Indian_cuisine where navboxes are positioned closer to where an infobox usually lives. There are also articles like this one that have navboxes both in the infobox position and at the end of the article. This positioning seems sensible in a desktop context. We could make their expansion / following their custom placement optional with placeholder tags (as discussed in T105845#1650013) and serve them separately, but I think removing any trace of their original position isn't really an option at this point.

Navcolumns are often used as a table of contents over articles, and their placement reflects that. https://en.wikipedia.org/wiki/Science is a good example: the top-level article has a navcolumn, the direct children show the navcolumn with the respective section open, and the grandchildren use topic-specific navboxes instead. It's a very helpful pattern IMO when you are searching for something and going top-down in some subject. Chances are you know the top topic name but not the subtopic name, so you will need to navigate from Science to Material science much more often than in the opposite direction, or from one grandchild to another. (That last one is what navboxes are for: you have finished reading or scrolling through the article and are looking for related topics.) Navcolumns deserve their own component, distinct from navboxes, IMO.

daniel added subscribers: daniel, • Jonas. Oct 5 2015, 11:27 PM

There are also the articles which have multiple infoboxes. E.g. w:en:Mini has 9 distinct infoboxes. I'm not sure how widespread this pattern is, but I believe vehicle and media articles often use it? (when a subtopic isn't sufficiently notable, or sufficiently detailed, to be split into a separate page/stub, but does benefit from an infobox in its section).

There's also the pattern of a "single" infobox composed of multiple parts, e.g. the modular w:en:Template:Infobox animanga as used in w:en:Mushishi. A glance through the interwiki links shows that this pattern is used in many other languages. (Second example: w:en:Template:Infobox_ship_begin as used in e.g. USS Bang.)

Kelson subscribed. Oct 6 2015, 8:16 PM

• GWicke mentioned this in T114788: OCG should download resourceLoader js/css dependencies. Oct 6 2015, 8:20 PM

• GWicke added a subtask: T69540: Produce/preserve the metadata about additional ResourceLoader modules required by extension tags. Oct 6 2015, 8:26 PM

• GWicke mentioned this in T113331: Provide an API flag to suppress auto-generated <references />. Oct 8 2015, 2:16 AM

Qgil moved this task from Backlog to Missing expected fields on the Wikimedia-Developer-Summit-2016 board. Oct 12 2015, 9:08 PM

Qgil moved this task from Missing expected fields to On track on the Wikimedia-Developer-Summit-2016 board. Oct 28 2015, 10:10 AM

Daniel_Mietchen subscribed. Nov 7 2015, 9:47 PM

Restricted Application added a subscriber: StudiesWorld. Nov 7 2015, 9:47 PM

• GWicke updated the task description. Nov 12 2015, 11:41 PM

• GWicke mentioned this in T118548: Support following MediaWiki redirects when retrieving HTML revisions. Nov 13 2015, 6:14 AM

• RobLa-WMF mentioned this in T119029: WikiDev 16 working area: Content access and APIs. Nov 19 2015, 12:47 AM

• GWicke mentioned this in T119088: Parsing team: Q3 2015-16 goals planning dependency tracker task. Nov 19 2015, 7:03 PM

-jem- subscribed. Nov 21 2015, 9:29 PM

• RobLa-WMF moved this task from On track to On track: Content access and APIs on the Wikimedia-Developer-Summit-2016 board. Nov 24 2015, 6:34 AM

daniel mentioned this in T119593: Define the list of "must have" sessions for WikiDev '16. Nov 25 2015, 9:00 PM

• BGerstle-WMF mentioned this in T119668: As a user, I can see the picture of day in my feed. Nov 30 2015, 6:19 PM

MGChecker subscribed. Nov 30 2015, 6:39 PM

JanZerebecki added a project: Wikidata. Dec 8 2015, 1:38 PM

• RobLa-WMF mentioned this in T119022: WikiDev 16 working area: Content format. Dec 10 2015, 4:43 AM

LikeLifer subscribed. Dec 10 2015, 10:13 PM

Lydia_Pintscher moved this task from incoming to monitoring on the Wikidata board. Dec 11 2015, 10:10 AM

• GWicke mentioned this in T122390: Is RDFa metadata in Parsoid HTML head actually useful to you / no user name & edit comment suppression in Parsoid <head> metadata. Dec 24 2015, 3:32 AM

Bianjiang subscribed. Jan 4 2016, 9:17 PM

jmadler subscribed. Jan 5 2016, 12:23 AM

• GWicke mentioned this in T78676: Store & load data-mw separately. Jan 12 2016, 11:49 PM

Wikimedia Developer Summit 2016 ended two weeks ago. This task is still open. If the session in this task took place, please make sure 1) that the session Etherpad notes are linked from this task, 2) that followup tasks for any actions identified have been created and linked from this task, 3) to change the status of this task to "resolved". If this session did not take place, change the task status to "declined". If this task itself has become a well-defined action which is not finished yet, drag and drop this task into the "Work continues after Summit" column on the project workboard. Thank you for your help!

• RobLa-WMF mentioned this in T125865: Assign RFCs to ArchCom shepherds. Feb 10 2016, 8:15 PM

• RobLa-WMF moved this task from Inbox to Tracked as ArchCom-RFC on the Architecture board. Mar 17 2016, 10:32 PM

• GWicke mentioned this in T130567: WIP RFC: Hygienic transclusions for WYSIWYG, incremental parsing & composition: Options and trade-offs. Apr 7 2016, 12:51 AM

RandomDSdevel awarded a token. Apr 12 2016, 8:47 PM

RandomDSdevel subscribed.

Zppix moved this task from Unsorted to Working on on the Contributors-Team board. Apr 26 2016, 2:34 PM

• RobLa-WMF mentioned this in Unknown Object (Event). May 11 2016, 12:09 AM

• RobLa-WMF triaged this task as Medium priority. May 11 2016, 8:21 PM

Danny_B added a project: Proposal. May 22 2016, 11:25 PM

• RobLa-WMF added a project: TechCom-Has-shepherd. Jul 13 2016, 5:09 AM

• RobLa-WMF moved this task from Backlog to GWicke on the TechCom-Has-shepherd board. Jul 13 2016, 5:14 AM

• Mholloway subscribed. Dec 5 2016, 6:43 PM

• MZMcBride subscribed. Dec 18 2016, 7:44 PM

SBisson subscribed. Jan 20 2017, 3:55 PM

A somewhat related idea: T156876: Structured data side channel for wikitext

Qgil unsubscribed. Feb 7 2017, 1:16 PM

Krinkle removed a project: Architecture. Mar 31 2017, 10:15 PM

• ssastry mentioned this in T162179: Extract HTML Compatibility Layer from MCS Mobile Sections API. Apr 12 2017, 9:21 PM

Reasno subscribed. Jun 22 2017, 10:21 AM

leila subscribed. Aug 22 2017, 4:28 AM

• GWicke mentioned this in T173821: Investigate exposing content styles needed via API vs as HTML tags. Aug 31 2017, 5:56 PM

cscott mentioned this in T176242: [EPIC] Representing / extracting wiki-specific application-level semantics. Sep 19 2017, 6:08 PM

• GWicke removed • GWicke as the assignee of this task. Oct 11 2017, 10:33 PM

Krinkle removed projects: TechCom-Has-shepherd, Proposal. Dec 21 2017, 11:53 PM

The idea is quite interesting, but it fell out of discussion some time ago. Shall we remove it from the TechCom RFC list, or is there a party interested in taking it back on?

Krinkle renamed this task from Page components / content widgets to RFC: Page components / content widgets. Jan 31 2018, 12:25 AM

Krinkle removed a parent task: T111588: RFC: API-driven web front-end. Jan 31 2018, 12:36 AM