Make table of contents available in data format
Closed, Resolved | Public | 0 Estimated Story Points

Description

As a skin developer I would like to render the table of contents outside the parser content as part of the skin. This will support several use cases:

  • Desktop Improvement Project: The ToC concept would require the ToC to be moved out of #mw-content-text.
  • Minerva/MobileFrontEnd: The ToC is only accessible at the top of the article, which makes navigating long articles a pain point, as there is no 'to the top' button and collapsing each section takes multiple interactions.
  • Third-party skins: Skin authors want to be able to move the ToC, but the current solution is very hacky. Citizen uses a lot of CSS hacks, while Tweeki clones the whole DOM of #toc. Both solutions impose a hefty and avoidable penalty on the rendering path.

To do this I would add a skin option, toc, inside skin.json, which can take one of two values: "html" or "data".

"ValidSkinNames": {
    "myskin": {
        "class": "SkinMustache",
        "args": [
            {
                "name": "myskin",
                "toc": "data"
            }
        ]
    }
}

When set to "data", the skin is instructed to disable output of the table of contents in the parser output, either via a public method on OutputPage or in the skin's getHTML method:

https://github.com/wikimedia/mediawiki/blob/5357695270161bce1f6dbaccc96645791f17a013/includes/skins/SkinMustache.php#L170
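The opt-out described above could look roughly like the following. This is a minimal sketch in Python, purely for illustration (the real code would be PHP in core). The placeholder string and both function names are hypothetical, not MediaWiki APIs; only the "html" and "data" option values come from the proposal, and treating "html" as the default is an assumption.

```python
# Hypothetical marker left by the parser where the ToC would be injected.
TOC_PLACEHOLDER = "<!--TOC-PLACEHOLDER-->"


def should_inject_toc(skin_options: dict) -> bool:
    """Return True unless the skin asked to receive the ToC as data only."""
    # Assumption: "html" is the default, matching the existing behaviour.
    return skin_options.get("toc", "html") != "data"


def finalize_html(parser_html: str, toc_html: str, skin_options: dict) -> str:
    """Swap the placeholder for the ToC, or drop it for "data" skins."""
    replacement = toc_html if should_inject_toc(skin_options) else ""
    return parser_html.replace(TOC_PLACEHOLDER, replacement)
```

The point of keeping the decision in one small predicate is that the parser output itself stays skin-agnostic; only the final substitution depends on the skin option.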

A new template key would be added for all skins. This data would be a structure that is Mustache-compatible and supports recursive rendering to support nesting.

'data-toc' => $this->getOutput()->getParserOutput()->getTableOfContentData()

Template:

{{#data-toc}}
{{#items}}{{>Item}}{{/items}}
{{/data-toc}}

Item:

<li><h{{level}}>{{headline}}</h{{level}}>
  <ul>{{#items}}{{>Item}}{{/items}}</ul>
</li>
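To make the recursion concrete, here is a rough Python equivalent of the Item partial above. The field names (level, headline, items) are taken from the template; the sample data and the function names are hypothetical, and the final shape of the data was still undecided at this point.

```python
from html import escape


def render_item(item: dict) -> str:
    """Render one ToC entry and, recursively, its child entries."""
    level = int(item["level"])
    children = "".join(render_item(child) for child in item.get("items", []))
    return (
        f"<li><h{level}>{escape(item['headline'])}</h{level}>"
        f"<ul>{children}</ul></li>"
    )


def render_toc(data_toc: dict) -> str:
    """Render the top-level entries, mirroring the {{#data-toc}} block."""
    return "".join(render_item(item) for item in data_toc.get("items", []))


# Hypothetical data in the shape the template expects.
data_toc = {
    "items": [
        {"level": 2, "headline": "History", "items": [
            {"level": 3, "headline": "Early years", "items": []},
        ]},
        {"level": 2, "headline": "See also", "items": []},
    ]
}

print(render_toc(data_toc))
```

One design note for the Mustache version: Mustache resolves names by walking up the context stack, so a leaf entry should carry an explicit empty items list. Without it, the lookup of items inside a leaf would find the parent's list and the partial would recurse indefinitely.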

Related

T114057: Refactor table of contents

Acceptance criteria

  • Skins can opt out of including a table of contents in the HTML output by adding "toc": false to the skin declaration inside skin.json: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/735069
  • Skins are provided with raw data allowing them to render a table of contents as they desire

Event Timeline

There are a lot of table of contents tasks; maybe T114057: Refactor table of contents is the oldest one of interest in this regard. It should probably be noted in the others, or they should be linked up somehow.

Separately, not sure if this is the right task, but working on table of contents stuff should take into account templates like TOC limit (which relies on TemplateStyles for the hiding) and the other variety of TOC templates.

For now, this is a practical first step to allow 3rd party skins, e.g. Citizen, to experiment with rendering their own table of contents. I think if we were doing this for a skin deployed on Wikimedia wikis, e.g. Vector, we'd be a bit more thorough and we'd focus on reviewing and getting the behaviour right.

This sounds roughly consistent with the Content Transformation/Parsing team's architecture here. ToC is not included in Parsoid output, it is expected that it is constructed as a postprocessing pass over the <h1> tags, etc.

Two main issues to discuss here:

  1. How is compatibility with the legacy parser envisioned? Sure, we set the __NOTOC__ flag and suppress this output from the legacy parser, but presumably we'll want callbacks or something from the legacy parser to record the ToC information, and Parsoid should emit either the same or similar callbacks.
  2. Storage. In Parsoid's world-view, the ToC is "derived content" aka an "html2html" pass that you can generate from the (cached) Parsoid core HTML. Historically we've stored this in RestBase. Once upon a time, there was a "derived content" extension to MCR that was to store similar information. In theory, we have some ParserCache refactoring on the radar. Bottom line: where is the extracted ToC information to be stored, if not inline in the HTML?
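The "postprocessing pass over the heading tags" mentioned above can be sketched as follows, using only the Python standard library. This is an illustration of the html2html idea, not Parsoid's or core's implementation; the class and function names are hypothetical, and the output shape (level, headline, items) is borrowed from the template in the task description.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every heading, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def build_toc(html: str) -> dict:
    """Derive nested ToC data from rendered HTML, nesting by heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    root = {"level": 0, "items": []}
    stack = [root]  # stack[-1] is the current candidate parent
    for level, headline in collector.headings:
        # Pop back to the nearest ancestor with a strictly smaller level.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"level": level, "headline": headline, "items": []}
        stack[-1]["items"].append(node)
        stack.append(node)
    return {"items": root["items"]}
```

Because the result is computed purely from the cached HTML, it fits the "derived content" framing in point 2: it could be regenerated on demand or stored alongside the HTML, which is exactly the open storage question.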

Change 721115 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/core@master] WIP: Gives skins more flexibility over table of contents render

https://gerrit.wikimedia.org/r/721115

It seems the table of contents is already stored separately in the legacy parser, so I think the above patch could do this with a bit more work. Does that look like a good approach?

Jdlrobson moved this task from Incoming to Table of Contents on the Desktop Improvements board.

Change 733032 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Use a <meta> tag for the TOC_PLACEHOLDER

https://gerrit.wikimedia.org/r/733032

Change 735069 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] WIP: Clean up loose ends from the skin table of contents patch

https://gerrit.wikimedia.org/r/735069

Change 721115 merged by jenkins-bot:

[mediawiki/core@master] Give skins more flexibility over table of contents render

https://gerrit.wikimedia.org/r/721115

Change 735087 had a related patch set uploaded (by Jdlrobson; author: Jdlrobson):

[mediawiki/core@master] Consult skin around decision to include table of contents

https://gerrit.wikimedia.org/r/735087

Chatted with C Scott about this patchset, and we think Article::getContext()->getSkin() should always Do The Right Thing so hopefully https://gerrit.wikimedia.org/r/c/mediawiki/core/+/735069 is all that's needed to wrap this task up to unblock web team.

Would like to confirm that with @Krinkle, and if that's wrong then we want to understand that problem, and document and spread that knowledge around more.

Change 735087 abandoned by Jdlrobson:

[mediawiki/core@master] Consult skin around decision to include table of contents

Reason:

See https://gerrit.wikimedia.org/r/c/mediawiki/core/+/735069

https://gerrit.wikimedia.org/r/735087

Chatted with C Scott about this patchset, and we think Article::getContext()->getSkin() should always Do The Right Thing […]
Would like to confirm that with @Krinkle […]

Use of a local and dependency-injected context object is generally safe, so this looks good to me.

The problem during the previous patch's iterations where this issue came up, was that it reached out to the global context object.

Use of global state in that way is well-known to be problematic in software development. It is, however, especially irresponsible when applied in code that has already transitioned to dependency injection, because it is impossible for yourself or indeed anyone else to reason about that. For one, it'd be a tall order to audit all direct and indirect callers to determine whether the two can be different (which is expected in software that involves service wiring and DI).

But perhaps more importantly, it leaves a landmine for future contributors (including our future selves), as it's infeasible to remember or discover during development that passing an object to a stable method is going to go "wrong" due to some deep internal code bypassing that DI object. It puts into question any function call involving an object of that type, or indeed anything from which that type can be derived. In deployed software, this would manifest itself subtly at first in the form of cache pollution, visual corruption, or security problems. This is unlikely to be noticed during local development, CI, or beta testing, unless you happen to test for a scenario where the two are known to be different (in which case you'd already know that the change can't be done, and thus wouldn't have done it in the first place). The guard for this is code review understanding and recognising it as bypassing DI.
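A small Python sketch of the failure mode described above. Everything here is hypothetical (the Context class, the global, and both functions); it only illustrates how a function that accepts an injected context but consults global state gives a wrong answer as soon as the two differ.

```python
from dataclasses import dataclass


@dataclass
class Context:
    skin_name: str


# Hypothetical process-wide context, standing in for global state.
GLOBAL_CONTEXT = Context(skin_name="vector")


def toc_mode_bypassing_di(context: Context) -> str:
    """Accepts a context but silently ignores it: the landmine."""
    return "data" if GLOBAL_CONTEXT.skin_name == "myskin" else "html"


def toc_mode_with_di(context: Context) -> str:
    """Uses only the context it was handed."""
    return "data" if context.skin_name == "myskin" else "html"


# A caller whose context differs from the global one.
local = Context(skin_name="myskin")
print(toc_mode_bypassing_di(local))  # html (wrong for this caller)
print(toc_mode_with_di(local))       # data
```

When the local and global contexts happen to match, both functions agree, which is why this kind of bug tends to survive local development and CI.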

Change 735069 merged by jenkins-bot:

[mediawiki/core@master] Set ParserOutput 'injectTOC' based on Skin options for page views

https://gerrit.wikimedia.org/r/735069

The problem during the previous patch's iterations where this issue came up, was that it reached out to the global context object.

Thanks for clarifying that, @Krinkle. I agree with you that the global state usage is problematic.

I can QA and sign this off. Thanks @Krinkle and @cscott for pushing this through.

LGTM. I was able to replicate the prototype using the new core capabilities (see https://gerrit.wikimedia.org/r/c/mediawiki/skins/Vector/+/737504)

Change 748890 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] parser: Prepare to use a <meta> tag for the internal TOC_PLACEHOLDER

https://gerrit.wikimedia.org/r/748890

Note: while trying to use the existing data structure, we realized it was not fit for purpose. See T299065.