
[RFC] Introduce notion of DOM scopes in wikitext
Closed, Declined (Public)

Description

Introduce the notion of DOM scopes in wikitext (https://www.mediawiki.org/wiki/User:SSastry_%28WMF%29/Notes/Wikitext#DOM_scopes). The idea is that when enforced, HTML produced by the wikitext construct will be balanced in isolation.
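As a rough illustration of what "balanced in isolation" could mean, here is a minimal sketch in Python. It is not how Parsoid or the PHP parser checks anything; the helper name `is_balanced` and the strict rule "every non-void start tag needs a matching end tag in the same fragment" are assumptions made for this example (real HTML also permits implied end tags for elements such as <p> and <li>, which this sketch rejects).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class _BalanceChecker(HTMLParser):
    """Tracks open elements on a stack while the fragment is parsed."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.balanced = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # An end tag must close the most recently opened element.
        if not self.stack or self.stack.pop() != tag:
            self.balanced = False


def is_balanced(fragment: str) -> bool:
    """Hypothetical helper: True if the fragment opens and closes
    all of its own elements, in order, without outside context."""
    checker = _BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.balanced and not checker.stack
```

Under this definition, output such as `<div><b>x</b></div>` passes, while a "table start" style fragment such as `<table><tr><td>x`, which relies on some later construct to close it, does not.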

The first and easiest application of this would be for top level sections (T114072) where we can experiment and prototype this idea (without explicitly calling it such). We would have to understand the implications for editing, and for the parsing implementations (ease of supporting this and performance impacts). This can inform the other contexts to which this can be extended. In the longer run, this can be applied to template output, extension output, tables, image captions -- as part of the gradual move towards evolving wikitext into a newer Wikitext 2.0 (T112996).

Doing this can help with:

  • editability: individual DOM scopes can be edited independently and in isolation. This can help VE as well as other wikitext editing tools.
  • performance: you can parse and process DOM scopes somewhat in isolation -- a step towards supporting incremental parsing.
  • ability to reason about the markup: you don't have to look at the rest of the page to make sense of what this piece of code does (I am deliberately exaggerating this to highlight that, when enforced, this property does not depend on the wikitext being free of markup errors).

The name and notion is up for discussion, but the idea is to come up with an understandable and enforceable concept that can be applied consistently.

Event Timeline

ssastry raised the priority of this task to Medium.
ssastry updated the task description.
ssastry subscribed.
ssastry set Security to None.

Whoops, I think this is at least a partial dup of T114445: [RFC] Balanced templates, which I think I was writing at the same time you were writing this.

Yes, there is overlap, but it is not a duplicate. This one is concerned with scoping more generally; the other one is concerned specifically with templates and also brings in all the related discussion (including the Q2 goal we have for prototyping an opt-in / opt-out solution), which is partly, but not entirely, about scoping.

Works for me. Let's let T114445 be all the "controversial" questions that require broader discussion; this task can be narrowly focused on implementation.

This RFC is about the general notion of DOM scopes that is showing up in different guises in the <section> tags proposal, the balanced templates proposal, and potentially others in the future. Those other RFCs are grappling with backward-compatibility concerns in specific problem areas (sections, templates), which are somewhat orthogonal to the semantics that will result from those proposals.

This RFC is about:

  • does it make sense to generalize the scoping semantics more broadly and apply in other areas?
  • identify the implications (pros / cons) of doing so.
  • identify the implementation challenges and propose concrete implementation strategies that are applicable in all those scenarios.

A fully reversible wikitext2json and json2wikitext would be nice. Each object of wikitext (section, paragraph, magic word, transclusion, wikilink, external link, ref tags, pre/nowiki/syntaxhighlight ...) could be represented as a key-value pair where the value is an object or an array of objects. The main goal of this functionality is to make tooling for wikitext easier.
(One use case could be: replacing a misspelling in sections and paragraphs but not in transclusions, ref tags, external link text, links, ...)
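To make that use case concrete, here is a minimal sketch in Python of such a replacement over a JSON-like tree. The node schema (`type`, `value`, `children`), the type names, and the helper `replace_in_text` are all hypothetical, invented for illustration; they are not the output format of any existing tool.

```python
from typing import Any

# Node types whose contents should be left untouched (assumed names).
SKIPPED_TYPES = {"transclusion", "ref", "extlink", "wikilink"}


def replace_in_text(node: dict[str, Any], old: str, new: str) -> dict[str, Any]:
    """Return a copy of the tree with `old` replaced by `new` in text
    nodes, except inside subtrees rooted at a skipped node type."""
    if node["type"] in SKIPPED_TYPES:
        return node  # do not descend into protected constructs
    if node["type"] == "text":
        return {**node, "value": node["value"].replace(old, new)}
    if "children" not in node:
        return node  # leaf node of some other kind
    children = [replace_in_text(child, old, new) for child in node["children"]]
    return {**node, "children": children}


# Example document: one paragraph with plain text and a transclusion.
doc = {
    "type": "section",
    "children": [
        {
            "type": "paragraph",
            "children": [
                {"type": "text", "value": "teh cat "},
                {
                    "type": "transclusion",
                    "children": [{"type": "text", "value": "teh source"}],
                },
            ],
        }
    ],
}

fixed = replace_in_text(doc, "teh", "the")
```

Here the paragraph text is corrected while the text inside the transclusion is left alone, which is the kind of selective edit that a typed, tree-shaped representation makes easy.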

@Boshomi: We already have the functionality you are looking for. Parsoid's HTML can be converted back to wikitext. It can also be represented as JSON without too much trouble. See https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec for background.

Boshomi: See https://en.wikipedia.org/api/rest_v1/?doc, or more generally https://{domain}/api/rest_v1/?doc

The meeting minutes suggest the action item was for @ssastry to reword this proposal so as to make it more concrete. @ssastry have you followed up on this? Any interest in pursuing it further?

There are multiple related proposals that have evolved slightly differently -- I want to consolidate them into a single proposal.

  • If we want to restrict structure to just templates, there is the balanced templates proposal (T114445)
  • If we want to make structure more generic in wikitext, DOM scopes is one way to do it.
  • If we want to formalize the notion of structure (DOM scopes or balanced templates) into the notion of a type (which has additional benefits), we have typed wikitext.

The broader context for this is to support document composition from fragments. There are many different ways of achieving document composition, but typed wikitext is my proposed pathway to that goal.

So, I think DOM scopes in and of itself is an early notion / proposal. https://www.mediawiki.org/wiki/Parsing/Notes/Wikitext_2.0 is the typed wikitext proposal that I discussed at the 2017 devsummit. I wish to pursue the more full-fledged proposal of a typing layer on top of wikitext which implements DOM scoping semantics on various constructs. If necessary, we could close this one and open a new one for it, or repurpose this one for it, as appropriate. But I can flesh out more details in the typed wikitext 2.0 proposal, as required.

I agree that the overall goal is to get to Wikitext 2.0, I just wasn't sure if you plan on tackling this particular issue as a milestone on that path. Declining this RfC for now, then, and waiting on the Wikitext 2.0 one :)