**Type of activity:** Pre-scheduled session
**Main topic:** Handling wiki content beyond plaintext
== The problem ==
Wikitext's processing model is based on generating HTML string snippets for individual pieces of wikitext markup and concatenating them to yield an HTML string for the entire document. This string is then cleaned up to yield well-formed HTML markup (typically, Tidy has been used for this cleanup, but any HTML5 parser could also be used to parse the string to a DOM and serialize that back to well-formed HTML markup). So, in this string-concatenation model, in the general case, you cannot know how a piece of wikitext markup is going to render without processing the entire document.
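To make the model concrete, here is a minimal sketch in Python of a toy dialect that only supports `''italics''` and `{{template}}` transclusion. The names (`TEMPLATES`, `expand_templates`, `render`) are invented for illustration; this is a sketch of the idea, not the actual parser:

```python
import re

# Hypothetical template store; names invented for illustration only.
TEMPLATES = {"note": "see the ''full'' article"}

def expand_templates(wikitext):
    # Step 1: template bodies are spliced into the page as raw strings,
    # so their markup merges with the markup around the call site.
    return re.sub(r"\{\{(.*?)\}\}",
                  lambda m: TEMPLATES.get(m.group(1), ""),
                  wikitext)

def render(wikitext):
    expanded = expand_templates(wikitext)
    # Step 2: each '' marker emits an HTML snippet (<i> or </i>) whose
    # meaning depends on every marker seen earlier in the document.
    snippets, in_italics = [], False
    for i, chunk in enumerate(expanded.split("''")):
        if i > 0:
            in_italics = not in_italics
            snippets.append("<i>" if in_italics else "</i>")
        snippets.append(chunk)
    html = "".join(snippets)      # Step 3: concatenate the snippets
    if in_italics:                # Step 4: Tidy-style cleanup of whatever
        html += "</i>"            # imbalance the concatenation left behind
    return html

print(render("intro {{note}} and ''more''"))
# -> intro see the <i>full</i> article and <i>more</i>
# If the template contained an odd number of '' markers, everything after
# the call site would flip between italic and plain text.
```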
This string-concatenation-based model has a number of issues:
1. It is a poor fit for tools that operate at a structural level on the document and need to precisely map those structures back to the wikitext that generated them. VisualEditor (VE) is the best-known example of such a tool. VE operates on the DOM, and edits to the DOM need to be converted back to wikitext without introducing spurious diffs elsewhere in the document. To enable this, Parsoid does a lot of analysis and hard work to map each DOM node back to the wikitext string that generated it, and it relies on a number of hacks (some of them ugly) to provide this support.
2. When a template is edited, it triggers a reparse of all pages that use that template (see the sketch after this list). This is a fairly expensive operation, and it is unavoidable because it is hard to know how the change is going to affect the rendering of any given page.
3. Given a piece of markup, it is not always possible to know how that markup is going to render. For example, in the markup `'''foo {{some-template}} bar'''`, you cannot know whether `bar` is going to be bolded without looking at the source of `some-template` (a stray `'''` in the template's output would close the bold early).
4. Given a page, it is not possible for multiple editors to edit the same document concurrently at a granularity finer than a section and also guarantee that their edits will render the way they intended.
5. When an editor makes a minor edit (say, fixing a typo) on a page, the parser still reparses the entire page because, in the general case, it cannot guarantee that the minor edit will not affect rendering elsewhere on the page (again, see the sketch after this list).
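Issues 2 and 5 are, at heart, the same problem: without structural boundaries around template output, the only safe response to an edit is a full reparse. A hedged sketch of that bookkeeping (class and method names are invented for illustration):

```python
class NaiveRerenderQueue:
    def __init__(self, transclusions):
        # transclusions: template name -> set of pages that use it
        self.transclusions = transclusions
        self.to_reparse = []

    def on_template_edit(self, template):
        # We cannot tell which parts of which pages the change touches,
        # so every transcluding page is scheduled for a full reparse.
        self.to_reparse.extend(sorted(self.transclusions.get(template, set())))

    def on_page_edit(self, page):
        # Even a one-character typo fix forces a full reparse, since the
        # edit could in principle change rendering anywhere on the page.
        self.to_reparse.append(page)

queue = NaiveRerenderQueue({"Infobox": {"Oxygen", "Hydrogen", "Helium"}})
queue.on_template_edit("Infobox")
queue.on_page_edit("Oxygen")          # minor typo fix, still a full reparse
print(queue.to_reparse)               # ['Helium', 'Hydrogen', 'Oxygen', 'Oxygen']
```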
However, with some mostly small changes to the semantics of wikitext that move it away from a string-concatenation-based processing model to a DOM-composition-based processing model, all of the above limitations can be addressed. Such a model would improve our ability to reason about wikitext, improve the editing experience in VisualEditor, improve performance of template edits (and the resulting re-renders) as well as regular edits, and potentially enable sub-page editing at granularities much finer than a section (which also has the benefit of reducing edit conflicts).
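As a rough, hypothetical sketch of what DOM composition buys us (names and data structures are invented for illustration; the strawman spec linked below is the actual proposal): templates expand to self-contained DOM fragments that are composed into the page tree, so formatting cannot leak across fragment boundaries and a template edit only requires re-expanding its fragments, not reparsing the surrounding page.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    tag: str
    children: list = field(default_factory=list)   # Elements or strings

    def to_html(self):
        inner = "".join(c.to_html() if isinstance(c, Element) else c
                        for c in self.children)
        return f"<{self.tag}>{inner}</{self.tag}>"

@dataclass
class Transclusion:
    name: str      # which template; its output is an independent fragment

def compose(page_tree, expand_template):
    # Composition: each transclusion node is replaced by the template's
    # own, already well-formed fragment. Formatting cannot leak across
    # the fragment boundary, and editing a template only means
    # re-expanding these fragments -- not reparsing the whole page.
    def walk(node):
        if isinstance(node, Transclusion):
            return expand_template(node.name)
        if isinstance(node, Element):
            node.children = [walk(c) for c in node.children]
        return node
    return walk(page_tree)

page = Element("b", ["foo ", Transclusion("some-template"), " bar"])
fragment = Element("span", ["x ", Element("i", ["y"])])   # template's own DOM
print(compose(page, lambda name: fragment).to_html())
# -> <b>foo <span>x <i>y</i></span> bar</b>
# "bar" stays bold no matter what markup the template contains, in
# contrast to the string-concatenation behaviour sketched earlier.
```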
In this session, I am going to talk about some of these changes to wikitext semantics, and how we might get there from where we are.
== Expected outcome ==
Increased awareness of the benefits of moving to DOM-based semantics for wikitext, as well as concrete feedback about a specific strawman proposal for getting there.
== Current status of the discussion ==
So far, there have been a few scattered in-person discussions with other members of the parsing team. The proposal exists in a somewhat rough draft form on mediawiki.org. But, beyond that, it hasn't had much discussion or exposure.
== Links ==
* https://www.mediawiki.org/wiki/Parsing/Notes/Wikitext_2.0 is the broad outline for such a proposal
* https://www.mediawiki.org/wiki/Parsing/Notes/Wikitext_2.0/Strawman_Spec is a very rough draft of a strawman implementation (caveat: the details haven't been worked out, prototyped, or thought through very deeply) and needs further exploration and elaboration.