
action=parse should have an option to output DOM tree
Closed, Resolved, Public

Description

action=parse should have an option to output the parse tree (also known as the DOM tree: the intermediate representation produced while converting wikitext to HTML), so that third-party applications that convert wikitext to other formats (such as PDF) can be made more reliable. Currently they re-implement the MediaWiki parser themselves, which is neither very reliable nor efficient.


Version: 1.14.x
Severity: enhancement
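For context, current MediaWiki API documentation lists a `prop=parsetree` value for action=parse. The Python sketch below builds such a request. It assumes a MediaWiki version that supports `prop=parsetree` and `formatversion=2` (where the tree comes back as a plain string); the English Wikipedia endpoint is only an example, and the User-Agent string is hypothetical. What is returned is the preprocessor's XML tree, not a full parse of the page.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Example endpoint only; any MediaWiki api.php should accept the same parameters.
API = "https://en.wikipedia.org/w/api.php"
# Hypothetical User-Agent; Wikimedia asks API clients to identify themselves.
USER_AGENT = "parsetree-example/0.1 (example@example.org)"


def parsetree_url(wikitext: str, api: str = API) -> str:
    """Build an action=parse request that asks only for the XML parse tree."""
    params = {
        "action": "parse",
        "text": wikitext,
        "contentmodel": "wikitext",  # the parse tree is only available for wikitext
        "prop": "parsetree",
        "format": "json",
        "formatversion": "2",
    }
    return f"{api}?{urlencode(params)}"


def fetch_parsetree(wikitext: str, api: str = API) -> str:
    """Perform the request (needs network access) and return the XML string."""
    req = Request(parsetree_url(wikitext, api), headers={"User-Agent": USER_AGENT})
    with urlopen(req) as resp:
        data = json.load(resp)
    return data["parse"]["parsetree"]
```

Calling `fetch_parsetree("{{Foo|bar}}")` against a live wiki should yield an XML string rooted at `<root>`; keeping URL construction separate from the network call makes the request easy to inspect or switch to POST for long inputs.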

Details

Reference
bz15567

Event Timeline

bzimport raised the priority of this task from to Lowest. Nov 21 2014, 10:18 PM
bzimport set Reference to bz15567.

The parser does not generate a DOM tree.

(In reply to comment #1)

> The parser does not generate a DOM tree.

The preprocessor does, though, doesn't it?

What the preprocessor produces is hardly usable for converting the markup to other formats, is it?

(In reply to comment #3)

> What the preprocessor produces is hardly usable for converting the markup to other formats, is it?

It's gotta be better than letting converters parse the wikitext themselves.

Not really... The preprocessor only handles {{...}} and {{{...}}} constructs, and it doesn't distinguish between templates, magic variables, and parser functions. The actual DOM tree is a long set of nodes that make very little sense to anything but the parser. On top of that, if the HASH preprocessor is enabled instead, there is no DOM to output at all.
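To illustrate the point above, here is a toy brace matcher in Python. It is a hypothetical sketch, not MediaWiki's preprocessor: it only recognizes {{...}} (node kind "template") and {{{...}}} (node kind "tplarg"), assumes balanced braces, and does not reproduce the real rules for ambiguous brace runs. Its XML is loosely modeled on the preprocessor's `<template>`/`<tplarg>` elements; the real tree additionally splits each call into `<title>` and `<part>` children.

```python
from dataclasses import dataclass, field
from typing import List, Union
from xml.sax.saxutils import escape


@dataclass
class Node:
    # "root", "template" for {{...}}, or "tplarg" for {{{...}}}
    kind: str
    children: List[Union["Node", str]] = field(default_factory=list)


def preprocess(text: str) -> Node:
    """Toy brace matcher (hypothetical, not MediaWiki's algorithm).

    Assumes balanced braces; ambiguous runs such as '{{{{{' are not
    resolved the way the real preprocessor resolves them.
    """
    root = Node("root")
    stack = [root]  # innermost open node is stack[-1]
    buf: List[str] = []  # pending literal text

    def flush() -> None:
        if buf:
            stack[-1].children.append("".join(buf))
            buf.clear()

    def open_node(kind: str) -> None:
        flush()
        node = Node(kind)
        stack[-1].children.append(node)
        stack.append(node)

    def close_node() -> None:
        flush()
        stack.pop()

    i = 0
    while i < len(text):
        if text.startswith("{{{", i):
            open_node("tplarg")
            i += 3
        elif text.startswith("{{", i):
            open_node("template")
            i += 2
        elif text.startswith("}}}", i) and stack[-1].kind == "tplarg":
            close_node()
            i += 3
        elif text.startswith("}}", i) and stack[-1].kind == "template":
            close_node()
            i += 2
        else:
            buf.append(text[i])
            i += 1
    flush()
    return root


def to_xml(node: Node) -> str:
    """Serialize the toy tree; literal text is XML-escaped."""
    inner = "".join(
        to_xml(c) if isinstance(c, Node) else escape(c) for c in node.children
    )
    return f"<{node.kind}>{inner}</{node.kind}>"
```

Running `preprocess` on `{{PAGENAME}} {{#if:x|y}} {{Foo|bar}}` yields three nodes of the same kind, "template": a magic variable, a parser function, and an ordinary template call are indistinguishable at this level, and telling them apart needs the parser's own magic-word and parser-function registrations.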

One of my side projects, XWT, was a WikiText-inspired parser that would parse a page, according to a set of rules, into a tree that could be sent as XML or JSON and then converted back into WikiText or rendered as HTML. But an approach like that is incompatible with normal WikiText, and it is impossible to implement with the current parser.
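The round-trip described above (tree, then JSON, then back to WikiText) can be sketched in Python for a deliberately tiny subset. The node format here is hypothetical and is not XWT's actual format: a string is literal text, and a dict with `"type": "template"` is a {{...}} call whose "|"-separated parts are each a list of nodes.

```python
import json
from typing import Any, List


def to_wikitext(nodes: List[Any]) -> str:
    """Convert the hypothetical tree format back into wikitext."""
    out = []
    for n in nodes:
        if isinstance(n, str):
            out.append(n)  # literal text is emitted unchanged
        elif n.get("type") == "template":
            parts = "|".join(to_wikitext(p) for p in n["parts"])
            out.append("{{" + parts + "}}")
        else:
            raise ValueError(f"unknown node: {n!r}")
    return "".join(out)


tree = [
    "Hello ",
    {"type": "template", "parts": [["Foo"], ["bar ", {"type": "template", "parts": [["Baz"]]}]]},
]
wire = json.dumps(tree)      # what a server would send as JSON
restored = json.loads(wire)  # what a client would receive
print(to_wikitext(restored))  # Hello {{Foo|bar {{Baz}}}}
```

Serializing and back-converting such a tree is the easy half; producing it from arbitrary WikiText is the part the comment says the current parser cannot do.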

Bryan.TongMinh wrote:

Later, when there is something like an intermediate DOM tree...