
Epic: allow Scribunto Lua modules to accept input as Parsoid DOM instead of classic parser internal structures
Open, Needs Triage, Public

Description

Currently, Scribunto Lua modules receive their wikitext input as a string that has been partially preprocessed by the classic MediaWiki parser; for instance, it may include strip markers (T133477).

Eventually it would be nice for the classic parser to be removed entirely or retooled to produce an intermediate DOM based on Parsoid's model, but if Lua scripts are tied to the old model, we'll need a compat layer that reflattens the DOM into a string and emulates the strip marker behavior.

At that point we could perhaps let new modules opt in to receiving their input, and producing their output, through the DOM model.

This will allow for a cleaner interface that's not as fragile with respect to extension strip markers (they'll be relatively sanely marked DOM nodes instead of weird-looking substrings) and should be more flexible in terms of allowing scripts to be used on wikis that aren't as tied to the wikitext editing model.
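To make the "marked DOM nodes" idea concrete, here is a minimal Python sketch of how a consumer might find extension output in a DOM-style serialization. It assumes, without confirmation from this task, that extension output carries a `typeof="mw:Extension/<name>"` attribute and a JSON `data-mw` attribute whose `body.extsrc` holds the original source; `ExtensionNodeFinder`, `find_extension_nodes` and `SAMPLE` are hypothetical names used only for illustration, and none of this is an existing Scribunto API.

```python
import json
from html.parser import HTMLParser


class ExtensionNodeFinder(HTMLParser):
    """Collect (extension name, original source) pairs from marked nodes."""

    PREFIX = "mw:Extension/"  # assumed marker convention, see lead-in

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # typeof may hold several space-separated tokens
        for token in (attr.get("typeof") or "").split():
            if token.startswith(self.PREFIX):
                data_mw = json.loads(attr.get("data-mw") or "{}")
                # extsrc is the assumed location of the original source
                source = data_mw.get("body", {}).get("extsrc")
                self.found.append((token[len(self.PREFIX):], source))


def find_extension_nodes(html):
    """Return a list of (name, source) for every extension-marked element."""
    finder = ExtensionNodeFinder()
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical fragment: a <ref> rendered as a marked node.
SAMPLE = (
    '<p>Text<sup typeof="mw:Extension/ref" '
    "data-mw='{\"name\":\"ref\",\"body\":{\"extsrc\":\"Foo\"}}'>[1]</sup></p>"
)

print(find_extension_nodes(SAMPLE))
```

The point of the sketch is only that the extension name and source are read from structured attributes rather than recovered by pattern-matching an opaque substring.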

Most likely this would extend or supplement the frame object that provides the current wikitext-centric parser API: https://www.mediawiki.org/wiki/Extension:Scribunto/Lua_reference_manual#Frame_object

There are probably many difficulties to think about, so this is a long-term epic task.

Event Timeline

brion created this task. Apr 25 2016, 2:26 PM
Restricted Application added a subscriber: Aklapper. Apr 25 2016, 2:26 PM
Anomie added a subscriber: Anomie. Apr 25 2016, 2:52 PM

Eventually it would be nice for the classic parser to be removed entirely or retooled to produce an intermediate DOM based on parsoid's model

That seems like an epic on its own.

brion added a comment. Apr 25 2016, 3:44 PM

Yes, I'm on an epic task filing kick. ;)

cscott added a subscriber: cscott. Edited Apr 25 2016, 4:19 PM

A strawman proposal for this can be found at T114454: [RFC] Visual Templates: Authoring templates with Visual Editor.

I believe it would be best to implement a more rigorous separation of code, data, and styling at the same time.

In so doing, you'd avoid exposing many of the details of the parser's representation to modules, which should help with the "generating intermediate DOM from the classic parser" part of the task.

I am not familiar with the new parser at all, so feel free to ignore this if it is irrelevant...:

In any case, it MUST be distinguishable what kind of content has been replaced (currently it IS recognizable by the tag name within the strip marker).
In other words, it MUST be possible to find out whether the replaced content is <ref>, <nowiki>, <score>, or whatever.
Even better would be if both the original source of the tag and its result were available as well. (Artificial example to explain: <ref>Foo</ref> vs. <ref>{{PassValue|Foo}}</ref>. Both behave the same, since Template:PassValue just returns whatever is given as an argument, but as seen, the original code within <ref></ref> is different.)
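For illustration, the tag-name recognition mentioned above can be sketched in Python as follows. The marker shape used here (a DEL character, then `'"`UNIQ--<tag>-<8 hex digits>-QINU`"'`, then DEL) is an assumption about the classic parser's current format, not something stated in this task, and `strip_marker_tags` and `make_marker` are hypothetical helpers.

```python
import re

# Assumed marker shape: \x7f ' " ` UNIQ-- <tag> - <8 hex digits> -QINU ` " ' \x7f
STRIP_MARKER = re.compile(
    "\x7f'\"`UNIQ--([^\x7f]+?)-([0-9A-Fa-f]{8})-QINU`\"'\x7f"
)


def make_marker(tag, index):
    """Build a marker in the assumed shape (for demonstration only)."""
    return "\x7f'\"`UNIQ--%s-%08X-QINU`\"'\x7f" % (tag, index)


def strip_marker_tags(text):
    """Return the tag name of every strip marker found in text, in order."""
    return [match.group(1) for match in STRIP_MARKER.finditer(text)]


text = "a" + make_marker("ref", 0) + " b " + make_marker("nowiki", 1)
print(strip_marker_tags(text))
```

Note that this recovers only the kind of tag; neither the original source nor the rendered result is available from the marker itself.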

and its result

This is unlikely as a general thing due to issues like T63268 and T73167.

Danny_B removed a subscriber: Parsing-Team.