
Allow Parsoid to be run in the browser as a standard JavaScript library
Closed, Declined · Public

Description

Problem
There are cases, like in T191939, where it would be beneficial to parse Wikitext without making a request to the server.

Solution
There should be a way to use Parsoid as a stand-alone library (to convert wikitext to an HTML DOM and vice versa).
Any node.js-specific bindings should be abstracted.

This would replace mediawiki.jqueryMsg.
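
For illustration only, here is a minimal sketch of what such an abstraction might look like in the browser. The module name and both function names are hypothetical, not an existing Parsoid API:

// Hypothetical usage sketch -- none of these names exist in Parsoid today.
// The goal is a single bundle exposing wikitext <-> DOM conversion with no
// node.js-specific bindings.
import { wikitextToDOM, domToWikitext } from 'parsoid-browser';

const doc = await wikitextToDOM("'''Hello''', [[World]]!");
document.getElementById('preview').innerHTML = doc.body.innerHTML;

// Round-trip the (possibly edited) DOM back to wikitext.
const wikitext = await domToWikitext(doc);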

Event Timeline

ssastry subscribed.

This is unlikely to happen. If anything, as part of the Platform Evolution program, Parsoid will be integrated back into core, and we'll eventually drop the node.js version. Even if not, the Parsoid codebase is fairly complex and has a lot of dependencies that would be very heavyweight for running in the browser.

Not to mention the latency / throughput tradeoff of making a parse request vs. the first-time cost of having to download the entire parser.

Parsoid will be integrated back into core, and we'll eventually drop the node.js version.

I don't understand the rationale for that.

Not to mention the latency / throughput tradeoff of making a parse request vs. the first-time cost of having to download the entire parser.

I mean you only have to download the parser when needed. Downloading a cached, static JavaScript file is probably far more efficient than making multiple parse requests.
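
For example, the parser bundle could be fetched lazily with a dynamic import(), so the download cost is paid only on first use and later loads hit the browser cache. The bundle path, the render() export, and the DOM element IDs below are assumptions for the sketch:

// Load the parser bundle only when a preview is actually requested.
// Subsequent calls reuse the already-resolved module.
let parserPromise = null;

function getParser() {
  // Hypothetical bundle path, served as a cacheable static asset.
  parserPromise = parserPromise || import('/static/parsoid.bundle.js');
  return parserPromise;
}

document.getElementById('preview-button').addEventListener('click', async () => {
  const parsoid = await getParser();
  const html = parsoid.render(document.getElementById('wikitext').value); // hypothetical render()
  document.getElementById('preview').innerHTML = html;
});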

Parsoid will be integrated back into core, and we'll eventually drop the node.js version.

I don't understand the rationale for that.

The rationale will eventually show up at https://www.mediawiki.org/wiki/Parsing/Notes/Moving_Parsoid_Into_Core ... but, until then, https://www.mediawiki.org/wiki/Parsing#Long-term_directions_as_of_November_2016 is the first place this goal showed up, and evaluating this option is part of the annual plan for 2018-2019.

https://www.mediawiki.org/wiki/Parsing/Notes/Two_Systems_Problem is an older discussion of the two-parser issue.

Not to mention the latency / throughput tradeoff of making a parse request vs. the first-time cost of having to download the entire parser.

I mean you only have to download the parser when needed. Downloading a cached, static JavaScript file is probably far more efficient than making multiple parse requests.

Look at the size here.

[subbu@earth:~/work/wmf/deploy] du -s node_modules/
190908 node_modules/
[subbu@earth:~/work/wmf/deploy] du -s src
7896 src

I know of the arguments, I just don't understand them. We already rely on multiple systems, and this is a move away from (micro-)services rather than towards them, which seems to be the direction the industry as a whole is moving in.

Look at the size here.

[subbu@earth:~/work/wmf/deploy] du -s node_modules/
190908 node_modules/
[subbu@earth:~/work/wmf/deploy] du -s src
7896 src

This is irrelevant: with minification (UglifyJS) and tree shaking, the bundle will be much, much smaller than the original size.
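
As a sketch, a webpack 4 configuration from that era enables both with its production mode; the entry and output paths here are placeholders:

// webpack.config.js -- illustrative only.
module.exports = {
  mode: 'production',        // enables minification (UglifyJS) by default
  entry: './src/index.js',   // ES module imports make tree shaking possible
  output: {
    filename: 'parsoid.bundle.js',
    path: __dirname + '/dist',
  },
  optimization: {
    usedExports: true,       // flag unused exports so the minifier can drop them
  },
};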

I know of the arguments, I just don't understand them. We already rely on multiple systems, and this is a move away from (micro-)services rather than towards them, which seems to be the direction the industry as a whole is moving in.

Okay, that is not a discussion we can have here since I don't want to get into another generic monolith vs. microservices debate; that is a dead end every time we have it. But when I fill out that wiki page for Parsoid itself, hopefully it will be a bit clearer why we are moving towards closer integration of Parsoid into core.

Look at the size here.

[subbu@earth:~/work/wmf/deploy] du -s node_modules/
190908 node_modules/
[subbu@earth:~/work/wmf/deploy] du -s src
7896 src

This is irrelevant: with minification (UglifyJS) and tree shaking, the bundle will be much, much smaller than the original size.

It will still be a non-trivial size. In any case, fwiw, https://www.npmjs.com/package/parsoid is the npm package.

It will still be a non-trivial size.

I don't see how you could possibly know that until you try it. :)

It will still be a non-trivial size.

I don't see how you could possibly know that until you try it. :)

Please have a go at it if you wish. It is not a path we will be going down at this point because it is a dead end for us given the integration work we are going to be embarking on shortly.

Things are more insidious than just the file size limitations. To address the problem statement directly, any non-trivial parse requires fetching state from the server. So, concretely, template resolution wouldn't be possible with that limitation.

To address the problem statement directly, any non-trivial parse requires fetching state from the server. So, concretely, template resolution wouldn't be possible with that limitation.

I mean, templates are public and effectively static, and can be cached for a very long time. Because of that high cacheability, a template can be served from the CDN rather than from the origin, and it can also be cached in the user's browser. They can also be downloaded on demand when needed.

I think maybe making multiple cached requests is way better than definitely making multiple uncached requests to the origin.
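
As a sketch of that idea, raw template wikitext is available from the standard MediaWiki action=raw endpoint and is cacheable like any static asset; this assumes the code runs on the same wiki's origin, and the template name is just an example:

// Fetch the raw wikitext of a template. The response can be cached by the
// CDN and the browser; a same-origin request is assumed here, since
// action=raw is not set up for cross-origin use.
async function fetchTemplate(title) {
  const url = '/w/index.php?title=' + encodeURIComponent(title) + '&action=raw';
  const resp = await fetch(url);
  if (!resp.ok) throw new Error('Failed to fetch ' + title);
  return resp.text();
}

const wikitext = await fetchTemplate('Template:Citation_needed');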

Parsoid currently relies on the PHP parser for template expansion. Some of the most popular templates are implemented as invocations of Lua modules.

Legoktm subscribed.
This comment was removed by Legoktm.

My last comment wasn't professional of me, so I've removed it. My apologies for making it in the first place.

Look at the size here.

[subbu@earth:~/work/wmf/deploy] du -s node_modules/
190908 node_modules/
[subbu@earth:~/work/wmf/deploy] du -s src
7896 src

To give another example, from the InteractionTimeline:

$ du -s node_modules/
2553200	node_modules/
$ du -s src
672	src

and then when it's bundled (with the default webpack configuration):

$ du -s main.js
1108	main.js

https://tools.wmflabs.org/interaction-timeline/scripts/main.js

However, the transfer, even with an empty cache, is a third of that size:

[Screenshot of the network transfer: Screen Shot 2018-05-02 at 12.50.37 PM.png, 55 KB]

The script is non-blocking and takes less than a third of a second to download.

I understand the concern that the size might be too large, but I don't think there's a good way to know what the actual impact will be until it's tried. Also, as I noted before, this is without doing any sort of performance (or size) optimizations (other than loading the plugins with the default configuration).

Parsoid currently relies on the PHP parser for template expansion. Some of the most popular templates are implemented as invocations of Lua modules.

I don't see anything special about template expansion that couldn't be moved into Parsoid, and Lua modules could be executed in the browser with libraries like Moonshine.
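
As a very rough sketch of that idea, a {{#invoke:...}} might be resolved client-side like this. runLua() is a hypothetical stand-in for a Lua-in-JS VM such as Moonshine (whose real interface runs precompiled bytecode), so this exact entry point is an assumption, not an existing API:

// Hypothetical resolution of {{#invoke:ModuleName|funcName|...}} in the browser.
async function invokeModule(moduleName, funcName, args) {
  // Fetch the Lua module source like any other cacheable page (same-origin assumed).
  const resp = await fetch(
    '/w/index.php?title=' + encodeURIComponent('Module:' + moduleName) + '&action=raw'
  );
  const luaSource = await resp.text();
  // Hand the source to a sandboxed Lua VM and return the expanded wikitext.
  return runLua(luaSource, funcName, args);
}

// Stub: a real implementation would execute the source with a Lua VM
// running in JavaScript (e.g. Moonshine) and call funcName with args.
function runLua(luaSource, funcName, args) {
  throw new Error('Lua VM integration is out of scope for this sketch');
}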

I don't see anything special about template expansion that couldn't be moved into Parsoid

There isn't, and there's probably an implementation of the preprocessor somewhere in the git history to make use of.