
Allow Parsoid to be run in the browser as a standard JavaScript library
Closed, Declined · Public

Description

Problem
There are cases, like in T191939, where it would be beneficial to parse Wikitext without making a request to the server.

Solution
There should be a way to use Parsoid as a stand-alone library (to convert wikitext to an HTML DOM and vice versa).
Any node.js-specific bindings should be abstracted.

This would replace mediawiki.jqueryMsg.
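
For illustration only, here is a minimal sketch of what such an abstraction might look like in the browser. The module name and both function names are hypothetical, not an existing Parsoid API:

// Hypothetical usage sketch -- none of these names exist in Parsoid today.
// The goal is a single bundle exposing wikitext <-> DOM conversion with no
// node.js-specific bindings.
import { wikitextToDOM, domToWikitext } from 'parsoid-browser';

const doc = await wikitextToDOM("'''Hello''', [[World]]!");
document.getElementById('preview').innerHTML = doc.body.innerHTML;

// Round-trip the (possibly edited) DOM back to wikitext.
const wikitext = await domToWikitext(doc);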

Event Timeline

ssastry subscribed.

This is unlikely to happen. If anything, as part of the Platform Evolution program, Parsoid will be integrated back into core, and we'll eventually drop the node.js version. Even if not, the Parsoid codebase is fairly complex and has a lot of dependencies that would be very heavyweight for running in the browser.

Not to mention the latency / throughput tradeoff of making a parse request vs. the first-time cost of having to download the entire parser.

Parsoid will be integrated back into core, and we'll eventually drop the node.js version.

I don't understand the rationale for that.

Not to mention the latency / throughput tradeoff of making a parse request vs. the first-time cost of having to download the entire parser.

I mean you only have to download the parser when needed. Downloading a cached, static JavaScript file is probably far more efficient than making multiple parse requests.
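
For example, the parser bundle could be fetched lazily with a dynamic import(), so the download cost is paid only on first use and later loads hit the browser cache. The bundle path, the render() export, and the DOM element IDs below are assumptions for the sketch:

// Load the parser bundle only when a preview is actually requested.
// Subsequent calls reuse the already-resolved module.
let parserPromise = null;

function getParser() {
  // Hypothetical bundle path, served as a cacheable static asset.
  parserPromise = parserPromise || import('/static/parsoid.bundle.js');
  return parserPromise;
}

document.getElementById('preview-button').addEventListener('click', async () => {
  const parsoid = await getParser();
  const html = parsoid.render(document.getElementById('wikitext').value); // hypothetical render()
  document.getElementById('preview').innerHTML = html;
});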

Parsoid will be integrated back into core, and we'll eventually drop the node.js version.

I don't understand the rationale for that.

The rationale will eventually show up at https://www.mediawiki.org/wiki/Parsing/Notes/Moving_Parsoid_Into_Core ... but, until then, https://www.mediawiki.org/wiki/Parsing#Long-term_directions_as_of_November_2016 is the first place this goal showed up, and evaluating this option is part of the annual plan for 2018-2019.

https://www.mediawiki.org/wiki/Parsing/Notes/Two_Systems_Problem is an older discussion of the two-parser issue.

Not to mention the latency / throughput tradeoff of making a parse request vs. the first-time cost of having to download the entire parser.

I mean you only have to download the parser when needed. Downloading a cached, static JavaScript file is probably far more efficient than making multiple parse requests.

Look at the size here.

[subbu@earth:~/work/wmf/deploy] du -s node_modules/
190908 node_modules/
[subbu@earth:~/work/wmf/deploy] du -s src
7896 src

I know of the arguments, I just don't understand them. We already rely on multiple systems, and this is a move away from (micro-)services rather than towards them, which seems to be the direction the industry as a whole is moving in.

Look at the size here.

[subbu@earth:~/work/wmf/deploy] du -s node_modules/
190908 node_modules/
[subbu@earth:~/work/wmf/deploy] du -s src
7896 src

This is irrelevant: with minification (UglifyJS) and tree shaking, the bundle will be much, much smaller than the original size.
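
As a sketch, a webpack 4 configuration from that era enables both with its production mode; the entry and output paths here are placeholders:

// webpack.config.js -- illustrative only.
module.exports = {
  mode: 'production',        // enables minification (UglifyJS) by default
  entry: './src/index.js',   // ES module imports make tree shaking possible
  output: {
    filename: 'parsoid.bundle.js',
    path: __dirname + '/dist',
  },
  optimization: {
    usedExports: true,       // flag unused exports so the minifier can drop them
  },
};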

I know of the arguments, I just don't understand them. We already rely on multiple systems, and this is a move away from (micro-)services rather than towards them, which seems to be the direction the industry as a whole is moving in.

Okay, that is not a discussion we can have here since I don't want to get into another generic monolith vs. microservices debate; that is a dead end every time we have it. But when I fill out that wiki page for Parsoid itself, hopefully it will be a bit clearer why we are moving towards closer integration of Parsoid into core.

Look at the size here.

[subbu@earth:~/work/wmf/deploy] du -s node_modules/
190908 node_modules/
[subbu@earth:~/work/wmf/deploy] du -s src
7896 src

This is irrelevant: with minification (UglifyJS) and tree shaking, the bundle will be much, much smaller than the original size.

It will still be a non-trivial size. In any case, fwiw, https://www.npmjs.com/package/parsoid is the npm package.

It will still be a non-trivial size.

I don't see how you could possibly know that until you try it. :)

It will still be a non-trivial size.

I don't see how you could possibly know that until you try it. :)

Please have a go at it if you wish. It is not a path we will be going down at this point because it is a dead end for us given the integration work we are going to be embarking on shortly.

Things are more insidious than just the file size limitations. To address the problem statement directly, any non-trivial parse requires fetching state from the server. So, concretely, template resolution wouldn't be possible with that limitation.

To address the problem statement directly, any non-trivial parse requires fetching state from the server. So, concretely, template resolution wouldn't be possible with that limitation.

I mean, templates are public and effectively static, and can be cached for a very long time. Because of that high cacheability, a template can be served from the CDN rather than from the origin, and it can also be cached in the user's browser. They can also be downloaded on demand when needed.

I think maybe making multiple cached requests is way better than definitely making multiple uncached requests to the origin.
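
As a sketch of that idea, raw template wikitext is available from the standard MediaWiki action=raw endpoint and is cacheable like any static asset; this assumes the code runs on the same wiki's origin, and the template name is just an example:

// Fetch the raw wikitext of a template. The response can be cached by the
// CDN and the browser; a same-origin request is assumed here, since
// action=raw is not set up for cross-origin use.
async function fetchTemplate(title) {
  const url = '/w/index.php?title=' + encodeURIComponent(title) + '&action=raw';
  const resp = await fetch(url);
  if (!resp.ok) throw new Error('Failed to fetch ' + title);
  return resp.text();
}

const wikitext = await fetchTemplate('Template:Citation_needed');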

Parsoid currently relies on the PHP parser for template expansion. Some of the most popular templates are implemented as invocations of Lua modules.

Legoktm subscribed.
This comment was removed by Legoktm.

My last comment wasn't professional of me, so I've removed it. My apologies for making it in the first place.

Look at the size here.

[subbu@earth:~/work/wmf/deploy] du -s node_modules/
190908 node_modules/
[subbu@earth:~/work/wmf/deploy] du -s src
7896 src

To give another example, from the InteractionTimeline:

$ du -s node_modules/
2553200	node_modules/
$ du -s src
672	src

and then when it's bundled (with the default webpack configuration):

$ du -s main.js
1108	main.js

https://tools.wmflabs.org/interaction-timeline/scripts/main.js

However, the transfer, even with an empty cache, is a third of that size:

[Screenshot of the network transfer: Screen Shot 2018-05-02 at 12.50.37 PM.png, 55 KB]

The script is non-blocking and takes less than a third of a second to download.

I understand the concern that the size might be too large, but I don't think there's a good way to know what the actual impact will be until it's tried. Also, as I noted before, this is without doing any sort of performance (or size) optimizations (other than loading the plugins with the default configuration).

Parsoid currently relies on the PHP parser for template expansion. Some of the most popular templates are implemented as invocations of Lua modules.

I don't see anything special about template expansion that couldn't be moved into Parsoid, and Lua modules could be executed in the browser with libraries like Moonshine.
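
As a very rough sketch of that idea, a {{#invoke:...}} might be resolved client-side like this. runLua() is a hypothetical stand-in for a Lua-in-JS VM such as Moonshine (whose real interface runs precompiled bytecode), so this exact entry point is an assumption, not an existing API:

// Hypothetical resolution of {{#invoke:ModuleName|funcName|...}} in the browser.
async function invokeModule(moduleName, funcName, args) {
  // Fetch the Lua module source like any other cacheable page (same-origin assumed).
  const resp = await fetch(
    '/w/index.php?title=' + encodeURIComponent('Module:' + moduleName) + '&action=raw'
  );
  const luaSource = await resp.text();
  // Hand the source to a sandboxed Lua VM and return the expanded wikitext.
  return runLua(luaSource, funcName, args);
}

// Stub: a real implementation would execute the source with a Lua VM
// running in JavaScript (e.g. Moonshine) and call funcName with args.
function runLua(luaSource, funcName, args) {
  throw new Error('Lua VM integration is out of scope for this sketch');
}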

I don't see anything special about template expansion that couldn't be moved into Parsoid

There isn't, and there's probably an implementation of the preprocessor somewhere in the git history to make use of.