
Spin off (Parsoid) language variants functionality as a microservice?
Open, MediumPublic


In Parsoid, language variants are implemented as an HTML -> HTML conversion that is exposed via the Parsoid REST API. This functionality requires Parsoid HTML as input, but is otherwise largely independent of core Parsoid or core MediaWiki functionality.

Spinning this off as a microservice would shrink the core Parsoid codebase and also shave some code off MediaWiki core. It would also let language implementers maintain existing converters and implement new ones without having to muck around with parsing code. The conversion is likely to be a purely compute-bound activity and could also benefit from JavaScript performance.

The Parsoid language variants code is already in a separate repository and is currently pulled into Parsoid as an npm module, so that part of separating the codebases has already been done.

I am throwing this out as an idea for now. We may not be able to make this decision before we finish our Parsoid port to PHP, but we should nevertheless engage with this as something worth doing.

Event Timeline

ssastry triaged this task as Medium priority.Jan 9 2019, 9:35 PM
Pchelolo edited projects, added Services (watching); removed Services.
ssastry renamed this task from RFC: Spin off language variants functionality as a Node.js microservice? to RFC: Spin off (Parsoid) language variants functionality as a Node.js microservice?.Jan 9 2019, 9:47 PM
ssastry updated the task description.

@ssastry and I briefly discussed this last week. The way the code is currently structured, it should be easy to separate it (and later build upon it independently). Also, no calls to MW are needed for pure language variant conversions at the current stage, which makes this a nice and well-isolated feature. @ssastry @cscott could you confirm the above is true?
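To illustrate what "no calls to MW" can mean in practice, here is a minimal sketch of a pure text conversion that depends only on an in-memory table. It is not the Parsoid implementation: the function name `convert_text`, the greedy longest-match strategy, and the two-entry Simplified-to-Traditional Chinese table are all assumptions chosen for illustration.

```python
from typing import Dict


def convert_text(text: str, table: Dict[str, str]) -> str:
    """Convert text with a greedy longest-match lookup in `table`.

    Hypothetical helper: it takes everything it needs as arguments and
    makes no network or MediaWiki calls, so it is a pure function.
    """
    if not table:
        return text
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so that multi-character
        # entries win over their single-character prefixes.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No entry matches at this position: copy the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


# Illustrative table only; real conversion tables are far larger.
EXAMPLE_TABLE = {"发": "發", "头发": "頭髮"}
```

With this table, `convert_text("头发", EXAMPLE_TABLE)` picks the two-character entry, while `convert_text("发展", EXAMPLE_TABLE)` falls back to the single-character one. A real service would apply such a function to text nodes of Parsoid HTML rather than to raw strings.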

Krinkle added a subscriber: Krinkle.

Re-tagging "RFC"-like task on the TechCom workboard as an actual RFC. Moving to backlog for now. While the task does have a fairly straightforward objective (decide whether to make it a microservice), if I understand correctly the RFC author is not (yet) looking for feedback on this question from TechCom and/or the wider Wikimedia technical community.

Once T208524 is evaluated and considered, either start with drafting a proposal, or if you need help from us and/or would like to schedule an IRC meeting, let us know :)

As far as I know, variant conversion requires the use of conversion tables that are stored as wiki pages (prefix MediaWiki:Conversiontable/). Does the current implementation not use these? Or are they duplicated somehow?

If the new converter service would need access to these conversion tables, that would create a dependency on core MediaWiki. Such tables are cached (see $wgLanguageConverterCacheType), and the service could probably just load them from the cache, but we'd have to consider what happens in case of a cache miss. I suppose the service would make an API call to MediaWiki to acquire the conversion table (which would implicitly also put it into the cache).
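The cache-miss handling described above could be sketched roughly as follows. Everything here is hypothetical: `ConversionTableLoader`, the plain dict standing in for the object cache, and the `fetch_from_mediawiki` callback standing in for an API call are assumptions, not existing MediaWiki or Parsoid interfaces.

```python
from typing import Callable, Dict

# Assumed shape of a parsed conversion table: source string -> target string.
ConversionTable = Dict[str, str]


class ConversionTableLoader:
    """Look up a conversion table in a cache, falling back to MediaWiki."""

    def __init__(
        self,
        cache: Dict[str, ConversionTable],
        fetch_from_mediawiki: Callable[[str], ConversionTable],
    ) -> None:
        self._cache = cache                 # stand-in for the shared object cache
        self._fetch = fetch_from_mediawiki  # stand-in for an API call to MediaWiki

    def get(self, variant: str) -> ConversionTable:
        table = self._cache.get(variant)
        if table is not None:
            return table  # cache hit: no dependency on MediaWiki for this request
        # Cache miss: ask MediaWiki for the table.
        table = self._fetch(variant)
        # In the scenario above, MediaWiki would populate the shared cache
        # itself as a side effect of the call. The explicit write here exists
        # only because this sketch uses a local dict as its cache.
        self._cache[variant] = table
        return table
```

The point of the sketch is that MediaWiki is contacted only on a miss, so the dependency on core is limited to the fallback path.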

It would be very nice if the parsed version of the conversion table that currently goes into the object cache ended up in more persistent storage, with a nice cache-friendly interface for use by services like the variant conversion service proposed here. This is conceptually similar to MCR (but per page, not per revision) and to page_props (but for blobs, not values).

Closing old RFC that is not yet on our 2020 process and does not appear to have an active owner. Feel free to re-open with our template or file a new one when that changes.

Jdforrester-WMF renamed this task from RFC: Spin off (Parsoid) language variants functionality as a Node.js microservice? to Spin off (Parsoid) language variants functionality as a microservice?.Sep 16 2020, 8:20 PM
Jdforrester-WMF reopened this task as Open.
Jdforrester-WMF removed a project: TechCom-RFC.