Parsoid/JS relies on MediaWiki to:
(a) expand templates -- returns wikitext + categories|properties|modules|jsconfigvars
(b) process tag extensions (that don't have Parsoid native implementations) -- returns HTML + modules|jsconfigvars|categories
(c) get metadata for media -- returns mediatype|mime|size|url|badfile (images, audio) and mediatype|mime|size|url|badfile|derivatives|imagetext (video)
(d) get link metadata -- returns missing|known|redirect|disambiguation for every link
(e) get siteinfo and other config -- this is being handled separately by T212982: Create a ParsingEnvironment class for use with Parsoid/PHP, so it is not a concern for this task. It is listed here only for completeness' sake.
(f) post lint information when linting is enabled.
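As a rough illustration of what the list above implies, the requests could be grouped behind a single interface. This is only a sketch in Python (the real class would be PHP in MediaWiki core). The names `DataAccess`, `StubDataAccess`, and all method names are hypothetical, the return shapes are plain dicts keyed by the field names listed above, and (e) is left out because it is covered by T212982.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

# Hypothetical return shape: a plain dict keyed by the field names listed above.
Result = Dict[str, Any]


class DataAccess(ABC):
    """Hypothetical interface covering requests (a)-(d) and (f)."""

    @abstractmethod
    def expand_template(self, wikitext: str) -> Result:
        """(a) Keys: wikitext, categories, properties, modules, jsconfigvars."""

    @abstractmethod
    def process_extension_tag(self, wikitext: str) -> Result:
        """(b) Keys: html, modules, jsconfigvars, categories."""

    @abstractmethod
    def get_media_info(self, files: List[str]) -> Dict[str, Result]:
        """(c) Per file: mediatype, mime, size, url, badfile (more for video)."""

    @abstractmethod
    def get_link_info(self, titles: List[str]) -> Dict[str, Result]:
        """(d) Per title: missing, known, redirect, disambiguation."""

    @abstractmethod
    def post_lint(self, title: str, lints: List[Result]) -> None:
        """(f) Record lint results; only called when linting is enabled."""


class StubDataAccess(DataAccess):
    """In-memory stand-in, useful only for exercising the interface."""

    def __init__(self) -> None:
        self.lints: Dict[str, List[Result]] = {}

    def expand_template(self, wikitext: str) -> Result:
        # Echoes the input; a real implementation would expand the template.
        return {"wikitext": wikitext, "categories": [], "properties": {},
                "modules": [], "jsconfigvars": {}}

    def process_extension_tag(self, wikitext: str) -> Result:
        return {"html": "", "modules": [], "jsconfigvars": {}, "categories": []}

    def get_media_info(self, files: List[str]) -> Dict[str, Result]:
        return {f: {"mediatype": None, "mime": None, "size": None,
                    "url": None, "badfile": False} for f in files}

    def get_link_info(self, titles: List[str]) -> Dict[str, Result]:
        # Reports every title as missing, since the stub knows no pages.
        return {t: {"missing": True, "known": False, "redirect": False,
                    "disambiguation": False} for t in titles}

    def post_lint(self, title: str, lints: List[Result]) -> None:
        self.lints.setdefault(title, []).extend(lints)
```

Taking lists of files and titles in (c) and (d) is a guess at how batching might surface in the interface; it is not something this task prescribes.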
On the Parsoid/JS end, lib/mw/ApiRequest.js and lib/mw/Batcher.js implement the functionality to talk with MediaWiki. On the MediaWiki end, the action API and the ParsoidBatchAPI extension deal with Parsoid's requests.
This task requires implementing a new class in MediaWiki core that subsumes the functionality of the code on both the Parsoid and the MediaWiki side. Parsoid/PHP (composer lib) will talk to this new class, which in turn could implement the required functionality in one of several ways: (a) natively, by copying the requisite code from ParsoidBatchAPI and core; (b) by proxying the requests to the action API or ParsoidBatchAPI, whichever makes sense; (c) by reimplementing code or redefining interfaces; or (d) by some combination of the three.
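To make option (d) concrete, here is a minimal Python sketch of a class that routes each kind of request to a different backend, one "native" and one "proxied". Everything in it is an assumption made for illustration: the names `DataAccessFacade`, `native_link_info`, `proxied_media_info`, and the `linkinfo`/`mediainfo` keys are hypothetical, and the backends are stand-ins that return placeholder data rather than calling MediaWiki.

```python
from typing import Callable, Dict, List

# Hypothetical backend shape: answers one kind of request for a batch of keys.
Backend = Callable[[List[str]], Dict[str, dict]]


def native_link_info(titles: List[str]) -> Dict[str, dict]:
    """Option (a): stand-in for code that would query MediaWiki directly."""
    return {t: {"source": "native"} for t in titles}


def proxied_media_info(files: List[str]) -> Dict[str, dict]:
    """Option (b): stand-in for a call forwarded to the action API or ParsoidBatchAPI."""
    return {f: {"source": "proxy"} for f in files}


class DataAccessFacade:
    """Option (d): route each request kind to whichever backend is registered."""

    def __init__(self, backends: Dict[str, Backend]) -> None:
        self._backends = backends

    def fetch(self, kind: str, keys: List[str]) -> Dict[str, dict]:
        if kind not in self._backends:
            raise KeyError(f"no backend registered for {kind!r}")
        return self._backends[kind](keys)


# Parsoid would only ever see the facade, so backends can be swapped later.
facade = DataAccessFacade({
    "linkinfo": native_link_info,
    "mediainfo": proxied_media_info,
})
```

The point of the indirection is that callers do not need to know which strategy serves a given request, so a proxied backend could later be replaced by a native one without touching Parsoid.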
The overriding requirement is to provide this information to Parsoid/PHP as efficiently as possible. We could conceivably move some of this functionality out of MediaWiki core and into the Parsoid/PHP composer lib, but that is far in the future, once we are ready to pull out the legacy PHP parser. I mention it here in case it influences the design now.
Pointers to the Parsoid/JS code:
- lib/mw/ApiRequest.js
- lib/mw/Batcher.js