There are three separate configs that influence wt->html and html->wt conversion in Parsoid: (a) Parsoid config, (b) wiki config, and (c) page config. All three are tied together in a single environment object and are usually accessed as env.conf.parsoid, env.conf.wiki, and env.page. This object is instantiated during initialization (in parse.js, in parserTests.js, or during API request handling).
This spreadsheet lists the specific properties of each of these configs. It is only partially annotated right now and will improve over time, but it probably has enough information to start thinking about and designing the PHP interface.
In the Parsoid/JS code:
- Parsoid config comes from a config file (a local file, or one deployed via scap or puppet) and is reused across requests.
- Wiki config comes from site API calls. Some properties are initialized directly from the response, while others (usually regexps) are computed from it. It is reused across requests.
- Page config is mostly initialized from the wt->html or html->wt request (API, command-line, or other scripts) and is only valid for that single request.
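The lifetimes above can be sketched as follows. This is a minimal TypeScript sketch, not the Parsoid/JS code: the field names (timeoutMs, siteName, revid), the fetchWikiConfig stand-in for the site API calls, and the per-prefix cache are all hypothetical. It shows the Parsoid and wiki configs being shared across requests while each request gets its own page config, all reachable as env.conf.parsoid, env.conf.wiki, and env.page.

```typescript
// Hypothetical, simplified shapes; the real classes live in
// lib/config/ParsoidConfig.js, lib/config/WikiConfig.js and
// lib/config/MWParserEnvironment.js.
interface ParsoidConfig { timeoutMs: number }                 // process-wide, from a config file
interface WikiConfig { prefix: string; siteName: string }     // per wiki, from API calls
interface PageConfig { title: string; revid: number | null }  // per request

interface Env {
  conf: { parsoid: ParsoidConfig; wiki: WikiConfig };
  page: PageConfig;
}

// Long-lived state: loaded once and shared by every request.
const parsoidConfig: ParsoidConfig = { timeoutMs: 30000 };
const wikiConfigs = new Map<string, WikiConfig>();

// Hypothetical stand-in for the site API calls; counts how often it runs.
let wikiFetches = 0;
function fetchWikiConfig(prefix: string): WikiConfig {
  wikiFetches++;
  return { prefix, siteName: `Site for ${prefix}` };
}

// Reuse the wiki config across requests; fetch it only on first use.
function getWikiConfig(prefix: string): WikiConfig {
  let wc = wikiConfigs.get(prefix);
  if (wc === undefined) {
    wc = fetchWikiConfig(prefix);
    wikiConfigs.set(prefix, wc);
  }
  return wc;
}

// Called once per wt->html or html->wt request.
function createEnv(prefix: string, title: string, revid: number | null): Env {
  return {
    conf: { parsoid: parsoidConfig, wiki: getWikiConfig(prefix) },
    page: { title, revid }, // fresh object, valid only for this request
  };
}

const a = createEnv("enwiki", "Foo", 123);
const b = createEnv("enwiki", "Bar", null);
```

Two requests for the same wiki share one wiki config object and trigger a single fetch, while each request gets its own page config.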
In the Parsoid/PHP code, we need an equivalent interface that supports this without incurring much performance overhead. Since PHP does not keep objects alive between requests, any config property that is precomputed eagerly would be recomputed on every request, so eagerly precomputing many properties doesn't make sense here. The ParsingEnvironment interface therefore needs to be designed with this in mind. Where necessary, Parsoid code that relies on precomputed properties can be changed to use different properties; that decision will require a closer look at how these computed properties are used and what a good replacement would be.
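One way to avoid eager precomputation is to build derived properties on first use and memoize them. The TypeScript sketch below assumes a hypothetical redirectWords list in the API response and a derived redirectRegexp; neither name is taken from the Parsoid code, and buildCount exists only to make the laziness observable.

```typescript
// Hypothetical raw data, as it might arrive from the site API.
interface SiteInfo {
  redirectWords: string[];
}

// Escape characters that have special meaning in regular expressions.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

class LazyWikiConfig {
  private redirectRe: RegExp | null = null;
  public buildCount = 0; // instrumentation for this sketch only

  constructor(private readonly info: SiteInfo) {}

  // Built on first access and memoized; a request that never
  // checks for redirects never pays for building the regexp.
  get redirectRegexp(): RegExp {
    if (this.redirectRe === null) {
      this.buildCount++;
      const alternatives = this.info.redirectWords.map(escapeRegExp).join("|");
      this.redirectRe = new RegExp(`^(?:${alternatives})\\s*:?`, "i");
    }
    return this.redirectRe;
  }
}

const wiki = new LazyWikiConfig({ redirectWords: ["#REDIRECT", "#WEITERLEITUNG"] });
```

The same pattern carries over to PHP as a getter method backed by a nullable private property.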
Potential performance optimization for later: once the shape of the env.conf.parsoid and env.conf.wiki config objects settles during design, they could even be cached somewhere and cleared out on every deploy. If that turns out to be an easy win, we could then afford techniques that are expensive to initialize, relying on these cached configs to keep per-request initialization overhead low.
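For the deploy-scoped cache idea, one common technique is to put a deploy version in the cache key instead of deleting entries explicitly. In this hypothetical TypeScript sketch, the Map stands in for a store shared across requests (in PHP, something like APCu could fill that role), and the deployVersion string and WikiConfigData shape are assumptions.

```typescript
// Hypothetical shape of an expensive-to-build wiki config.
interface WikiConfigData {
  prefix: string;
  builtFor: string; // deploy version that produced this entry
}

// Stand-in for a store that outlives a single request (hypothetical).
type SharedStore = Map<string, WikiConfigData>;

class DeployScopedConfigCache {
  public loads = 0; // counts expensive initializations, for this sketch only

  constructor(
    private readonly store: SharedStore,
    private readonly deployVersion: string,
  ) {}

  get(prefix: string): WikiConfigData {
    // The deploy version is part of the key, so entries written by an
    // earlier deploy are never read again.
    const key = `wikiconfig:${this.deployVersion}:${prefix}`;
    const cached = this.store.get(key);
    if (cached !== undefined) {
      return cached;
    }
    this.loads++; // expensive initialization happens only on a miss
    const data: WikiConfigData = { prefix, builtFor: this.deployVersion };
    this.store.set(key, data);
    return data;
  }
}

const store: SharedStore = new Map();
const beforeDeploy = new DeployScopedConfigCache(store, "v1");
beforeDeploy.get("enwiki");
beforeDeploy.get("enwiki");
const afterDeploy = new DeployScopedConfigCache(store, "v2");
const fresh = afterDeploy.get("enwiki");
```

Under v1 the second lookup is a cache hit; after the switch to v2 the config is rebuilt once. A real store would still need its own eviction policy for the stale v1 entries.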
Pointers to the JS code
- lib/config/MWParserEnvironment.js
- lib/config/WikiConfig.js
- lib/config/ParsoidConfig.js