Global configuration information. The obvious thing here is the callbacks which control how page titles are mapped to revisions; those are certainly useful, but they don't belong in parser options.
These callbacks are actually changed by some extensions. Perhaps we could split these out and provide non-default ones to the parser via ParserFactory. That approach will only work well if we actually parse right where we set these callbacks. I'll have a look at it; I have an idea.
However, one difference from ParserOptions is the dependencies. ParserOptions depends on User and optionally Language, so you can construct one pretty easily, and we have A LOT of code constructing ParserOptions directly. But this is something we need to address anyway: ParserOptions relies heavily on global state and needs a factory, so we will be touching all this code regardless.
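The split might look something like this. This is a sketch only: `CurrentRevisionFetcher` and the constructor wiring are invented names for illustration, not the real ParserFactory API.

```php
// Sketch, assuming invented names: move the title-to-revision callback
// off ParserOptions and onto the factory that builds the parser, so an
// extension can supply a non-default implementation in one place.
interface CurrentRevisionFetcher {
	public function fetchCurrentRevisionOf( string $titleText );
}

class ParserFactory {
	/** @var CurrentRevisionFetcher */
	private $revisionFetcher;

	public function __construct( CurrentRevisionFetcher $fetcher ) {
		$this->revisionFetcher = $fetcher;
	}

	public function create() {
		// Every parser built here shares the factory's callback;
		// per-parse ParserOptions no longer carries it.
		return new LegacyParser( $this->revisionFetcher );
	}
}
```

The point of the design is that the callback is configured once, at service-wiring time, rather than being rebuilt for every parse.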
FWIW, one reason why Parsoid doesn't use most of the stuff in ParserOptions currently is because we made a design decision *not to implement* any user-varying parser features. There's no per-user thumbnail size, stub size, etc, configuration in Parsoid. Our top-level design is that any such user customization will be done as a post-processing step.
Uploaded the slides to commons at https://commons.wikimedia.org/wiki/File:Wikimedia_Developer_Summit_2017_-_Media,_Visualizations,_and_Layout.pdf
Thu, Jul 22
Three main issues from Parsoid's perspective:
- In order to avoid circular dependencies, an interface (not necessarily TransformContext but something TransformContext implements or extends) needs to be defined in the parsoid library repository, not in core.
- Again, due to circular dependencies, that Parsoid interface can't directly reference Title/Revision/User objects. Instead, Title would be passed through as a string, for example. The core TransformContext can have a getTitle() method for MediaWiki users, but Parsoid can't use that.
- Extensions and parser functions currently use ParserOptions as a pass-through for information not actually needed by the parser itself, but needed to implement an extension tag or parser function: timestamp, speculative page ID, and so on. The TransformContext must provide a way to pipe that information through the parser and into the parser function/extension, without Parsoid necessarily knowing about it.
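The three constraints above might combine into something like the following sketch. The method names are invented, not a settled API:

```php
// Illustrative sketch only. This interface would live in the Parsoid
// library repo (to avoid the circular dependency), so it cannot mention
// Title/Revision/User types directly.
interface TransformContext {
	/** Page title as a plain string, not a Title object. */
	public function getPageTitleText(): string;

	/**
	 * Opaque pass-through for data an extension tag or parser
	 * function needs (timestamp, speculative page ID, ...) without
	 * Parsoid having to know what it is.
	 * @return mixed
	 */
	public function getExtensionData( string $key );
}
```

The core-side implementation could then wrap real Title/Revision/User objects and expose them to MediaWiki callers via richer methods, while Parsoid only ever sees this narrow interface.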
I did add the bikeshed warning at the top. :) I can edit the task summary with the new name as soon as we have one.
Some additional notes:
- ParserOutput::getText() is the ugliest thing here; it doesn't really belong, and it probably won't be added to ParsoidOutput. Right now ParsoidOutput is an "out" parameter to the parsing process that gets updated and stands alongside the actual DOM output of the parse. It may be worth adding a setter to ParsoidOutput to record the final output *as a DOM*, something like ParsoidOutput::setParseDocument(DOMDocument $doc). The MediaWiki integration code would be responsible for taking the DOM and converting it to a string in a way which is compatible with ParserOutput::getText() and friends; hopefully Parsoid itself can remain blissfully ignorant of those details.
- Some of the setters are used by extensions or parser functions but not by the core parser. These shouldn't be added to ParsoidOutput: an extension or parser function which needs to record those details will have access to a "real" ParserOutput, so it won't need those methods to be available in ParsoidOutput. ParserOutput::addTrackingCategory() is an example of something on the bubble here: it would probably be added to ParsoidOutput because it happens to be useful for recording delinting information during the parse, even though most uses of addTrackingCategory() are probably in extension code.
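The suggested setter could be as simple as the following sketch (not yet implemented; the getter is added here just to make the class self-contained):

```php
// ParsoidOutput records the final parse result *as a DOM*; the
// MediaWiki integration layer is responsible for serializing it into
// something compatible with ParserOutput::getText() and friends.
class ParsoidOutput {
	/** @var ?DOMDocument */
	private $parseDocument = null;

	public function setParseDocument( DOMDocument $doc ): void {
		$this->parseDocument = $doc;
	}

	public function getParseDocument(): ?DOMDocument {
		return $this->parseDocument;
	}
}
```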
Redirecting this task to its actual title (refactoring Parser.php): I realized over the past few days that some of the linked dependencies relating to deprecating the "clone" and "resetOutput" functions of Parser.php don't actually block inserting a new abstract base class into the hierarchy. The fundamental blocker was code which did `new Parser`, because that would break if the Parser class became abstract; but that's been deprecated and removed for some years now. Code which clones an existing Parser will still work if you're holding on to a LegacyParser object, so it is not blocking our next refactoring step.
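A minimal illustration of the distinction, in plain PHP rather than the real class hierarchy: making the base class abstract only breaks direct instantiation, not cloning a concrete subclass.

```php
abstract class Parser {
	public function parse( string $text ): string {
		return $text;
	}
}

class LegacyParser extends Parser {
}

$p = new LegacyParser();
$q = clone $p;          // still works: $p is a concrete LegacyParser
// $r = new Parser();   // fatal error: cannot instantiate abstract class
```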
Copying some discussion from slack.
Wed, Jul 21
Brief to-do, to summarize:
- Revert the original Parsoid-related patch in quibble ( https://gerrit.wikimedia.org/r/c/integration/quibble/+/705907 ), tag and release quibble 0.10.1, and rebuild images (alas).
- Add parsoid as a dependency of Flow ( https://gerrit.wikimedia.org/r/c/integration/config/+/705932 ). This may temporarily break Flow CI, but shouldn't break anyone else using extension-gate tests, since the gated extension list ignores dependencies and so Parsoid will still be disabled for extension-gate builds. Flow devs can fix their tests if needed, and/or write the new API tests envisioned by T218534. They should probably take care that their tests pass whether or not Parsoid is enabled, though, at least for an interim period (i.e., cleanly skip tests that require Parsoid if the Parsoid extension is not installed).
- Add the extension-gate tasks experimentally to Parsoid ( https://gerrit.wikimedia.org/r/c/integration/config/+/705966 ). This will allow Parsoid and Flow devs to look into these issues by running `check experimental` on empty commits in the Parsoid repo.
- Someday: add Parsoid to the gated extension list ( https://gerrit.wikimedia.org/r/c/integration/config/+/655695 ) after the Flow (and any other extension) issues are fixed.
@Zabe @Urbanecm Thanks. Examination shows that it's still the same set of Flow tests failing. So hopefully, once the Flow tests are fixed, those failures won't block https://gerrit.wikimedia.org/r/c/integration/config/+/655695.
I *believe* that the failure mode here is: (a) the final integrated test on many repos runs core tests with a set of 'important for production' extensions that includes Flow; (b) Flow/includes/Conversion/Utils.php contains an isParsoidConfigured() method, and the tests in question use Utils::convert(), which falls back to the legacy parser if isParsoidConfigured() returns false; and (c) the quibble patch set $wgVirtualRestConfig['modules']['parsoid'], and isset( $vrs['modules']['parsoid'] ) is one of the checks used by Utils::makeVRSObject() to determine whether Parsoid is configured. So isParsoidConfigured() started to return true, which sent Utils::convert() down a different code path and triggered various failures.
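In other words (a simplified paraphrase of the check, not Flow's exact code):

```php
// Paraphrase of the logic in Utils::makeVRSObject(): setting *any*
// value under 'modules'/'parsoid' makes isset() true, which in turn
// makes isParsoidConfigured() return true.
$vrs = $wgVirtualRestConfig;
if ( isset( $vrs['modules']['parsoid'] ) ) {
	// Parsoid counts as "configured": Utils::convert() takes the
	// Parsoid code path instead of the legacy parser.
}
```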
Can we save the logs for one of the failed builds in GrowthExperiments as well? I assume that Flow's CI tests are the root cause here. The quibble patch effectively turned on Parsoid in all repositories at once, in addition to configuring Parsoid in a non-standard way. So if other extensions failed when Parsoid was turned on that would be interesting to the parsing team to look into, even though it shouldn't be a near term blocker.
This is also how template editing ought to work: you provide the arguments in data-mw corresponding to https://www.mediawiki.org/wiki/Specs/HTML/1.2.1#Transclusion_content with "html" properties instead of "wt", and we'll html->wt->html it and provide you with an appropriate rendered representation of the template, as a DOM fragment (with potentially multiple nodes). I take it right now you are generating wikitext for the template instance and using the wt2html API to generate the preview rendering?
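For illustration, the data-mw payload for an edited transclusion might look like the following, based on the shape described in the linked spec; the template and parameter names here are made up. Note the "html" property where the round-trip form would have "wt":

```json
{
	"parts": [ {
		"template": {
			"target": { "wt": "Example", "href": "./Template:Example" },
			"params": {
				"caption": { "html": "<b>new value</b>" }
			},
			"i": 0
		}
	} ]
}
```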
Some more discussion in T114412 (which was closed as a duplicate).
Running the Parsoid service in quibble was already done; we use it to run API tests in CI. You just need to make Flow depend on the Parsoid extension.
I marked the line in https://gerrit.wikimedia.org/r/c/integration/quibble/+/703182 which I suspect to be the culprit, but fundamentally I don't think that patch should have been necessary in the first place. Parsoid already runs "as a service" when we do API testing with the core API testing framework (poking @Arlolra who worked on this), so Flow ought to be able to do the same thing without messing with the quibble configuration.
Mon, Jul 19
Fri, Jul 16
Tue, Jul 13
Mon, Jul 12
Briefly, the proposed solution from the Parsing side is that we should add a lint for "unclosed nowiki" (if there isn't already one, there may well be) and then have DiscussionTools check the lints for the given page before saving (or starting?) an edit and warn the user if there are "problematic" lints for the page.
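The DiscussionTools side of the check could be sketched roughly as follows; all of the names here are invented for illustration, not a real lint-query API:

```php
// Hypothetical pre-save check (every identifier here is made up):
// DiscussionTools asks the linter for lints on the page and warns the
// user if any "problematic" lint, such as an unclosed <nowiki>, exists.
$lints = $lintProvider->getLintsForPage( $pageId );   // hypothetical
$problematic = array_filter( $lints, static function ( $lint ) {
	return $lint['type'] === 'unclosed-nowiki';       // proposed new lint
} );
if ( $problematic ) {
	// Warn the user before saving (or starting) the edit.
}
```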
Sun, Jul 4
Sat, Jul 3
Thu, Jul 1
Wed, Jun 30
Tue, Jun 29
Mon, Jun 28
Jun 15 2021
- Let people add numbers with CSS counters.
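A minimal sketch of the CSS-counter approach (the selectors are made up; the real styling would live in the relevant stylesheet):

```css
/* Number each item via a CSS counter instead of literal text. */
.numbered { counter-reset: item; }
.numbered .item::before {
	counter-increment: item;
	content: counter(item) ". ";
}
```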
I was anticipating #2, actually, and I'm pretty sure that would generally be preferable on the MediaWiki side as well.
We updated our deploy docs to better cover this case and ensure this doesn't happen again.