Since we have gone the way of pinning the Parsoid version in core, this is no longer an issue, so I am closing this as declined.
@hashar https://gerrit.wikimedia.org/r/c/mediawiki/core/+/977625 should be merged and backported before rolling out the train.
Mon, Nov 27
What page on what wiki is this? Is this a content issue OR a wikitext parsing issue OR an issue with content being different on desktop and in the Android app?
Turns out this is a different manifestation of T303015#7770480 with the ParserAfterTidy hook that triggers the DiscussionTools extension's CommentFormatter which sets the JS config var in question whenever Parsoid's pipeline processes a <templatestyles> tag (or some other extension tag that Parsoid calls core to handle). This information is accumulated and propagated to the final top level parse of the page and stored in ParserCache. Now, when Parsoid tries to get DT rendered, ParserOutput finds that the wgDiscussionToolsPageThreads has been set and barfs as above.
Tue, Nov 21
Mon, Nov 20
I am going to close this and if we find any bugs / gaps, we can file new tasks.
Fri, Nov 17
Thu, Nov 16
These errors are from a visual diffing test run I kicked off about 45 mins back or so. DiscussionTools with Parsoid isn't used anywhere else right now. But, this gives me a clue for some diffs I was noticing in the test run. So, I'll fix this and hopefully that will fix the diffs too!
Now that wmf.5 has rolled out to group 2, I am seeing another issue on enwiki talk pages.
Failed test on en.wikipedia.org with wmf.4 -- page corruption / dirty diffs because template encapsulation is broken.
This is only an issue in round trip testing for non-wikitext content models. But I'll see if I can suppress the error or remove these non-wikitext content model pages from the test set.
Wed, Nov 15
We ended up reusing an existing hook and not creating a new hook for this purpose. The abandoned code is all in gerrit if we ever need a new hook again. So, declining this.
Mon, Nov 13
This is now fixed. But it uncovered an issue with the 'Review Changes' output in VE which now incorrectly shows that structured data information will be removed on save, but on save, there is no such deletion happening. We'll track and fix that separately.
Sun, Nov 12
This broke when we rolled out https://gerrit.wikimedia.org/r/c/mediawiki/core/+/953342 about 3 weeks back. Somewhere in the code paths, we forgot to account for the fact that when VE needs to edit a page, it only needs to edit the "main" slot of a page with multiple MCR slots. File pages on commons have multiple slots and with that change, we seem to be giving VE the combined HTML from non-wikitext slots.
Fri, Nov 10
I merged and pushed to beta, but TOC link fragments are broken (see https://en.wikipedia.beta.wmflabs.org/wiki/African_linsang?useparsoid=1 )
I think we should fix this sooner rather than later .. I suspect officewiki has subpage links.
Thu, Nov 9
An explicit wrapper could help in some scenarios but doesn't eliminate the need to know if you are in templated content before doing an operation ... you still have to walk up to detect the wrapper and move before/after before doing that operation.
There are about 3 or 4 tables where Parsoid's output is missing <td>s.
This is an oversight in Parsoid's TOC insertion code and has been there since end-March when https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/903797 got merged. The short summary is that the synthetic TOC Insertion code doesn't check if the insertion point is in the middle of a transclusion. As such, it breaks the continuity of the transclusion markup.
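The missing check can be sketched roughly as follows. This is a minimal illustration in Python, not Parsoid's actual code: it assumes the usual Parsoid convention that all sibling nodes produced by one transclusion share the same `about` attribute, and the names `Node` and `splits_transclusion` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a DOM element; only the fields needed here."""
    name: str
    about: Optional[str] = None  # assumed: shared by all siblings of one transclusion


def splits_transclusion(siblings: List[Node], index: int) -> bool:
    """Return True if inserting a new node before siblings[index] would land
    between two siblings that belong to the same transclusion."""
    if index <= 0 or index >= len(siblings):
        return False  # inserting at either end cannot split a run of siblings
    before, after = siblings[index - 1], siblings[index]
    return before.about is not None and before.about == after.about


# A heading and a paragraph produced by one template, surrounded by plain content.
siblings = [
    Node("p"),
    Node("h2", about="#mwt1"),
    Node("p", about="#mwt1"),
    Node("h2"),
]

print(splits_transclusion(siblings, 1))  # insertion point before the templated run
print(splits_transclusion(siblings, 2))  # insertion point inside the templated run
```

This only looks at one sibling list; an insertion point nested deeper inside templated content would also need a walk up the ancestors.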
Wed, Nov 8
This edit name broke it for Parsoid. It moved the </onlyinclude> from *outside* the transclusion to *inside* the transclusion. So, Parsoid looked at the content within <onlyinclude> ... </onlyinclude> and found an unclosed transclusion and rendered it as text.
Tue, Nov 7
I pushed that patch above for discussion.
I can see scenarios where composer install makes sense for master development .. i.e. when you want to use a new version of a package and use code available there which would fail if you tested it against vendor versions.
Mon, Nov 6
Looks like this query selector in MFE is the problem:
$containers = $xpath->query( 'body/div[@class="mw-parser-output"]' );
I thought MFE didn't use Parsoid HTML?
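One way a selector like this can fail: an XPath test of the form `@class="..."` compares the whole attribute string, so it stops matching as soon as the wrapper carries any additional class. The sketch below illustrates that in Python with the standard library. The sample markup and the extra class are assumptions for illustration only, and this may not be the actual cause here.

```python
import xml.etree.ElementTree as ET

# Same shape as the MFE selector: exact string comparison on the class attribute.
EXACT = "body/div[@class='mw-parser-output']"


def exact_matches(html: str) -> int:
    """Count wrapper divs whose class attribute equals the string exactly."""
    root = ET.fromstring(html)
    return len(root.findall(EXACT))


def token_matches(html: str, token: str = "mw-parser-output") -> int:
    """Count wrapper divs that list the token among their classes."""
    root = ET.fromstring(html)
    return sum(
        1
        for div in root.findall("body/div")
        if token in div.get("class", "").split()
    )


# Hypothetical markup: one wrapper with a single class, one with an extra class.
single = "<html><body><div class='mw-parser-output'><p>x</p></div></body></html>"
multi = (
    "<html><body><div class='mw-content-ltr mw-parser-output'>"
    "<p>x</p></div></body></html>"
)

for label, html in (("single", single), ("multi", multi)):
    print(label, exact_matches(html), token_matches(html))
```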
Fri, Nov 3
There are three possible strategies. I am outlining them here along with links to patches. ( FYI:
Thu, Nov 2
Looks like this error has gone away after the train rolled out with the Parsoid & core changes that prevent this error from triggering. So, I am going to resolve this. But, the Remex change still needs to be packaged and deployed. That will prevent bugs like this in the future.
Wed, Nov 1
Tue, Oct 31
Oct 27 2023
Oct 26 2023
Reopening because there is one other issue I am noticing. Check https://en.wikivoyage.org/wiki/Jenin?useparsoid=1 vs https://en.wikivoyage.org/wiki/Jenin and see the diff in the icon in the two cases. If you inspect, the issue seems to be that the content of the link is -number-around in Parsoid vs 0°0′0″N 0°0′0″E in legacy rendering.
Patches rolled out as part of the parent task handled this.
Nothing else to do here.
Looks like there was nothing new to do here and we have had the ParsoidOutputAccess merge with ParserOutputAccess patch in production for more than a week now.
Oct 24 2023
We are going to sweep this under the carpet at least for this scenario as noted in T349310#9277286 .. but yes, if wikis use attribute names that trip up against this, such pages will not render. For now, we hold our breath till PHP's new DOM implementation comes online and the WMF cluster upgrades to that version (which is probably a few years away). Given that this is the first time we ran up against this, I am comfortable with that approach.
Based on @cscott's hunch, I explored more and it turns out 16 pages account for all the errors seen so far. We wonder if these errors were triggered by a template edit (causing these pages to *now* go through Parsoid with the new code paths from 3 weeks back).
It is baffling .. the first error was about 12:10 UTC and https://wikitech.wikimedia.org/wiki/Server_Admin_Log#2023-10-19 shows nothing for that timeframe. group2 rollout was a full 6 hours later and there were about 3600 errors logged already by then.
Separately, independent of what caused this, once we roll out https://gerrit.wikimedia.org/r/c/mediawiki/core/+/967275 and its parent Parsoid patch on the next train, we will no longer parse the zhwiki HTML on the Parsoid side just to generate metadata -- so this will disappear then, but I am still curious what happened last week.
And, this is for zhwiki, so, this would have called core's zhwiki language conversion routines before passing that on to Parsoid for metadata conversion. So, the DOM is being parsed from this converted HTML. So, that is probably the source of it.
Oh, this phab task is for 1.41.0-wmf.30 not 1.42.0-wmf.1 as I originally thought ... so the error manifested even before wmf.1 was rolled out!
We have about 140 langconv (pb2pb) reqs/s but not sure how many are for zhwiki titles. But, these are about 1 every 7 seconds with multiple errors per request, which means the absolute # of error titles is smaller. But multiple errors per title is confusing me unless Remex is catching these errors, logging them, and recovering.
Oct 23 2023
The first step of this process is done. ParsoidOutputAccess is a thin wrapper over ParserOutputAccess and unblocks the parent tasks. I am going to remove this as a subtask and also remove the read views tag since the rest of the work is primarily tech debt removal.
We figured out that after all the patches and other testing, we didn't need to do this anymore.
This will ride the train this week.