See output of https://en.wikipedia.org/wiki/Talk:Hospet?useparsoid=1 . .. there are no reply links and the message at the bottom indicates that DT doesn't recognize talk page threads.
This needs to be fixed.
See output of https://en.wikipedia.org/wiki/Talk:Hospet?useparsoid=1 . .. there are no reply links and the message at the bottom indicates that DT doesn't recognize talk page threads.
This needs to be fixed.
DiscussionTools can recognize talk threads in Parsoid HTML, but the problem is that Parsoid doesn't run the ParserAfterTidy hook, so important parts of our code don't run, and there's no obvious equivalent we could use. It was suggested that we should use OutputPageParserOutput instead, but…
We used ParserAfterTidy, because we wanted the output of our processing to be cached, and this allowed us to "piggy-back" on the parser cache, only causing a slight increase in disk space used by discussion pages instead of doubling it.
If we use OutputPageParserOutput, then it is no longer cached, which will slow down page load times. We don't really know the impact. From some quick tests it seems noticeable but not terrible. On this large page I see "DiscussionTools time usage: 0.375 seconds" (it's in the parser limit report). [I remembered it being slower… Not sure if my memory is going or if our code magically got faster.]
Also, if we use OutputPageParserOutput, then the processing might no longer be applied in contexts that don't use OutputPage, like the API. We will need to test this carefully, since we load the page content from the API after saving. We had several bugs related to this in the past before we used ParserAfterTidy.
I think we'd prefer to continue piggy-backing on the parser cache, and if we need to change that, it would take a bit more work than it seems.
[ Adding @daniel, @Ladsgroup, @Jgiannelos for visibility. I see Ed already added Scott. ]
This is basically a version of a bigger problem we have identified which broadly goes as follows.
Parsoid generates canonical HTML with all information that is cached (right now in ParserCache, previously in RESTBase). This HTML was originally focused on VisualEditor (and such clients) that need all the information so that it can be roundtripped back to canonical wikitext without dirty diffs.
Now, there are a number of other targets that use a modified form of this canonical HTML (read views will strip information, PCS does transforms, DT wants some hooks to run, MFE may do some other mangling, etc.). So, we can think of this as a html2html transformation.
So, the question becomes (a) what are performance SLOs for these target products (b) how expensive / performant are these html2html transformations (c) should the html2html output be itself cached?
Obviously, the two extremes for (c) are: (1) every target variant (or flavor or pick-your-favorite-noun) gets cached. (2) no target variant gets cached -- they are always generated on demand.
PCS is right now going through this as part of RESTBase deprecation. The goal there is to set SLOs for mobile apps, use those to optimize performance of PCS to see if the PCS transforms meet perf targets (and hence don't need any other caching beyond existing edge caching in Varnish), and if not, figure out what caching solution / strategy is needed. There are conditional caching strategies for ex. Only cache html variants for pages where the generation time exceeds the SLO targets (or some other such policy). If caching, do we use PC (with different keys) for all these Parsoid HTML variants? Do we use different PC instances? We obviously do not have PC capacity for storing all html2html variants right now. But, if we need to expand that capacity, that would be a conversation with SRE informed by some data collected here for different use cases.
What are your thoughts here for DiscussionTools? I think that will help inform what hooks should be used / added and where / when they run. Anything I missed here?
We will work on T342352: Create ParsoidExperimentalDeprecatedPostProcessingHookDoNotUse hook for DiscussionTools testing to unblock testing.
Change 969390 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):
[mediawiki/extensions/DiscussionTools@master] WIP: Add ParserOutputPostTransformHook handler for Parsoid HTML
Change 969225 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):
[mediawiki/core@master] ParsoidParser: Record page title in ParserCache entries
Change 969225 merged by jenkins-bot:
[mediawiki/core@master] ParsoidParser: Record page title in ParserCache entries
There are three possible strategies. I am outlining them here along with links to patches. ( FYI:
So, there you go, three different approaches with patches and some discussion. I think after having explored all this, I prefer #1. It fits with the OutputTransform path we are exploring. #2 has the advantage in that it needs no special plumbing (beyond the new transform in Parsoid to be added) and implicitly handles performance concerns. #3 requires a lot more plumbing for handling performance issues, but that plumbing has the advantage that it can make it possible to cache a fully rendered page (skin, ui, and all) if there are no concerns going that route, but also one that is probably most risky.
I also prefer #1. I wish that I could figure out how to do it without moving the location of the OutputPageParserOutputHook, but I couldn't immediately see a way to do that.
Only two extensions implement a ParserOutputPostCacheTransformHandler.
WikidataPageBanner has a handler for this and OutputPageParserOutputHook and my core patch switches the order in which those hooks are invoked ... and afaict, looking at the code, it looks like it shouldn't matter.
WikibaseMediaInfo has a handler for this and BeforePageDisplayDisplayHook and afaict, looking at the code, it looks like it shouldn't matter.
But, post-deploy, this is something we should look at. WikidataPageBanner is used on wikivoyage. Not yet sure where WikibaseMediaInfo is used on wikis.
There are more: not all extensions use hook handler interfaces, so you need to search for the hook name rather than the interface name.
Not yet sure where WikibaseMediaInfo is used on wikis.
It’s https://commons.wikimedia.org/wiki/COM:SDC, so Commons (and Test Commons and Commons Beta).
Change 969390 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] Add ParserOutputPostCacheTransformHook handler for Parsoid HTML
Change 974275 had a related patch set uploaded (by Subramanya Sastry; author: Subramanya Sastry):
[mediawiki/extensions/DiscussionTools@master] Tweak CSS to deal with Parsoid's <section> tags
On itwiki which is now on wmf.5, many pages now render DiscussionTools with "?useparsoid=1" ( this, this, this ).
*But*, it doesn't render for the Main Page.
That is to be investigated - so leaving this task open for a bit.
Now that wmf.5 has rolled out to group 2, I am seeing another issue on enwiki talk pages.
For example, see . https://en.wikipedia.org/wiki/Talk%3APaul%20McAuliffe?useparsoid=1 (not the only one) clearly, the HTML is being processed since the metadata is present on top and in the TOC but the actual comment sections don't have the DT html ... this isn't a CSS issue.
To be investigated.
Let me know once it's fully deployed everywhere so I can renable parsoid for myself, I had parsoid enabled in my home wiki but had to disable it due to this bug :(
Looks good now. There will be separate tasks for any bugs found and followups needed.
Change 974275 merged by jenkins-bot:
[mediawiki/extensions/DiscussionTools@master] Tweak CSS to deal with Parsoid's <section> tags
Change 989982 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):
[mediawiki/core@master] Introduce ParserOutput:setFromParserOptions() and use for preview flag
Change 989982 merged by jenkins-bot:
[mediawiki/core@master] Introduce ParserOutput:setFromParserOptions() and use for preview flag