
DiscussionTools doesn't recognize talk threads with Parsoid HTML
Open, High, Public


See the output of … : there are no reply links, and the message at the bottom indicates that DT doesn't recognize talk page threads.

This needs to be fixed.

Event Timeline

DiscussionTools can recognize talk threads in Parsoid HTML, but the problem is that Parsoid doesn't run the ParserAfterTidy hook, so important parts of our code don't run, and there's no obvious equivalent we could use. It was suggested that we should use OutputPageParserOutput instead, but…

We used ParserAfterTidy because we wanted the output of our processing to be cached: it allowed us to "piggy-back" on the parser cache, causing only a slight increase in the disk space used by discussion pages instead of doubling it.

If we use OutputPageParserOutput instead, the processed output is no longer cached, which will slow down page load times. We don't really know the impact; from some quick tests it seems noticeable but not terrible. On this large page I see "DiscussionTools time usage: 0.375 seconds" (it's in the parser limit report). [I remembered it being slower… not sure if my memory is going or if our code magically got faster.]
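To make the tradeoff concrete, here is a minimal sketch of why the parse-time hook is cheaper. All names are hypothetical stand-ins, not the actual MediaWiki or DiscussionTools APIs: the point is only that a parse-time hook stores the processed HTML alongside the parser cache entry, while an output-time hook pays the processing cost on every render.

```python
# Hypothetical stand-ins for the real MediaWiki machinery; names are
# illustrative, not actual DiscussionTools/MediaWiki APIs.
parser_cache = {}

def add_reply_links(html):
    # Stand-in for DiscussionTools' thread processing (~0.375 s on a
    # large page, per the parser limit report quoted above).
    return html.replace("</p>", " [reply]</p>")

def parse_with_parse_time_hook(title, wikitext):
    """ParserAfterTidy-style: process once, cache the processed HTML."""
    if title not in parser_cache:
        html = f"<p>{wikitext}</p>"                  # pretend this is the parse
        parser_cache[title] = add_reply_links(html)  # cached with the entry
    return parser_cache[title]                       # cache hits skip processing

def render_with_output_time_hook(title, wikitext):
    """OutputPageParserOutput-style: only the unprocessed HTML is cached."""
    if title not in parser_cache:
        parser_cache[title] = f"<p>{wikitext}</p>"
    return add_reply_links(parser_cache[title])      # re-run on every view
```

With the parse-time variant, repeated views of the same title hit the cache and never re-run `add_reply_links`; with the output-time variant, the transform runs on every single page view.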

Also, if we use OutputPageParserOutput, the processing might no longer be applied in contexts that don't use OutputPage, such as the API. We will need to test this carefully, since we load the page content from the API after saving. We had several bugs related to this in the past, before we started using ParserAfterTidy.

I think we'd prefer to continue piggy-backing on the parser cache, and if we need to change that, it would take a bit more work than it seems.

[ Adding @daniel, @Ladsgroup, @Jgiannelos for visibility. I see Ed already added Scott. ]

This is basically a version of a bigger problem we have identified, which broadly goes as follows.

Parsoid generates canonical HTML with all information, which is cached (right now in ParserCache, previously in RESTBase). This HTML was originally designed for VisualEditor (and similar clients), which needs all of that information so that the HTML can be round-tripped back to canonical wikitext without dirty diffs.

Now, a number of other targets use a modified form of this canonical HTML (read views strip information, PCS does transforms, DT wants some hooks to run, MFE may do some other mangling, etc.). So, we can think of this as an html2html transformation.
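The html2html framing can be sketched as a set of per-target transforms over one cached canonical document. The function and target names below are invented for illustration; they do not correspond to real PCS/DT/MFE code:

```python
import re

# One cached canonical Parsoid document; every target derives its own
# flavor from it via an html2html transform. Names are illustrative only.
CANONICAL = '<p about="#mwt1" data-mw="...">Hello</p>'

def strip_editing_metadata(html):
    # Read views: drop round-trip annotations such as data-mw/about.
    return re.sub(r'\s(?:data-mw|about)="[^"]*"', "", html)

def add_reply_links(html):
    # DiscussionTools: annotate talk threads (stand-in for the real logic).
    return html.replace("</p>", " [reply]</p>")

TARGET_TRANSFORMS = {
    "read": [strip_editing_metadata],
    "discussiontools": [strip_editing_metadata, add_reply_links],
    "visualeditor": [],  # VE needs the full canonical HTML untouched
}

def html_for_target(target, html=CANONICAL):
    for transform in TARGET_TRANSFORMS[target]:
        html = transform(html)
    return html
```

The open questions in this task are then exactly: how expensive is each transform chain, and which of the derived flavors (if any) deserve their own cache entries.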

So, the question becomes: (a) what are the performance SLOs for these target products? (b) how expensive are these html2html transformations? (c) should the html2html output itself be cached?

Obviously, the two extremes for (c) are: (1) every target variant (or flavor, or pick-your-favorite-noun) gets cached; (2) no target variant gets cached -- they are all generated on demand.

PCS is going through this right now as part of RESTBase deprecation. The goal there is to set SLOs for mobile apps, use those to optimize PCS and see whether the PCS transforms meet the perf targets (and hence don't need any caching beyond the existing edge caching in Varnish), and if not, figure out what caching solution / strategy is needed. There are also conditional caching strategies, e.g. only cache html variants for pages where the generation time exceeds the SLO targets (or some other such policy). If we do cache, do we use ParserCache (with different keys) for all these Parsoid HTML variants? Do we use separate ParserCache instances? We obviously do not have ParserCache capacity for storing all html2html variants right now, but if we need to expand that capacity, that would be a conversation with SRE, informed by data collected here for the different use cases.
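The conditional strategy mentioned above (cache a variant only when generating it is slower than the SLO) could look roughly like this. The threshold, the key shape, and the dict-as-cache are all placeholders, not real ParserCache semantics:

```python
import time

SLO_SECONDS = 0.25   # placeholder perf target, not an agreed-upon SLO
variant_cache = {}   # stand-in for a ParserCache with per-flavor keys

def get_variant(page, flavor, transform):
    """Serve a cached html2html variant; store it only when generating
    it exceeds the SLO, so fast pages never consume cache capacity."""
    key = (page, flavor)          # e.g. the ParserCache key plus the flavor
    if key in variant_cache:
        return variant_cache[key]
    start = time.monotonic()
    html = transform(page)
    elapsed = time.monotonic() - start
    if elapsed > SLO_SECONDS:     # only pay storage for slow generations
        variant_cache[key] = html
    return html
```

A policy like this keeps storage growth proportional to the number of slow pages rather than to the number of pages times the number of flavors, at the cost of re-generating the fast ones on every request.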

What are your thoughts here for DiscussionTools? I think that will help inform what hooks should be used / added and where / when they run. Anything I missed here?

MSantos triaged this task as High priority. Tue, Sep 19, 3:35 PM