Page MenuHomePhabricator

Define ParserCache strategy for pages with content placeholder slots (as with wikifunctions)
Open, Needs TriagePublic

Description

Fragment generators like wikifunctions might not be able to guarantee that they generate their final results within the constraint of a request timeout. So, in the generic case, Parsoid might insert placeholder content for such fragments OR the fragment generator itself might return a placeholder fragment with a cache TTL value (given its knowledge of how long it expects the computation might take).

For ParserOutput objects that contain such page results, we should determine the caching strategy when the page has a known placeholder slot. There are three possibilities:

  • Don't cache these pages. If we go this route, we need to work through the performance implications on popular pages, especially around breaking event scenarios. We need to ensure PoolCounters can handle these scenarios.
  • Cache these pages with a short generic TTL: We need to determine the TTL value and work through any other followup implications (search results, performance, etc.)
  • Let the fragment generator specify a custom placeholder with a custom TTL value. For example, if it knows some computation might take 10 minutes, it might inject an appropriate user message in the placeholder and let the custom TTL eject the page so that a subsequent request to the same page will generate output without a placeholder for this slot.

Related Objects

StatusSubtypeAssignedTask
OpenJdlrobson
OpenCCiufo-WMF
OpenBUG REPORTcscott
Opencscott
OpenJdforrester-WMF
OpenNone
Opencscott
OpenNone
OpenNone
Opencscott
Opencscott
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
Openssastry
ResolvedBUG REPORTihurbain
Resolvedihurbain
OpenBUG REPORTihurbain
OpenBUG REPORTihurbain
OpenBUG REPORTihurbain
OpenNone
OpenNone

Event Timeline

The MVP default would be "don't cache", which already has support in core (ParserOutput::setCacheTime(-1)).

Any cache timeout specified here implies a "pull" model for fragment generation. Another option is a "push" option where we don't mess with the cache lifetime at all for pages with placeholders, and rely on the fragment generator to push an appropriate invalidation once the content is ready in its fragment cache.

Note that we don't have ::setCacheTime() in ContentMetadataCollector, it is just in ParserOutput, so we might have to add something to CMC here.

Change #1084879 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] WIP: Expose "no cache" as a ParserOutputFlag

https://gerrit.wikimedia.org/r/1084879

I proposed NO_CACHE as a ParserOutputFlag for now in the patch above. If we decide to use a TTL or some other mechanism we can expose that via CMC to Parsoid. Maybe we should just expose CacheTime::updateCacheExpiry($seconds) which would allow any of the other proposed mechanisms above. If we use a "short generic TTL" I'd prefer exposing that via a ParserOutputFlag so that the configuration of the exact amount is kept on the MediaWiki side, not hard-coded into Parsoid.

EDIT: I updated the patch to define an explicit ASYNC_NOT_READY flag, allowing mediawiki-core to set the policy for how that is handled. Right now it is handled by making the page uncacheable.

Discussion with @ssastry revealed that ParserOutput::hasReducedExpiry() needs to return true not just for the case where there is async content present, but when async content *was* present on the first run on RefreshLinksJob.

Short term solution is to add two flags: HAS_ASYNC_CONTENT (which makes hasReducedExpiry() return true for now) and ASYNC_NOT_READY (which reduces the cache TTL).

In the future we'll probably need a new bool in the page table, like page_touched and page_links_updated which is set when RefreshLinksJobs is run on a page with the ASYNC_NOT_READY flag set in the ParserOutput. This then triggers the code in WikiPage::triggerOpportunitisticLinksUpdate() to reschedule a new RefreshLinksUpdate job when the page is viewed later:

if ($parserOutput->hasReducedExpiry() || $this->getHadAsyncContent()) { ... }

(The function name in WikiPage should really be "refreshLinksOnDynamicContentUpdate" not "opportunisticLinksUpdate" since running the refresh links job is critical to ensure that the database reflects the dynamic content.)