Page MenuHomePhabricator

Investigate integration of FlaggedRevs ParserCache into ParserOutputAccess service
Closed, ResolvedPublic

Description

Questions to answer:

  • Does WikiPage->getParserOutput pull from the independent FlaggedRevs parser output cache when retrieving content for the stable revision (if not the same as the latest revision)?
    • No, there is no integration. Browser pageviews correctly pull content from the main ParserCache or the FlaggedRevs ParserCache depending on whether the latest or stable revision is viewed. But currently, the mapdata API causes an additional parse and uses the RevisionOutputCache when fetching mapdata for the non-latest stable revision.
  • If not, how difficult would it be for the FlaggedRevs extension to enhance this service to provide parsed stable revision content when applicable?
    • Proof of concept code is already written which accomplishes this, so I've switched to attempting a full implementation.

Event Timeline

awight claimed this task.
awight moved this task from Sprint Backlog to Doing on the WMDE-TechWish-Sprint-2022-03-16 board.
awight moved this task from Ready for pickup to In sprint on the WMDE-GeoInfo-FocusArea board.
awight added a subscriber: A.Wiki1.

The first part of the investigation confirms that an API implementation cannot easily access the FlaggedRevs ParserCache for stable revisions, instead it would need to parse the page and store output in RevisionOutputCache. In T295363 we learned that the mapdata endpoint gets roughly 3-4M requests per day, so comparing the relative cache hit rates for ParserCache (80%) vs RevisionOutputCache (20%), adding versioned maps the status quo would cause up to 3.5M x 60% = c. 2M additional page parses per day (assuming a worst-case of every page having a different stable vs. latest revision) as well as taking up unnecessary space in the RevisionOutputCache. To do nothing is not an option.

It should be possible for the API to tap into the FlaggedRevs cache, but I don't see any clean and simple approach. Here are some thoughts about alternative ways of doing this:

  • Hardcode FlaggedRevs cache access in mapdata API - a one-off approach, the Kartographer extension can peek into the FR cache to see whether the requested revision is already available. If so, it fetches that ParserOutput. What to do if the revision is not available becomes more complex, we could fall back to RevisionOutputCache or we could duplicate FlaggedRevs logic to parse and populate the stable revision cache. This is probably a safe option, but creates new technical debt.
  • Refactor FlaggedRevs to provide cache access - the cache-peeking and -population above could be done in the FlaggedRevs extension. Although Kartographer would still have a hardcoded integration with FR, the code would be extracted from the FlaggedRevs and FlaggablePageView classes and become moderately reusable. This improves some FR technical debt, but risks destabilizing a core feature of the extension.
  • Refactor ParserOutputAccess to be hookable by FlaggedRevs - a clean solution, but possibly destabilizes both the parser cache and FlaggedRevs. We would make it possible for ParserOutputAccess to query extensions for help in finding a cached ParserOutput, and when finished there would be a switch which can select ParserCache for latest revisions, FR ParserCache for stable revisions, and RevisionOutputCache for others. FR would probably be refactored to go through the same ParserOutputAccess mechanism, and the existing switching logic would be pushed down to that level.

Our next step will be to attempt a proof-of-concept implementation of the "nice" option above, "Refactor ParserOutputAccess to be hookable by FlaggedRevs". I'll continue the work here in this task.

My first impression is that the new hook should go in ParserOutputAccess->getParserOutput here, right before attempting getCachedParserOutput and the call would look something like (pass-by-reference notated for emphasis):

$this->hookRunner->onParserOutputAccessGet( $page, $parserOptions, $revision, &$parserOutput );

The extension hook would be able to return a ParserOutput object from its own cache, but not affect the function in any other way. If the hook succeeds, getParserOutput returns this result and nothing else happens.

I don't like that FlaggedRevs has a separate pool counter and would still do the rendering using its own code path, but integrating at this level avoids deeper changes to ParserOutputAccess and FlaggedRevs. The other drawback is that ParserOutputAccess->getCachedParserOutput which is also a public function would not benefit from the hook—at the moment there is only one caller, and it falls back to getParserOutput so this will be harmless, but in the future other callers might be added.

Change 773361 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/core@master] [WIP] New hooks to divert ParserOutputAccess

https://gerrit.wikimedia.org/r/773361

Change 773446 had a related patch set uploaded (by Awight; author: Awight):

[mediawiki/extensions/FlaggedRevs@master] [WIP] Implement multi-cache hooks

https://gerrit.wikimedia.org/r/773446

awight updated the task description. (Show Details)
awight moved this task from Doing to Done on the WMDE-TechWish-Sprint-2022-03-16 board.