Page MenuHomePhabricator

Accept headers aren't respected when returning from parsercache after bumping parsoid output version
Closed, ResolvedPublic

Description

When parsoid bumps its output version, say from 2.7.0 to 2.8.0, the return value of Parsoid::defaultHTMLVersion() will have changed from when previous entries were entered into the parsercache.

A request for -H "Accept: text/html; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/HTML/2.8.0\"" should force the return of 2.8.0 content, however, because 2.8.0 is the default version now, it's deemed acceptable to return content from the parsercache and, if found, 2.7.0 content will be returned, which isn't semantically correct.

In other words, HtmlOutputRendererHelper->isCacheable is sufficient for telling when we want to enter something into the cache, as is, but some other signal is needed to evict stale entries when the default changes.

Note that this is only necessary when the new default version is explicitly requested. Parsoid::resolveContentVersion might return the new default version but returning the cached content was still fine. For example, 2.6.0 resolved to 2.8.0 but 2.7.0 is still fine to serve. The point of this note is to say that, for the most part, after a version bump, requests are likely to either not have an explicit accept header or still be requesting an older version that's semantically equivalent to what's cached, and we want to continue to serve that, despite Parsoid::resolveContentVersion resolve to the new default.

Event Timeline

ssastry triaged this task as Medium priority.Apr 6 2023, 2:26 PM

Idea for a fix:

  1. HtmlOutputRendererHelper::setOutputProfileVersion needs to always remember which profile version was requested, even if the requested version was the default.
  2. HtmlOutputRendererHelper::getParserOutputInternal needs to check if the ParserOutput returned by ParsoidOutputAccess::getParserOutput() matches the expected profile version. If not, it needs to call getParserOutput() again, with the OPT_FORCE_PARSE flag set.

In https://gerrit.wikimedia.org/r/c/mediawiki/extensions/VisualEditor/+/923691/1#message-fe59d1027c910eadd569f8a42cb5fbff2016449c I wrote,

  1. parsercache doesn't respect these version numbers (outlined in https://phabricator.wikimedia.org/T333606). I'm a little worried that there's going to be consequences to that. Like, VE asks restbase for 2.8.0, parsoid returns 2.7.1 because that's what's in the parsercache, restbase looks at the response and says, no, that's not semantically correct and returns an error. I need to read the restbase code

It would be helpful if we got this fixed so it doesn't have to be reasoned about

daniel raised the priority of this task from Medium to High.May 26 2023, 7:57 PM
daniel added a project: RESTBase Sunsetting.

Change 924119 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Ensure the version returned from parserCache respects accept header

https://gerrit.wikimedia.org/r/924119

PS3 of the above patch fixes things when an explicit version is requested of Parsoid. However, when trying to confirm that this would be an issue for RESTBase in T337596#8887234, I discovered the situation is slightly different.

The flow described in this bug is: Parsoid receives a request for an explicit 2.8.0, it has 2.7.0 in cache and returns that erroneously.

However, in practice, VE would request 2.8.0 of RESTBase. RESTBase checks its storage and sees it only has 2.7.0, which doesn't satisfy the request. It then makes a query to Parsoid for the default version (note that it doesn't set an accept header) . If the default satisfies the request, RESTBase returns that. Alas, if the cache returns 2.7.0 as described in this bug, we don't have an explicit version to nullify that in Parsoid.

So, I think the simplest thing might be to always check the version coming from cache and confirm that it's still the default.

Change 925916 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] [WIP] Move parsoid version cache invalidation to lower level

https://gerrit.wikimedia.org/r/925916

Change 924119 merged by jenkins-bot:

[mediawiki/core@master] Force a reparse if output from cache is not Parsoid's default

https://gerrit.wikimedia.org/r/924119

Change 927307 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/core@master] Set a render reason when forcing a reparse

https://gerrit.wikimedia.org/r/927307

Change 927307 merged by jenkins-bot:

[mediawiki/core@master] Set a render reason when forcing a reparse

https://gerrit.wikimedia.org/r/927307

The following graphs show rerenders due to 'not-parsoid-default',

I see no data for 'not-parsoid-default', even after switching from "parsoid" to "parsoid_pcache" and looking at the last 6 months. So I guess this just doesn't happen?

There hasn't been a spec bump in Parsoid since around June 2023 when that was posted. This would only happen when the cache contained the previous version after a spec bump. The parser cache expires in 15 days? anyways.

Screenshot 2024-04-26 at 11.18.09 AM.png (2×2 px, 1 MB)

There hasn't been a spec bump in Parsoid since around June 2023 when that was posted. This would only happen when the cache contained the previous version after a spec bump. The parser cache expires in 15 days? anyways.

Ah, right, that makes sense. Thanks!

Change #925916 merged by jenkins-bot:

[mediawiki/core@master] Move parsoid version cache invalidation to lower level

https://gerrit.wikimedia.org/r/925916