
page/<title>/html and with_html endpoints fails with InvalidArgumentException for overridden interface messages with JSON content
Open, High, Public, BUG REPORT

Description

Visit this endpoint for a MediaWiki: interface message that's overridden on the local wiki, e.g. https://fr.wikipedia.org/w/rest.php/v1/page/MediaWiki:Editcheck-config.json/html

Observe the following error: {"message":"Error: exception of type InvalidArgumentException","httpCode":500,"httpReason":"Internal Server Error"}

Note that the top-level page API for the message shows its source correctly (https://fr.wikipedia.org/w/rest.php/v1/page/MediaWiki:Editcheck-config.json), unlike in T349677 where that endpoint fails for non-overridden messages.

A wiki with the message not overridden (https://nah.wikipedia.org/w/rest.php/v1/page/Huiquimedia%3AEditcheck-config.json/with_html) does show the with_html endpoint correctly, though not the base one.

Note that this works fine for messages that have wikitext content, e.g. https://fr.wikipedia.org/w/rest.php/v1/page/MediaWiki:Abusefilter/html
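To compare the endpoint variants listed above side by side, a small helper along these lines may be handy. This is only a sketch: `rest_page_url` and `fetch_status` are hypothetical names, the User-Agent string is a placeholder, and `fetch_status` makes a real network request.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def rest_page_url(host: str, title: str, variant: str = "") -> str:
    """Build a core REST API page URL; variant is "", "html" or "with_html"."""
    # Keep ":" unescaped so the namespace prefix stays readable, as in the report.
    url = f"https://{host}/w/rest.php/v1/page/{quote(title, safe=':')}"
    return f"{url}/{variant}" if variant else url


def fetch_status(url: str) -> int:
    """Return the HTTP status code for url (performs a network request)."""
    # Placeholder User-Agent; an assumption, not a value from this task.
    request = urllib.request.Request(url, headers={"User-Agent": "rest-repro-sketch/0.1"})
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions; report their code instead.
        return err.code
```

Calling `fetch_status(rest_page_url("fr.wikipedia.org", "MediaWiki:Editcheck-config.json", "html"))` would be expected to return 500 while the bug is present.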


Exception message (original report): "ParserOutput does not have a render ID"

Exception message (current, matching the stack trace below): "Failed to find content language in page bundle data"

Stack trace:

from /srv/mediawiki/php-1.42.0-wmf.20/includes/Rest/Handler/Helper/HtmlOutputRendererHelper.php(614)
#0 /srv/mediawiki/php-1.42.0-wmf.20/includes/Rest/Handler/Helper/HtmlOutputRendererHelper.php(629): MediaWiki\Rest\Handler\Helper\HtmlOutputRendererHelper->getHtmlOutputContentLanguage()
#1 /srv/mediawiki/php-1.42.0-wmf.20/includes/Rest/Handler/PageHTMLHandler.php(137): MediaWiki\Rest\Handler\Helper\HtmlOutputRendererHelper->putHeaders(MediaWiki\Rest\Response, boolean)
#2 /srv/mediawiki/php-1.42.0-wmf.20/includes/Rest/SimpleHandler.php(40): MediaWiki\Rest\Handler\PageHTMLHandler->run(string)

Event Timeline

This should probably be made to work -- I think we restricted Parsoid to just the main namespace at one point, but there's no reason we need to keep that. The content model for the MediaWiki namespace is wikitext, isn't it?

See also T349677 ("MediaWiki:Editcheck-config.json: Invariant failed: Page should be known") for the case where the message *isn't* overridden.

daniel renamed this task from page/<title>/html and with_html endpoints fails with InvalidArgumentException for overridden interface messages to page/<title>/html and with_html endpoints fails with InvalidArgumentException for overridden interface messages with JSON content. Oct 30 2023, 6:34 PM
daniel updated the task description.

This seems to only fail for messages that have non-wikitext content. This works fine, even though it's overridden: https://fr.wikipedia.org/w/rest.php/v1/page/MediaWiki:Abusefilter/html

I have edited the task description accordingly.

Analysis:

The immediate cause of the error is this:

  1. HtmlOutputRendererHelper::getETag() calls getParserOutput(), and then calls ParsoidOutputAccess::getParsoidRenderID() on the ParserOutput object returned from getParserOutput().
  2. ParsoidOutputAccess::getParsoidRenderID() calls getParsoidRenderId() on that ParserOutput, but gets null and throws an InvalidArgumentException.
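The two steps above can be modelled in a few lines. This is a toy Python model, not MediaWiki code: the class and function names only mirror the PHP methods, `ValueError` stands in for InvalidArgumentException, and the quoted ETag format is an assumption made for illustration.

```python
from typing import Optional


class ParserOutput:
    """Toy stand-in for MediaWiki's ParserOutput; only models the render ID."""

    def __init__(self, render_id: Optional[str] = None) -> None:
        self._render_id = render_id

    def get_parsoid_render_id(self) -> Optional[str]:
        return self._render_id


def get_parsoid_render_id(output: ParserOutput) -> str:
    """Models step 2: a missing render ID is treated as a caller error."""
    render_id = output.get_parsoid_render_id()
    if render_id is None:
        # The PHP code throws InvalidArgumentException at this point.
        raise ValueError("ParserOutput does not have a render ID")
    return render_id


def get_etag(output: ParserOutput) -> str:
    """Models step 1: the ETag is derived from the render ID (format assumed)."""
    return '"' + get_parsoid_render_id(output) + '"'
```

Any ParserOutput that never had a render ID set therefore turns an ordinary GET into a 500, regardless of whether the HTML itself rendered fine.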

Going back to the call to HtmlOutputRendererHelper::getParserOutput() to see where the ParserOutput comes from, and why it doesn't have a render ID:

  1. HtmlOutputRendererHelper::getParserOutput() calls getParserOutputInternal(), which in turn calls ParsoidOutputAccess::getParserOutput()
  2. ParsoidOutputAccess::getParserOutput() calls handleUnsupportedContentModel() to see whether we support the input's model, which is JSON. This returns null, indicating that we do support JSON input.
  3. This is because handleUnsupportedContentModel() calls supportsContentModel(), which in turn ends up calling SiteConfig::getContentModelHandler(), which returns a handler for JSON.
  4. Next, ParsoidOutputAccess::getParserOutput() calls ParserOutputAccess::getParserOutput(), which relies on the ContentHandler stack to generate ParserOutput for the JsonContent.
  5. But JsonContentHandler does not use ParsoidParser, so it doesn't call setParsoidRenderId() on the ParserOutput.
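A rough model of the chain above shows how the "supported" check and the actual renderer disagree. Everything here is hypothetical Python: the two registries stand in for SiteConfig::getContentModelHandler() and the ContentHandler stack, and the render ID value is made up.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ParserOutput:
    html: str
    render_id: Optional[str] = None


def render_wikitext_with_parsoid(text: str) -> ParserOutput:
    # Stands in for ParsoidParser, which sets a render ID on its output.
    return ParserOutput(html=f"<p>{text}</p>", render_id="rev1/render1")


def render_json(text: str) -> ParserOutput:
    # Stands in for JsonContentHandler: no Parsoid involved, so no render ID.
    return ParserOutput(html=f"<pre>{text}</pre>")


# Models known to Parsoid's SiteConfig (hypothetical stand-in for step 3).
PARSOID_KNOWN_MODELS = {"wikitext", "json"}

# The handlers that actually produce the output (stand-in for step 4).
CONTENT_HANDLERS: Dict[str, Callable[[str], ParserOutput]] = {
    "wikitext": render_wikitext_with_parsoid,
    "json": render_json,
}


def supports_content_model(model: str) -> bool:
    return model in PARSOID_KNOWN_MODELS


def get_parser_output(model: str, text: str) -> ParserOutput:
    if not supports_content_model(model):
        raise LookupError(f"unsupported content model: {model}")
    # The check passed, but the handler chosen here may not be Parsoid at all.
    return CONTENT_HANDLERS[model](text)
```

In this model JSON passes the support check, yet its output comes back with `render_id` still `None`, which is the state that later trips the ETag code.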

Possible solutions:

  1. Have supportsContentModel() return false for JSON. This would cause the request to fail with a 4xx response instead of a 500 response.
  2. Have getETag() be lenient and return nothing if there is no render ID.
  3. Set a render ID on all output (see I72c5e6f86b and T350538)
  4. Make all ContentHandlers for models that are supported by Parsoid use ParsoidParser if useParsoid is set in ParserOptions.
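To make the difference between options 2 and 3 concrete, here is a hedged sketch in the same toy Python model. The names `lenient_etag` and `render_with_id` are hypothetical, and using a random UUID as the render ID is an assumption for illustration, not what the linked changes do.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ParserOutput:
    html: str
    render_id: Optional[str] = None


def lenient_etag(output: ParserOutput) -> Optional[str]:
    """Option 2: return no ETag instead of failing when the render ID is missing."""
    if output.render_id is None:
        return None
    return f'"{output.render_id}"'


def render_with_id(render: Callable[[str], ParserOutput], text: str) -> ParserOutput:
    """Option 3: a central rendering layer assigns an ID if the handler set none."""
    output = render(text)
    if output.render_id is None:
        # Assumed ID scheme; any unique value would serve the illustration.
        output.render_id = uuid.uuid4().hex
    return output
```

Option 2 keeps the response working but loses caching validators for such pages, while option 3 gives every output an ID at the cost of that ID no longer implying "rendered by Parsoid".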

Note that the page/{title}/html endpoint currently only supports content that is handled by Parsoid, but for non-wikitext models it will return output not rendered by Parsoid (or would, if it weren't for the InvalidArgumentException). This seems odd; we should just support all content models (perhaps we should re-open T311728). To achieve this, we should probably do at least (3). But then ParsoidOutputAccess will return output that wasn't generated by Parsoid, and getParsoidRenderId will return an ID that doesn't come from Parsoid... We need to come up with a proper solution for the handleUnsupportedContentModel logic.

daniel triaged this task as High priority.

This should be fixed by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/957773 which moves setting the render ID into the ContentHandler stack (solution #3 of @daniel's message above).

Change 957773 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/core@master] Add ParserOutput::{get,set}RenderId() and set render id in ContentRenderer

https://gerrit.wikimedia.org/r/957773

Change 957773 merged by jenkins-bot:

[mediawiki/core@master] Add ParserOutput::{get,set}RenderId() and set render id in ContentRenderer

https://gerrit.wikimedia.org/r/957773

I can't reproduce this locally anymore, but it still happens in production.
The exception is now thrown in a different place; I will update the task description.

Change 1008538 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] HtmlOutputRendererHelper: fall back to page language

https://gerrit.wikimedia.org/r/1008538

Change 1008538 merged by jenkins-bot:

[mediawiki/core@master] HtmlOutputRendererHelper: fall back to page language

https://gerrit.wikimedia.org/r/1008538
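Going only by the change title and the "Failed to find content language in page bundle data" message, the fallback presumably looks something like the following. This is a guess at the shape of the logic in Python, not the merged patch: the function name and the `content-language` key in the bundle headers are assumptions.

```python
from typing import Mapping


def html_content_language(bundle_headers: Mapping[str, str], page_language: str) -> str:
    """Prefer the language recorded with the rendered output, else the page language."""
    language = bundle_headers.get("content-language")
    # Output not produced by Parsoid may carry no language; use the page's instead.
    return language if language else page_language
```

With a fallback like this, output from a non-Parsoid handler (such as JSON) no longer needs language data in the page bundle for the headers to be emitted.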