
UnexpectedValueException: Parsoid does not support content model proofread-index
Closed, Resolved · Public · PRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   UnexpectedValueException: Parsoid does not support content model proofread-index
exception.trace
from /srv/mediawiki/php-1.40.0-wmf.13/includes/parser/Parsoid/ParsoidOutputAccess.php(196)
#0 /srv/mediawiki/php-1.40.0-wmf.13/includes/Rest/Handler/HtmlOutputRendererHelper.php(535): MediaWiki\Parser\Parsoid\ParsoidOutputAccess->getParserOutput(MediaWiki\Page\PageStoreRecord, ParserOptions, MediaWiki\Revision\RevisionStoreRecord, integer)
#1 /srv/mediawiki/php-1.40.0-wmf.13/includes/Rest/Handler/HtmlOutputRendererHelper.php(635): MediaWiki\Rest\Handler\HtmlOutputRendererHelper->getParserOutput()
#2 /srv/mediawiki/php-1.40.0-wmf.13/includes/Rest/Handler/ParsoidHandler.php(881): MediaWiki\Rest\Handler\HtmlOutputRendererHelper->getPageBundle()
#3 /srv/mediawiki/php-1.40.0-wmf.13/vendor/wikimedia/parsoid/extension/src/Rest/Handler/PageHandler.php(92): MediaWiki\Rest\Handler\ParsoidHandler->wt2html(MediaWiki\Parser\Parsoid\Config\PageConfig, array)
#4 /srv/mediawiki/php-1.40.0-wmf.13/includes/Rest/Router.php(487): MWParsoid\Rest\Handler\PageHandler->execute()
#5 /srv/mediawiki/php-1.40.0-wmf.13/includes/Rest/Router.php(406): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#6 /srv/mediawiki/php-1.40.0-wmf.13/includes/Rest/EntryPoint.php(191): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#7 /srv/mediawiki/php-1.40.0-wmf.13/includes/Rest/EntryPoint.php(131): MediaWiki\Rest\EntryPoint->execute()
#8 /srv/mediawiki/php-1.40.0-wmf.13/rest.php(31): MediaWiki\Rest\EntryPoint::main()
#9 /srv/mediawiki/w/rest.php(3): require(string)
#10 {main}
Impact
Notes
  • > 4000 occurrences in the last hour

Event Timeline

Started about an hour ago, around the time the train rolled.

TheresNoTime triaged this task as Unbreak Now! priority. Dec 7 2022, 9:19 PM

UBN! I think this needs some attention regarding rolling back the train, if that was indeed the cause.

Ugh... this may be my fault... HtmlOutputRendererHelper is now on this code path. It may not know how to handle proofread-index.

So, HtmlOutputRendererHelper throws because ParsoidOutputAccess::supportsContentModel() returns false for "proofread-index". Is this correct? I don't know...

The code is:

	/**
	 * @param string $model
	 *
	 * @return bool
	 */
	public function supportsContentModel( string $model ): bool {
		if ( $model === CONTENT_MODEL_WIKITEXT ) {
			return true;
		}

		return $this->siteConfig->getContentModelHandler( $model ) !== null;
	}

SiteConfig::getContentModelHandler() in turn calls $env->getContentHandler( $contentmodel ), which apparently doesn't know about "proofread-index". I can't find any mention of it in the Parsoid repo. Parsoid has an extension mechanism for registering handlers for different models, but I can't find any mention of that in the ProofreadPage repo either.
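For reference, Parsoid's extension mechanism maps a content model to a handler via the module config (this is how Parsoid's bundled JSON extension registers the "json" model). A hypothetical registration for "proofread-index" would look roughly like this; ProofreadIndexModule and ProofreadIndexHandler are made-up names, and the point is precisely that ProofreadPage ships nothing like this:

	use Wikimedia\Parsoid\Ext\ExtensionModule;

	class ProofreadIndexModule implements ExtensionModule {
		/** @inheritDoc */
		public function getConfig(): array {
			return [
				'name' => 'ProofreadIndex',
				// Maps a content model name to a
				// Wikimedia\Parsoid\Ext\ContentModelHandler subclass.
				'contentModels' => [
					'proofread-index' => ProofreadIndexHandler::class,
				],
			];
		}
	}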

So, how did this use to work? Did it ever work? Did we just pretend it's wikitext?

So one bug here is: asking for an unsupported content model shouldn't cause a 500. It should cause a 400 or 406. I'll get to work fixing that.
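The shape of that fix, sketched (not the literal patch; this just illustrates checking the model up front and mapping it to a client error, rather than letting the UnexpectedValueException escape as a 500 — HttpException is a real MediaWiki\Rest class):

	use MediaWiki\Rest\HttpException;

	// Inside the REST handler, before asking Parsoid for a rendering:
	if ( !$parsoidOutputAccess->supportsContentModel( $model ) ) {
		// 400 Bad Request instead of an uncaught exception (500)
		throw new HttpException(
			"Parsoid does not support content model $model", 400
		);
	}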

I'm pretty sure this was caused by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/864723. I'd much rather fix it than revert, though.

My current thinking is that the Parsoid endpoints should return a 400 for unsupported content models. That would at least remove the log spam, unless RESTBase starts logging the 400 responses.

@daniel I was about to roll back the train but I can wait if you think you can resolve the issue in short time. It's generating a ton of errors.

I have put together a quick and dirty fix that changes this from a 500 to a 400, so we don't spam the log with it.
The fact remains that parsoid can't render pages of this type. And when we ask it to do so, the result is either an error or garbage.
We should find out what asks for the rendering, and why.
It's probably changeprop/restbase blindly asking for a rendering of everything, regardless of content model.

I will have a patch up in a minute

Great! I will leave things in your capable hands.

Change 865784 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Throw a 400 when asking parsoid to render an unknown content model.

https://gerrit.wikimedia.org/r/865784

This is not just proofread-index, but basically all non-wikitext content models: "proofread-index", "flow-board", "javascript", "wikibase-item", "sanitized-css", etc. So, RB (or whichever upstream service it is) shouldn't be forwarding these to Parsoid. But for now, we should just fix ParsoidOutputAccess to pretend these are all wikitext content models.

HTTP 400 is reasonable, but I am not sure what RESTBase will do with all these 400s, hence my conservative caution, since we don't have any RB folks here.

Change 865785 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] Make parsoid accept all content models.

https://gerrit.wikimedia.org/r/865785

This is the important fix. It will restore the previous behavior: let parsoid blindly render anything, even if the result is garbage.

The patch for turning the 500 into a 400 isn't really relevant, just nice to have.
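For clarity, the "accept all content models" fix effectively relaxes the supportsContentModel() check quoted above. A rough sketch of the idea (not the literal diff from change 865785):

	public function supportsContentModel( string $model ): bool {
		// Treat every model as renderable: models Parsoid doesn't know
		// natively fall through to wikitext-style handling, restoring
		// the pre-wmf.13 behavior. The output may be garbage for
		// non-wikitext content, but requests no longer fail with a 500.
		return true;
	}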

Change 865785 merged by jenkins-bot:

[mediawiki/core@master] Make parsoid accept all content models.

https://gerrit.wikimedia.org/r/865785

Change 865749 had a related patch set uploaded (by Samtar; author: Daniel Kinzler):

[mediawiki/core@wmf/1.40.0-wmf.13] Make parsoid accept all content models.

https://gerrit.wikimedia.org/r/865749

Change 865784 merged by jenkins-bot:

[mediawiki/core@master] Throw a 400 when asking parsoid to render an unknown content model.

https://gerrit.wikimedia.org/r/865784

Mentioned in SAL (#wikimedia-operations) [2022-12-07T22:48:31Z] <TheresNoTime> Going to backport [[gerrit:865749]] to wmf/1.40.0-wmf.13 for T324711

Change 865749 merged by jenkins-bot:

[mediawiki/core@wmf/1.40.0-wmf.13] Make parsoid accept all content models.

https://gerrit.wikimedia.org/r/865749

Mentioned in SAL (#wikimedia-operations) [2022-12-07T23:00:48Z] <samtar@deploy1002> Started scap: Backport for [[gerrit:865749|Make parsoid accept all content models. (T324711)]]

Mentioned in SAL (#wikimedia-operations) [2022-12-07T23:02:45Z] <samtar@deploy1002> samtar and samtar: Backport for [[gerrit:865749|Make parsoid accept all content models. (T324711)]] synced to the testservers: mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet

Mentioned in SAL (#wikimedia-operations) [2022-12-07T23:14:46Z] <samtar@deploy1002> Finished scap: Backport for [[gerrit:865749|Make parsoid accept all content models. (T324711)]] (duration: 13m 57s)

TheresNoTime lowered the priority of this task from Unbreak Now! to Needs Triage.

The deployment of this patch has resolved the exceptions.

(Attachment: Screenshot 2022-12-07 at 18.18.58.png, 589 KB)

Noting this comment, this patch should be undone(?) once T311728: Gracefully handle non-wikitext pages when returning HTML from the page endpoints is completed.

Change 866146 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ParsoidOutputAccess: only cache output for wikitext

https://gerrit.wikimedia.org/r/866146

Change 866250 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ParsoidOutputAccess: generate dummy output for unsupported models.

https://gerrit.wikimedia.org/r/866250

daniel triaged this task as High priority. Dec 8 2022, 7:59 AM
daniel added a project: RESTBase Sunsetting.

Change 866146 merged by jenkins-bot:

[mediawiki/core@master] ParsoidOutputAccess: only cache output for wikitext

https://gerrit.wikimedia.org/r/866146

Change 866250 merged by jenkins-bot:

[mediawiki/core@master] ParsoidOutputAccess: generate dummy output for unsupported models.

https://gerrit.wikimedia.org/r/866250

daniel moved this task from Unsorted to PCS Service Pile on the RESTBase Sunsetting board.

@cscott @ssastry Are you really sure this is fixed? Assuming this has been deployed, this is still failing: https://id.wikipedia.org/api/rest_v1/page/html/Story%3AWisata_Wajib_di_Toraja. Actually, this bug seems to be the reason why we have not been able to generate a ZIM offline snapshot of id.wikipedia.org for two quarters. More details about the bug at the MWoffliner level at https://github.com/openzim/mwoffliner/issues/1853.

@Kelson Yeah, the resolution from T246403 was that Parsoid

should support all models that serialize to wikitext

https://github.com/wikimedia/mediawiki/commit/e1c3af91774818f512cfa3acf594242624c2f778

You're seeing,

Parsoid does not support content model story

That page has "contentmodel": "story",
https://id.wikipedia.org/w/api.php?action=query&titles=Story:Wisata_Wajib_di_Toraja&prop=info

In the Wikistories extension, story is an alias for the json content model, which is not something that needs wikitext parsing.

$wgContentHandlers[ 'story' ] = JsonContentHandler::class;

https://github.com/wikimedia/mediawiki-extensions-Wikistories/blob/master/README#L14
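The "serialize to wikitext" criterion from T246403 could plausibly be checked along these lines. This is a hedged sketch, not the actual implementation: isWikitextModel() is a hypothetical helper, though ContentHandlerFactory, getSupportedFormats(), and CONTENT_FORMAT_WIKITEXT are real MediaWiki APIs. Under this test, 'story' (whose handler is a JsonContentHandler subclass) would be rejected, matching the error Kelson is seeing:

	use MediaWiki\MediaWikiServices;

	/**
	 * Hypothetical helper: does this content model serialize to wikitext?
	 */
	function isWikitextModel( string $model ): bool {
		$handler = MediaWikiServices::getInstance()
			->getContentHandlerFactory()
			->getContentHandler( $model );

		return in_array(
			CONTENT_FORMAT_WIKITEXT,
			$handler->getSupportedFormats(),
			true
		);
	}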

@Arlolra Thank you for your technical explanation, but I have a hard time using it to answer my questions, which are:

  • Is the "Story" namespace really only JSON, so that the JsonContentHandler is appropriate?

Not quite - StoryContentHandler is a subclass of JsonContentHandler. So it "is a" JsonContentHandler, but more than "just" a JsonContentHandler.

Sure, why not? Wikidata for example has no wikitext in the main namespace at all. "Content" doesn't have to mean "text".

It's a missing feature, not a bug as such. It's being worked on, see T330772 and T317018. It will be fixed in a couple of months I hope.

The tickets I mentioned above are probably more informative.

@daniel Thank you! Things are clearer for me now. At the core of my problem is the fact that we have always used $wgContentNamespaces to determine which namespaces should be scraped... and that no longer works so well here.