Page MenuHomePhabricator

Error: Couldn't fetch Wikimedia\Parsoid\DOM\Element
Closed, ResolvedPublicPRODUCTION ERROR

Description

Error
normalized_message
[{reqId}] {exception_url}   Error: Couldn't fetch Wikimedia\Parsoid\DOM\Element
FrameLocationCall
from/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Utils/DOMCompat.php(479)
#0/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Utils/DOMCompat.php(479)DOMElement->hasAttribute(string)
#1/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Utils/DOMDataUtils.php(167)Wikimedia\Parsoid\Utils\DOMCompat::getAttribute(Wikimedia\Parsoid\DOM\Element, string)
#2/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Utils/DOMDataUtils.php(375)Wikimedia\Parsoid\Utils\DOMDataUtils::getNodeData(Wikimedia\Parsoid\DOM\Element)
#3/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Ext/DOMDataUtils.php(41)Wikimedia\Parsoid\Utils\DOMDataUtils::getDataMw(Wikimedia\Parsoid\DOM\Element)
#4/srv/mediawiki/php-1.44.0-wmf.28/extensions/Cite/src/Parsoid/RefGroup.php(105)Wikimedia\Parsoid\Ext\DOMDataUtils::getDataMw(Wikimedia\Parsoid\DOM\Element)
#5/srv/mediawiki/php-1.44.0-wmf.28/extensions/Cite/src/Parsoid/References.php(701)Cite\Parsoid\RefGroup->renderLine(Wikimedia\Parsoid\Ext\ParsoidExtensionAPI, Wikimedia\Parsoid\DOM\Element, Cite\Parsoid\RefGroupItem)
#6/srv/mediawiki/php-1.44.0-wmf.28/extensions/Cite/src/Parsoid/CiteDocumentPostProcessor.php(76)Cite\Parsoid\References->insertReferencesIntoDOM(Wikimedia\Parsoid\Ext\ParsoidExtensionAPI, Wikimedia\Parsoid\DOM\Element, Cite\Parsoid\ReferencesData, bool)
#7/srv/mediawiki/php-1.44.0-wmf.28/extensions/Cite/src/Parsoid/CiteDocumentPostProcessor.php(33)Cite\Parsoid\CiteDocumentPostProcessor->insertMissingReferencesIntoDOM(Wikimedia\Parsoid\Ext\ParsoidExtensionAPI, Cite\Parsoid\ReferencesData, Wikimedia\Parsoid\DOM\Element)
#8/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Wt2Html/DOM/Processors/RunExtensionProcessors.php(104)Cite\Parsoid\CiteDocumentPostProcessor->wtPostprocess(Wikimedia\Parsoid\Ext\ParsoidExtensionAPI, Wikimedia\Parsoid\DOM\Element, array)
#9/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Wt2Html/DOMProcessorPipeline.php(140)Wikimedia\Parsoid\Wt2Html\DOM\Processors\RunExtensionProcessors->run(Wikimedia\Parsoid\Config\Env, Wikimedia\Parsoid\DOM\Element, array, bool)
#10/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Wt2Html/DOMProcessorPipeline.php(171)Wikimedia\Parsoid\Wt2Html\DOMProcessorPipeline->doPostProcess(Wikimedia\Parsoid\DOM\Element)
#11/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Wt2Html/DOMProcessorPipeline.php(189)Wikimedia\Parsoid\Wt2Html\DOMProcessorPipeline->process(Wikimedia\Parsoid\DOM\Element, array)
#12/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipeline.php(164)Wikimedia\Parsoid\Wt2Html\DOMProcessorPipeline->processChunkily(string, array)
#13/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Wt2Html/ParserPipelineFactory.php(604)Wikimedia\Parsoid\Wt2Html\ParserPipeline->parseChunkily(string, array)
#14/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Wikitext/ContentModelHandler.php(185)Wikimedia\Parsoid\Wt2Html\ParserPipelineFactory->parse(string)
#15/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Parsoid.php(198)Wikimedia\Parsoid\Wikitext\ContentModelHandler->toDOM(Wikimedia\Parsoid\Ext\ParsoidExtensionAPI, null)
#16/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/src/Parsoid.php(264)Wikimedia\Parsoid\Parsoid->parseWikitext(MediaWiki\Parser\Parsoid\Config\PageConfig, MediaWiki\Parser\ParserOutput, array, null)
#17/srv/mediawiki/php-1.44.0-wmf.28/includes/parser/Parsoid/ParsoidParser.php(166)Wikimedia\Parsoid\Parsoid->wikitext2html(MediaWiki\Parser\Parsoid\Config\PageConfig, array, null, MediaWiki\Parser\ParserOutput)
#18/srv/mediawiki/php-1.44.0-wmf.28/includes/parser/Parsoid/ParsoidParser.php(301)MediaWiki\Parser\Parsoid\ParsoidParser->genParserOutput(MediaWiki\Parser\Parsoid\Config\PageConfig, MediaWiki\Parser\ParserOptions, MediaWiki\Parser\ParserOutput)
#19/srv/mediawiki/php-1.44.0-wmf.28/includes/content/WikitextContentHandler.php(380)MediaWiki\Parser\Parsoid\ParsoidParser->parse(string, MediaWiki\Title\Title, MediaWiki\Parser\ParserOptions, bool, bool, int, MediaWiki\Parser\ParserOutput)
#20/srv/mediawiki/php-1.44.0-wmf.28/includes/content/ContentHandler.php(1693)MediaWiki\Content\WikitextContentHandler->fillParserOutput(MediaWiki\Content\WikitextContent, MediaWiki\Content\Renderer\ContentParseParams, MediaWiki\Parser\ParserOutput)
#21/srv/mediawiki/php-1.44.0-wmf.28/includes/content/Renderer/ContentRenderer.php(75)MediaWiki\Content\ContentHandler->getParserOutput(MediaWiki\Content\WikitextContent, MediaWiki\Content\Renderer\ContentParseParams)
#22/srv/mediawiki/php-1.44.0-wmf.28/includes/Revision/RenderedRevision.php(261)MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(MediaWiki\Content\WikitextContent, MediaWiki\Page\PageIdentityValue, MediaWiki\Revision\RevisionStoreRecord, MediaWiki\Parser\ParserOptions, array)
#23/srv/mediawiki/php-1.44.0-wmf.28/includes/Revision/RenderedRevision.php(233)MediaWiki\Revision\RenderedRevision->getSlotParserOutputUncached(MediaWiki\Content\WikitextContent, array)
#24/srv/mediawiki/php-1.44.0-wmf.28/includes/Revision/RevisionRenderer.php(236)MediaWiki\Revision\RenderedRevision->getSlotParserOutput(string, array)
#25/srv/mediawiki/php-1.44.0-wmf.28/includes/Revision/RevisionRenderer.php(169)MediaWiki\Revision\RevisionRenderer->combineSlotOutput(MediaWiki\Revision\RenderedRevision, MediaWiki\Parser\ParserOptions, array)
#26/srv/mediawiki/php-1.44.0-wmf.28/includes/Revision/RenderedRevision.php(196)MediaWiki\Revision\RevisionRenderer->MediaWiki\Revision\{closure}(MediaWiki\Revision\RenderedRevision, array)
#27/srv/mediawiki/php-1.44.0-wmf.28/includes/page/ParserOutputAccess.php(458)MediaWiki\Revision\RenderedRevision->getRevisionParserOutput()
#28/srv/mediawiki/php-1.44.0-wmf.28/includes/page/ParserOutputAccess.php(368)MediaWiki\Page\ParserOutputAccess->renderRevision(MediaWiki\Page\PageStoreRecord, MediaWiki\Parser\ParserOptions, MediaWiki\Revision\RevisionStoreRecord, int, MediaWiki\Parser\ParserOutput)
#29/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/Handler/Helper/HtmlOutputRendererHelper.php(937)MediaWiki\Page\ParserOutputAccess->getParserOutput(MediaWiki\Page\PageStoreRecord, MediaWiki\Parser\ParserOptions, MediaWiki\Revision\RevisionStoreRecord, int)
#30/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/Handler/Helper/HtmlOutputRendererHelper.php(669)MediaWiki\Rest\Handler\Helper\HtmlOutputRendererHelper->getParserOutputInternal()
#31/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/Handler/Helper/HtmlOutputRendererHelper.php(763)MediaWiki\Rest\Handler\Helper\HtmlOutputRendererHelper->getParserOutput()
#32/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/Handler/ParsoidHandler.php(739)MediaWiki\Rest\Handler\Helper\HtmlOutputRendererHelper->getPageBundle()
#33/srv/mediawiki/php-1.44.0-wmf.28/vendor/wikimedia/parsoid/extension/src/Rest/Handler/PageHandler.php(90)MediaWiki\Rest\Handler\ParsoidHandler->wt2html(MediaWiki\Parser\Parsoid\Config\PageConfig, array)
#34/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/Module/Module.php(416)MWParsoid\Rest\Handler\PageHandler->execute()
#35/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/Module/Module.php(299)MediaWiki\Rest\Module\Module->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#36/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/Router.php(485)MediaWiki\Rest\Module\Module->execute(string, MediaWiki\Rest\RequestFromGlobals)
#37/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/Router.php(444)MediaWiki\Rest\Router->doExecute(string, MediaWiki\Rest\RequestFromGlobals)
#38/srv/mediawiki/php-1.44.0-wmf.28/includes/Rest/EntryPoint.php(209)MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#39/srv/mediawiki/php-1.44.0-wmf.28/includes/MediaWikiEntryPoint.php(202)MediaWiki\Rest\EntryPoint->execute()
#40/srv/mediawiki/php-1.44.0-wmf.28/rest.php(39)MediaWiki\MediaWikiEntryPoint->run()
#41/srv/mediawiki/w/rest.php(3)require(string)
#42{main}
Impact
Notes

Seems new in T393416

Event Timeline

This feels like a transient error to me, was this during a deploy? Higher up in the stack trace we'd already loaded Wikimedia\Parsoid\DOM\Element to check signatures (ie stack trace entry #10) so it seems unusual that it would fail further down the call stack.

ABreault-WMF added a subscriber: thiemowmde.

Looking at the logs in the past month
https://logstash.wikimedia.org/goto/e30c32396285439876df762513d74f49

It started with the deploys that went out on May 8th

Parsoid had a fairly large deployment
https://www.mediawiki.org/wiki/Parsoid/Deployments#May_6_-_May_8_V0.21.0-a28_as_part_of_1.44.0-wmf.28
but nothing really stands out there

However, the error originates in Cite and most likely from these commits
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Cite/+/1135965
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Cite/+/1135970

Those commits are refactoring the flags that indicate whether content ends up in the dom or stashed in data-mw

Presumably the code here is trying to access an element that was part of html that got stashed and left a dangling pointer to a node

However, the error originates in Cite and most likely from these commits
...
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Cite/+/1135970

Picking a crasher /w/rest.php/v1/page/Talk%3AFour_Noble_Truths%2FArchive_3/with_html
https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-deploy-1-7.0.0-1-2025.05.19?id=fgf455YBfOjk-Vo1x4xr

I isolated a snippet

<ref name="thefourtruths" group="lower-alpha">asfd</ref>

{{#tag:ref|
* Bhikkhu Bodhi <ref>123</ref>
|name="thefourtruths"|group="lower-alpha"}}

and git bisect points to the culprit c7588786303f602da150712a2621362263497bf8

Change #1151063 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Revert (possibly) wrong "in indicator context" logic

https://gerrit.wikimedia.org/r/1151063

Change #1151063 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/Cite@master] Revert (possibly) wrong "in indicator context" logic

Reason:

No, that's not the problem.

https://gerrit.wikimedia.org/r/1151063

Change #1151065 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Partly revert and fix untested code path for nested <ref>

https://gerrit.wikimedia.org/r/1151065

Change #1151063 restored by Thiemo Kreuz (WMDE):

[mediawiki/extensions/Cite@master] Revert (possibly) wrong "in indicator context" logic

https://gerrit.wikimedia.org/r/1151063

Change #1150732 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Arlolra):

[mediawiki/extensions/Cite@master] Revert "Remove more "embedded content" logic that's not needed"

https://gerrit.wikimedia.org/r/1150732

thiemowmde moved this task from Unsorted backlog to Doing on the Cite board.

Sorry for the mess 🙈️ and thanks a lot for digging into the problem. Personally, I believe the only sustainable option (besides talking to each other 😅️) is to rely on test cases when we work on this code together, across teams and continents. Unfortunately it looks like neither of the two edge-cases you discovered was covered.

The revert https://gerrit.wikimedia.org/r/1150732 is sufficient and a very valid way forward, as far as I can see. Thanks a lot for constructing this patch.

However, that doesn't solve the original problem that we apparently have critical, untested code paths. I uploaded two alternative patches, https://gerrit.wikimedia.org/r/1151063 and https://gerrit.wikimedia.org/r/1151065. Both patches add new test cases that demonstrate the different behavior. I'm relatively sure these two patches cover everything and the general revert is not needed any more.

Change #1150732 abandoned by Arlolra:

[mediawiki/extensions/Cite@master] Revert "Remove more "embedded content" logic that's not needed"

Reason:

In favour of Idd3f394fef644c47575ccee449fe3d6c225ae660 and Ia465126711adfb9f798c3f1a0e4b890b597bfc3f

https://gerrit.wikimedia.org/r/1150732

Sorry for the mess 🙈️ and thanks a lot for digging into the problem.

Don't sweat it, it happens

Personally, I believe the only sustainable option (besides talking to each other 😅️) is to rely on test cases when we work on this code together, across teams and continents. Unfortunately it looks like neither of the two edge-cases you discovered was covered.

I believe that's the Beyonce rule, and while I agree, there's also Chesterton's fence to consider. Clearly Parsoid's Cite implementation doesn't have thorough coverage so it might be helpful to solicit review from the CTT when refactoring. Tread lightly.

The revert https://gerrit.wikimedia.org/r/1150732 is sufficient and a very valid way forward, as far as I can see. Thanks a lot for constructing this patch.

However, that doesn't solve the original problem that we apparently have critical, untested code paths. I uploaded two alternative patches, https://gerrit.wikimedia.org/r/1151063 and https://gerrit.wikimedia.org/r/1151065. Both patches add new test cases that demonstrate the different behavior. I'm relatively sure these two patches cover everything and the general revert is not needed any more.

I think you should also revert adding $this->inReferenceList() to

public function inEmbeddedContent(): bool {
		return $this->inReferenceList() || $this->embeddedContentStack;
	}

While it's true, it's conflating the two different bookkeeping systems and is unnecessary.

I've noted that in Id64424c446b50e92c28cd0f9e3cdb2de350bbd90, you've removed a pretty import return statement.

That effectively reintroduced the crasher from Icedd6fdf718183ad5bb873a9f92421f61aa49217 / T382546.

The test case from the commit message reproduces it,

{{1x|
<ref name="ref1"/>
<ref name="ref1">some content
* <ref name="ref1">ref1</ref>
</ref>
}}

Change #1151065 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Partly revert and fix untested code path for nested <ref>

https://gerrit.wikimedia.org/r/1151065

Change #1151063 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Revert and add test for "in indicator context" logic

https://gerrit.wikimedia.org/r/1151063

Change #1153166 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Add missing test case for crash issue in T382546

https://gerrit.wikimedia.org/r/1153166

@ABreault-WMF

https://gerrit.wikimedia.org/r/1131771 […] removed a […] return statement. That effectively reintroduced the crasher from […] T382546.

I remember being surprised by a return in middle of a 370+ lines long function. A surprisingly large chunk of unrelated pieces of logic gets skipped, e.g. code that processes dir and follow attributes as well as conflicting and missing content. Since removing the return didn't do anything (no test failure, no test case we could think of doing anything different) we figured it must be dead code.

I tried to understand better what happened in T382546. I think the return was not the best solution but unfortunately couldn't come up with a better approach. For now I suggest to just add the return back, see https://gerrit.wikimedia.org/r/1153166.

Change #1163911 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/Cite@master] [WIP] Self-referential ref

https://gerrit.wikimedia.org/r/1163911

@ABreault-WMF

https://gerrit.wikimedia.org/r/1131771 […] removed a […] return statement. That effectively reintroduced the crasher from […] T382546.

I remember being surprised by a return in middle of a 370+ lines long function. A surprisingly large chunk of unrelated pieces of logic gets skipped, e.g. code that processes dir and follow attributes as well as conflicting and missing content. Since removing the return didn't do anything (no test failure, no test case we could think of doing anything different) we figured it must be dead code.

In a situation like that though, it might have been helpful to use git blame to identify how the return originated. The commit message had a useful test case.

I tried to understand better what happened in T382546. I think the return was not the best solution but unfortunately couldn't come up with a better approach. For now I suggest to just add the return back, see https://gerrit.wikimedia.org/r/1153166.

I put up a patch in T393913#10949148 which will hopefully take care of it

Change #1166951 had a related patch set uploaded (by Arlolra; author: Arlolra):

[mediawiki/extensions/Cite@master] Self-referential ref

https://gerrit.wikimedia.org/r/1166951

I put up a patch in T393913#10949148 which will hopefully take care of it

The patch in T393913#10981766 is a better one

Change #1166951 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Ensure self-referential refs with redefinitions emit errors, not crash

https://gerrit.wikimedia.org/r/1166951

Change #1153166 abandoned by Thiemo Kreuz (WMDE):

[mediawiki/extensions/Cite@master] Add missing test case for crash issue in T382546

https://gerrit.wikimedia.org/r/1153166

Change #1163911 abandoned by Arlolra:

[mediawiki/extensions/Cite@master] [WIP] Self-referential ref

Reason:

In favour of Id4bc98544fcd8fb2ed80b4389f228b3b9cb2b48d

https://gerrit.wikimedia.org/r/1163911

Change #1269982 had a related patch set uploaded (by Thiemo Kreuz (WMDE); author: Thiemo Kreuz (WMDE)):

[mediawiki/extensions/Cite@master] Add missing <references> tag to a parser test

https://gerrit.wikimedia.org/r/1269982

Change #1269982 merged by jenkins-bot:

[mediawiki/extensions/Cite@master] Add missing <references> tag to a parser test

https://gerrit.wikimedia.org/r/1269982

I think Don't mutate the DocumentFragment when comparing its HTML (1298350) · Gerrit Code Review might be related to this issue: References::checkForConflictingContent calls ParsoidExtensionAPI::domToHtml which (regrettably) has side-effects -- including removing the data-object-id when it converts to inline attributes which then causes ParsoidExtensionAPI::getContentId() to return an empty string and ParsoidExtensionAPI::getContentDOM() to fail.

Dumping the DOM in inline-attribute form instead of converting it to page bundle format (as the 1298350 patch does) ought to prevent this issue.

Change #1298350 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/extensions/Cite@master] Don't mutate the DocumentFragment when comparing its HTML

https://gerrit.wikimedia.org/r/1298350