Page MenuHomePhabricator

Call to a member function getContent() on null
Closed, ResolvedPublic

Description

  • /w/rest.php/he.wikipedia.org/v3/page/wikitext/%D7%90%D7%9C%D7%99%20%D7%9B%D7%94%D7%9F%20(%D7%9B%D7%95%D7%9C%D7%A0%D7%95)/
  • /w/rest.php/it.wikipedia.org/v3/page/wikitext/Acacio_(disambigua)/
  • /w/rest.php/de.wikipedia.org/v3/page/wikitext/Manso%20(Korsika)/

Exception trace:

#0 /srv/deployment/parsoid/deploy/src/src/Config/Env.php(231): Parsoid\Wt2Html\PageConfigFrame->__construct(Parsoid\Config\Env, MWParsoid\Config\PageConfig, MWParsoid\Config\SiteConfig)
#1 /srv/deployment/parsoid/deploy/src/extension/src/Rest/Handler/ParsoidHandler.php(235): Parsoid\Config\Env->__construct(MWParsoid\Config\SiteConfig, MWParsoid\Config\PageConfig, MWParsoid\Config\DataAccess, array)
#2 /srv/deployment/parsoid/deploy/src/extension/src/Rest/Handler/PageHandler.php(33): MWParsoid\Rest\Handler\ParsoidHandler->createEnv(string, integer)
#3 /srv/mediawiki/php-1.34.0-wmf.23/includes/Rest/Router.php(307): MWParsoid\Rest\Handler\PageHandler->execute()
#4 /srv/mediawiki/php-1.34.0-wmf.23/includes/Rest/Router.php(286): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#5 /srv/mediawiki/php-1.34.0-wmf.23/includes/Rest/EntryPoint.php(112): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#6 /srv/mediawiki/php-1.34.0-wmf.23/includes/Rest/EntryPoint.php(79): MediaWiki\Rest\EntryPoint->execute()
#7 /srv/mediawiki/php-1.34.0-wmf.23/rest.php(31): MediaWiki\Rest\EntryPoint::main()
#8 /srv/mediawiki/w/rest.php(3): require(string)
#9 {main}

Event Timeline

I think I just fixed this in I62868e8dc9eabea4da68a556e692f5f135924a8a -- the issue is that conversions on new pages crash creating the Env because the $revisionRecord is null (since the page didn't previously exist). I'm guessing that these are pages which are in our rt-testing set but have been deleted from the production db.

ssastry triaged this task as Medium priority.Sep 25 2019, 2:06 PM
ssastry moved this task from Backlog to Bugs, Notices, Crashers on the Parsoid-PHP board.

I think I just fixed this in I62868e8dc9eabea4da68a556e692f5f135924a8a -- the issue is that conversions on new pages crash creating the Env because the $revisionRecord is null (since the page didn't previously exist). I'm guessing that these are pages which are in our rt-testing set but have been deleted from the production db.

This might very well be the case. Parsoid/JS rt testing is reporting Did not find page revisions for Manso_(Korsika) etc. So, this is likely to go away once that patch is merged and deployed. Assigning it to you since you already fixed it. :)

Change 539428 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Ensure we have page content for the Api configs

https://gerrit.wikimedia.org/r/539428

I think I just fixed this in I62868e8dc9eabea4da68a556e692f5f135924a8a

Yeah, if you're using those set of configs.

The commit message for the above patch says,

The larger question is what operations make sense without page content.
We should be able to serialize given HTML without it but that's
currently blocked by the Env's construction of a PageConfigFrame.

Well, we have to be able to create a page from scratch (with no previous revision). So a zero-length string seems reasonable to me as a fallback, in both the Api config and the integrated config.

I agree this could potentially mask some errors, but any "real" erroneous case here will later trigger DSR consistency checks since all its offsets will be larger than the src text size. That sort of thing seems more reliable than checks inside the PageConfigFrame constructor. Generally the "real" error case would be something where we're given a page title or revision id which "should" exist, but turns out not to when we do the API call. This could even be a race condition, where the page or revision was deleted out from under us just before we checked for it. What's the right thing to do here? Probably just carry on (that's the "use an empty page source string") -- presumably we will catch this later when we try to actually save the content and it fails.

I think the "proper" solution would be to carry around some sort of parameter -- maybe even all the way from the REST API in terms of a special path or query parameter -- to allow us to distinguish the case where a revision not being found is "expected" (ie, new page creation), from where revision not being found should be "tolerated" (ie, it's always possible there's a race between saving a new revision and someone deleting the page -- but ideally in this case we'll have the old revision ourself and pass it in to the API, so that we can generate proper wikitext w/o having to fetch the original revision), from where revision not being found/supplied is definitely an bug that should generate a 500 error.

Well, we have to be able to create a page from scratch (with no previous revision). So a zero-length string seems reasonable to me as a fallback, in both the Api config and the integrated config.

Why do we need page content at all in that case though?

The reason this is throwing is because creating an environment currently requires having PageConfigFrame, which wants content. But, the frame concepts only seems necessary in the parsing context and isn't used when serializing (ie. creating a page).

Well, we have to be able to create a page from scratch (with no previous revision). So a zero-length string seems reasonable to me as a fallback, in both the Api config and the integrated config.

Why do we need page content at all in that case though?

The reason this is throwing is because creating an environment currently requires having PageConfigFrame, which wants content. But, the frame concepts only seems necessary in the parsing context and isn't used when serializing (ie. creating a page).

That was my exact question as well .. i.e. new pages being created will never be involved in a wt->html .. and hence getPageContent() should not be part of the picture. Presumably selser will not call getContent() in fallback paths ... but if so, that is one place where this edge case will pop up.

Presumably selser will not call getContent() in fallback paths ... but if so, that is one place where this edge case will pop up.

See https://github.com/wikimedia/parsoid/commit/79fca27ab25c48b3ba103416313a9d0ad21e6641 where we ensure SelserData->oldText is the only thing it uses.

Change 539991 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Fix regression in round trip testing script.

https://gerrit.wikimedia.org/r/539991

Change 539991 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Fix regression in round trip testing script.

https://gerrit.wikimedia.org/r/539991

Change 539428 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Ensure we have page content for the Api configs

https://gerrit.wikimedia.org/r/539428

Change 547013 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Return a clean 404 error if a non-existant title is requested via REST

https://gerrit.wikimedia.org/r/547013

Change 547013 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Return a clean 404 error if a non-existant title is requested via REST

https://gerrit.wikimedia.org/r/547013