
PHP Notice: Trying to get property 'parsoid' of non-object
Closed, Resolved | Public | PRODUCTION ERROR

Description

Error

It looks like $out['pb'] is coming out null or as some other non-object value, which causes further downstream errors.

message
PHP Notice: Trying to get property 'parsoid' of non-object

Details

Request ID
XbkAqwpAMFAAALC@vWkAAABP
Request URL
/w/rest.php/ko.wikipedia.org/v3/page/pagebundle/%EC%9D%B4%EC%8A%B9%EA%B8%B0/25132807
Stack Trace
exception.trace
#0 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Parsoid.php(130): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/ParsoidHandler.php(591): Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, array)
#2 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/PageHandler.php(47): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(Parsoid\Config\Env, array)
#3 /includes/Rest/Router.php(315): MWParsoid\Rest\Handler\PageHandler->execute()
#4 /includes/Rest/Router.php(285): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#5 /includes/Rest/EntryPoint.php(116): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#6 /includes/Rest/EntryPoint.php(83): MediaWiki\Rest\EntryPoint->execute()
#7 /rest.php(31): MediaWiki\Rest\EntryPoint::main()
#8 /srv/mediawiki/w/rest.php(3): require(string)
#9 {main}

Event Timeline

ssastry triaged this task as Medium priority. Oct 30 2019, 3:27 AM

If I add JSON_THROW_ON_ERROR to PHPUtils::jsonEncode, stringifying the pagebundle fails with:

src/Utils/PHPUtils.php: Malformed UTF-8 characters, possibly incorrectly encoded

An isolated test case is [[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]

That test case is interesting:

$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | php bin/parse.php  --domain ko.wikipedia.org --body_only
<p data-parsoid='{"dsr":[0,93,0,0]}'>[<a rel="mw:ExtLink" href="https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램" class="external text" data-parsoid="">아이콘.png</a>]</p>
$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | bin/parse.js  --domain ko.wikipedia.org --body_only
<p data-parsoid='{"dsr":[0,73,0,0]}'>[<a rel="mw:ExtLink" href="https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램" class="external text" data-parsoid='{"a":{"href":"https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램"},"sa":{"href":"https://www.instagram.com/asiansoul_jyp|[[파일:인스타"},"dsr":[1,59,50,1]}'>아이콘.png</a>]</p>
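The two dsr end offsets above differ (93 from Parsoid/PHP, 73 from Parsoid/JS). The gap is exactly what the 10 Hangul syllables in the test case account for if Parsoid/PHP counts UTF-8 bytes and Parsoid/JS counts UTF-16 code units: each syllable is 3 bytes but 1 code unit. A quick check of that arithmetic (an illustrative script, not code from either Parsoid implementation):

```python
# The isolated test case from this task.
WIKITEXT = "[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]"


def utf8_length(text: str) -> int:
    """Length in UTF-8 bytes (the unit PHP's strlen() counts)."""
    return len(text.encode("utf-8"))


def utf16_length(text: str) -> int:
    """Length in UTF-16 code units (the unit JavaScript's String.length counts)."""
    return len(text.encode("utf-16-le")) // 2


# Precomposed Hangul syllables live in U+AC00..U+D7A3 and take 3 bytes each in UTF-8.
hangul = sum(1 for ch in WIKITEXT if "\uac00" <= ch <= "\ud7a3")

print(utf8_length(WIKITEXT), utf16_length(WIKITEXT), hangul)  # 93 73 10
```

So 73 + 10 × 2 = 93: both parsers cover the whole input, just measured in different units.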

From legacy parser, with $wgLanguageCode='ko':

$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | php maintenance/parse.php 
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<p>[<a rel="nofollow" class="external text" href="https://www.instagram.com/asiansoul_jyp%7C"></a><a href="/~cananian/mediawiki/index.php?title=%ED%8A%B9%EC%88%98:%EC%98%AC%EB%A6%AC%EA%B8%B0&amp;wpDestFile=%EC%9D%B8%EC%8A%A4%ED%83%80%EA%B7%B8%EB%9E%A8_%EC%95%84%EC%9D%B4%EC%BD%98.png" class="new" title="파일:인스타그램 아이콘.png">width=24</a>]

So data-parsoid is being omitted in Parsoid/PHP, presumably because of the exception during JSON serialization that @Arlolra found. The root cause, however, is that the sa property of data-parsoid is being inappropriately truncated; the a property seems to be correct.
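One plausible link between the truncated sa value and the "Malformed UTF-8 characters" error, offered as an assumption rather than a confirmed diagnosis: PHP strings are byte arrays, so cutting one at a byte offset that lands inside a multibyte character leaves an invalid UTF-8 sequence, and json_encode refuses to serialize invalid UTF-8. The Python sketch below reproduces only that byte-level effect. The cut offset (52) is chosen for this test case, not taken from Parsoid, and the helpers is_valid_utf8 and truncate_at_char_boundary are hypothetical.

```python
WIKITEXT = "[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]"
RAW = WIKITEXT.encode("utf-8")


def is_valid_utf8(data: bytes) -> bool:
    """Strict decoding rejects truncated or malformed sequences."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def truncate_at_char_boundary(data: bytes, n: int) -> bytes:
    """Cut data to at most n bytes without splitting a UTF-8 sequence.

    UTF-8 continuation bytes look like 0b10xxxxxx, so back up while the
    byte at the cut point is a continuation byte.
    """
    n = min(n, len(data))
    while 0 < n < len(data) and (data[n] & 0xC0) == 0x80:
        n -= 1
    return data[:n]


# Byte 51 is the lead byte of "인"; keeping 52 bytes keeps only part of it.
naive = RAW[:52]
safe = truncate_at_char_boundary(RAW, 52)
print(is_valid_utf8(naive), is_valid_utf8(safe), safe.decode("utf-8")[-3:])  # False True 파일:
```

Backing up to a character boundary only avoids the serialization failure; it would not fix the truncated sa value itself.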

Both outputs differ from the legacy parser's, but that might be because just setting $wgLanguageCode on my localhost does not seem to be enough to get it to recognize the localized namespace. Something else could be going on, too.

You should be able to use the TSR (token source range) to extract the appropriate region of the original wikitext and re-tokenize that, instead of trying to reconstruct it.
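A minimal sketch of the slicing half of that suggestion (re-tokenizing is not shown). It assumes offsets are UTF-8 byte offsets into the original source, as the Parsoid/PHP dsr values suggest. Slicing the source recovers the exact wikitext for a range, whereas rebuilding it from tokens can lose text, as the truncated sa.href shows. SourceFrame and src_text are hypothetical names for illustration, not Parsoid's API.

```python
from dataclasses import dataclass


@dataclass
class SourceFrame:
    """Hypothetical stand-in for a frame that holds the original wikitext."""
    source: str

    def src_text(self, start: int, end: int) -> str:
        # Offsets are UTF-8 byte offsets, so slice the encoded bytes,
        # not the str (whose indices count code points).
        return self.source.encode("utf-8")[start:end].decode("utf-8")


WIKITEXT = "[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]"
frame = SourceFrame(WIKITEXT)

# Byte range of the inner file link, computed here purely for illustration.
inner = "[[파일:인스타그램 아이콘.png|width=24]]"
start = len(WIKITEXT[: WIKITEXT.index(inner)].encode("utf-8"))
end = start + len(inner.encode("utf-8"))

print(start, end)                            # 42 91
print(frame.src_text(start, end) == inner)   # True
print(WIKITEXT[start:end] == inner)          # False: byte offsets misused as str indices
```

The last line shows why the unit of the offsets must match the unit of the string being sliced.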

Change 551272 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Ko

https://gerrit.wikimedia.org/r/551272

Change 551272 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use frame source instead of stringifying tokens

https://gerrit.wikimedia.org/r/551272