
PHP Notice: Trying to get property 'parsoid' of non-object
Closed, Resolved · Public

Description

Error

It looks like $out['pb'] is coming out null (or otherwise not an object). This causes other downstream errors.

message
PHP Notice: Trying to get property 'parsoid' of non-object

Details

Request ID
XbkAqwpAMFAAALC@vWkAAABP
Request URL
/w/rest.php/ko.wikipedia.org/v3/page/pagebundle/%EC%9D%B4%EC%8A%B9%EA%B8%B0/25132807
Stack Trace
exception.trace
#0 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Parsoid.php(130): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/ParsoidHandler.php(591): Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, array)
#2 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/PageHandler.php(47): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(Parsoid\Config\Env, array)
#3 /includes/Rest/Router.php(315): MWParsoid\Rest\Handler\PageHandler->execute()
#4 /includes/Rest/Router.php(285): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#5 /includes/Rest/EntryPoint.php(116): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#6 /includes/Rest/EntryPoint.php(83): MediaWiki\Rest\EntryPoint->execute()
#7 /rest.php(31): MediaWiki\Rest\EntryPoint::main()
#8 /srv/mediawiki/w/rest.php(3): require(string)
#9 {main}
Related Gerrit Patches:
mediawiki/services/parsoid (master): Use frame source instead of stringifying tokens

Event Timeline

ssastry created this task. Oct 30 2019, 3:26 AM
Restricted Application added a subscriber: Aklapper. Oct 30 2019, 3:26 AM
ssastry triaged this task as Medium priority. Oct 30 2019, 3:27 AM
Arlolra claimed this task. Fri, Nov 15, 7:10 PM

If I add JSON_THROW_ON_ERROR to PHPUtils::jsonEncode, we get the following error when trying to stringify the pagebundle:

src/Utils/PHPUtils.php: Malformed UTF-8 characters, possibly incorrectly encoded
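The failure mode can be reproduced in miniature. This is a Python sketch, not Parsoid code, and it assumes only that the bad value was cut at a byte offset falling inside a multi-byte Hangul character (each takes 3 bytes in UTF-8). Python's strict UTF-8 decoding stands in for json_encode's UTF-8 validation; the helper names are hypothetical. By default PHP's json_encode returns false on malformed UTF-8 rather than throwing, which would be consistent with data-parsoid silently coming out empty.

```python
import json

# Wikitext from the isolated test case in this task.
WIKITEXT = "[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]"


def truncate_utf8(text: str, n_bytes: int) -> bytes:
    """Keep the first n_bytes of the UTF-8 encoding, ignoring character boundaries."""
    return text.encode("utf-8")[:n_bytes]


def encode_data_parsoid(href_bytes: bytes) -> str:
    """Strictly decode, then JSON-encode a data-parsoid-like blob (hypothetical).

    Strict decoding plays the role of json_encode's UTF-8 check:
    a cut inside a multi-byte character raises UnicodeDecodeError.
    """
    href = href_bytes.decode("utf-8")  # errors="strict" is the default
    return json.dumps({"sa": {"href": href}}, ensure_ascii=False)


# Byte 60 is the first of the three bytes encoding "그".
print(encode_data_parsoid(truncate_utf8(WIKITEXT, 60)))  # cut just before "그": valid
try:
    encode_data_parsoid(truncate_utf8(WIKITEXT, 61))     # cut inside "그": malformed
except UnicodeDecodeError as err:
    print("malformed UTF-8:", err.reason)
```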

An isolated test case is [[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]

cscott added a subscriber: cscott. Edited Fri, Nov 15, 8:40 PM

That test case is interesting:

$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | php bin/parse.php  --domain ko.wikipedia.org --body_only
<p data-parsoid='{"dsr":[0,93,0,0]}'>[<a rel="mw:ExtLink" href="https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램" class="external text" data-parsoid="">아이콘.png</a>]</p>
$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | bin/parse.js  --domain ko.wikipedia.org --body_only
<p data-parsoid='{"dsr":[0,73,0,0]}'>[<a rel="mw:ExtLink" href="https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램" class="external text" data-parsoid='{"a":{"href":"https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램"},"sa":{"href":"https://www.instagram.com/asiansoul_jyp|[[파일:인스타"},"dsr":[1,59,50,1]}'>아이콘.png</a>]</p>
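The outer dsr values also differ (93 in Parsoid/PHP vs 73 in Parsoid/JS). A quick count suggests this is only a unit difference, assuming Parsoid/PHP offsets count UTF-8 bytes while Parsoid/JS offsets count UTF-16 code units; it is not itself the bug, but it shows why an offset computed in one unit can land mid-character in the other. A minimal Python check:

```python
WIKITEXT = "[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]"

# JS strings are indexed in UTF-16 code units; PHP strings are indexed in bytes.
utf16_units = len(WIKITEXT.encode("utf-16-le")) // 2
utf8_bytes = len(WIKITEXT.encode("utf-8"))

# Precomposed Hangul syllables (U+AC00..U+D7A3) take 3 bytes in UTF-8 but
# one UTF-16 code unit, so each one adds 2 to the byte count.
hangul = sum(1 for ch in WIKITEXT if "\uac00" <= ch <= "\ud7a3")

print(utf16_units, utf8_bytes, hangul)  # → 73 93 10
```

Ten Hangul syllables at 2 extra bytes each account exactly for the 20-unit gap between the two dsr end offsets.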

From legacy parser, with $wgLanguageCode='ko':

$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | php maintenance/parse.php 
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.

<p>[<a rel="nofollow" class="external text" href="https://www.instagram.com/asiansoul_jyp%7C"></a><a href="/~cananian/mediawiki/index.php?title=%ED%8A%B9%EC%88%98:%EC%98%AC%EB%A6%AC%EA%B8%B0&amp;wpDestFile=%EC%9D%B8%EC%8A%A4%ED%83%80%EA%B7%B8%EB%9E%A8_%EC%95%84%EC%9D%B4%EC%BD%98.png" class="new" title="파일:인스타그램 아이콘.png">width=24</a>]

So data-parsoid in Parsoid/PHP is being omitted, presumably because of the exception during JSON serialization that @Arlolra found. The root cause, however, is that the data-parsoid sa property is being inappropriately truncated; the a property seems to be correct.

Both of them differ from the legacy parser, but that might be because setting $wgLanguageCode on my localhost does not appear to be enough to make it recognize the localized namespace. Something else could be going on, too.

// NOTE: Tokenizing this as src seems little suspect

From https://github.com/wikimedia/parsoid/blob/master/src/Wt2Html/TT/WikiLinkHandler.php#L330-L344

And, indeed, it is.

You should be able to use the TSR to get the appropriate region of the original wikitext and re-tokenize that, instead of trying to reconstruct it.
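A minimal sketch of that idea, in Python rather than PHP and not the code from the merged patch: instead of rebuilding the link source by stringifying tokens, slice it straight out of the original wikitext using a source range. The range is assumed to be in UTF-8 byte offsets (matching the dsr units Parsoid/PHP emitted above), and both helper names and the specific range (from just after the opening [[ to the space before 아이콘) are hypothetical.

```python
WIKITEXT = "[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]"

# The truncated "sa" href that Parsoid/JS reported for this input.
TRUNCATED_SA_HREF = "https://www.instagram.com/asiansoul_jyp|[[파일:인스타"


def byte_offset(src: str, needle: str, start: int = 0) -> int:
    """UTF-8 byte offset of the first `needle` at or after byte `start`."""
    return src.encode("utf-8").index(needle.encode("utf-8"), start)


def source_for_range(src: str, start: int, end: int) -> str:
    """Return the original wikitext for the UTF-8 byte range [start, end).

    Hypothetical helper. Decoding is strict, so a range that splits a
    multi-byte character raises instead of yielding malformed UTF-8.
    """
    return src.encode("utf-8")[start:end].decode("utf-8")


# Hypothetical range for the link target: byte 2 (after "[[") up to the
# first space, which is the one before "아이콘".
start, end = 2, byte_offset(WIKITEXT, " ")
href = source_for_range(WIKITEXT, start, end)

print(href)                             # ends in the full "인스타그램"
print(href[len(TRUNCATED_SA_HREF):])    # the tail the truncated sa lost
```

Because the text comes from the source itself, nothing has to be reconstructed, and strict decoding turns a bad range into an immediate error rather than corrupt output that only surfaces later at JSON serialization.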

Change 551272 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Ko

https://gerrit.wikimedia.org/r/551272

Change 551272 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use frame source instead of stringifying tokens

https://gerrit.wikimedia.org/r/551272

Arlolra closed this task as Resolved. Fri, Nov 15, 11:08 PM