Error
Looks like $out['pb'] is coming out as null or some other non-object value. This causes other downstream errors:
PHP Notice: Trying to get property 'parsoid' of non-object
#0 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/src/Parsoid.php(130): MWExceptionHandler::handleError(integer, string, string, integer, array)
#1 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/ParsoidHandler.php(591): Parsoid\Parsoid->wikitext2html(MWParsoid\Config\PageConfig, array, array)
#2 /srv/deployment/parsoid/deploy-cache/revs/aa59ce3d0aa035504666a63c99667398d0ea1928/src/extension/src/Rest/Handler/PageHandler.php(47): MWParsoid\Rest\Handler\ParsoidHandler->wt2html(Parsoid\Config\Env, array)
#3 /includes/Rest/Router.php(315): MWParsoid\Rest\Handler\PageHandler->execute()
#4 /includes/Rest/Router.php(285): MediaWiki\Rest\Router->executeHandler(MWParsoid\Rest\Handler\PageHandler)
#5 /includes/Rest/EntryPoint.php(116): MediaWiki\Rest\Router->execute(MediaWiki\Rest\RequestFromGlobals)
#6 /includes/Rest/EntryPoint.php(83): MediaWiki\Rest\EntryPoint->execute()
#7 /rest.php(31): MediaWiki\Rest\EntryPoint::main()
#8 /srv/mediawiki/w/rest.php(3): require(string)
#9 {main}
Subject | Repo | Branch | Lines +/-
---|---|---|---
Use frame source instead of stringifying tokens | mediawiki/services/parsoid | master | +11 -13
If I add JSON_THROW_ON_ERROR to PHPUtils::jsonEncode, stringifying the pagebundle fails with:
src/Utils/PHPUtils.php: Malformed UTF-8 characters, possibly incorrectly encoded
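To illustrate why the flag matters, here is a minimal sketch using plain json_encode() rather than PHPUtils::jsonEncode itself. The $payload array is a hypothetical stand-in for a data-parsoid blob, not the real structure. Without the flag, the failure is silent and the caller just receives false, which would fit with a non-object value turning up downstream.

```php
<?php
// Hypothetical stand-in for a data-parsoid payload. "\xEC" is the lead byte of
// a three-byte Hangul sequence with its two continuation bytes missing.
$payload = [ 'sa' => [ 'href' => "asiansoul_jyp|[[\xEC" ] ];

// Default behaviour: json_encode() returns false and only records the error.
$silent = json_encode( $payload );
$silentError = json_last_error_msg();

// With JSON_THROW_ON_ERROR (PHP 7.3+), the same failure raises a JsonException.
$thrown = null;
try {
	json_encode( $payload, JSON_THROW_ON_ERROR );
} catch ( JsonException $e ) {
	$thrown = $e->getMessage();
}

var_dump( $silent );
echo $silentError, "\n", $thrown, "\n";
```

Both messages should read the same as the error quoted above; the only difference is whether the caller is forced to notice it.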
An isolated test case is [[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]
That test case is interesting:
$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | php bin/parse.php --domain ko.wikipedia.org --body_only
<p data-parsoid='{"dsr":[0,93,0,0]}'>[<a rel="mw:ExtLink" href="https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램" class="external text" data-parsoid="">아이콘.png</a>]</p>
$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | bin/parse.js --domain ko.wikipedia.org --body_only
<p data-parsoid='{"dsr":[0,73,0,0]}'>[<a rel="mw:ExtLink" href="https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램" class="external text" data-parsoid='{"a":{"href":"https://www.instagram.com/asiansoul_jyp%7C파일:인스타그램"},"sa":{"href":"https://www.instagram.com/asiansoul_jyp|[[파일:인스타"},"dsr":[1,59,50,1]}'>아이콘.png</a>]</p>
From legacy parser, with $wgLanguageCode='ko':
$ echo '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]' | php maintenance/parse.php
parse.php: warning: reading wikitext from STDIN. Press CTRL+D to parse.
<p>[<a rel="nofollow" class="external text" href="https://www.instagram.com/asiansoul_jyp%7C"></a><a href="/~cananian/mediawiki/index.php?title=%ED%8A%B9%EC%88%98:%EC%98%AC%EB%A6%AC%EA%B8%B0&wpDestFile=%EC%9D%B8%EC%8A%A4%ED%83%80%EA%B7%B8%EB%9E%A8_%EC%95%84%EC%9D%B4%EC%BD%98.png" class="new" title="파일:인스타그램 아이콘.png">width=24</a>]
So data-parsoid in Parsoid/PHP is being omitted, presumably due to an exception during JSON serialization, which @Arlolra found. The root cause, though, is that the data-parsoid sa property is being inappropriately truncated; the a property seems to be correct.
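One plausible way a truncation turns into a serialization failure (an assumption on my part, not something confirmed from the Parsoid code) is a cut at a byte offset that lands inside a multi-byte character. A small sketch with plain PHP string functions, where $isUtf8 is a hypothetical helper:

```php
<?php
$text = '인스타그램';  // five Hangul syllables, three UTF-8 bytes each

$onBoundary = substr( $text, 0, 9 );   // '인스타': ends on a character boundary
$midChar    = substr( $text, 0, 10 );  // '인스타' plus one stray lead byte

// preg_match() with the /u modifier returns false when the subject is not
// valid UTF-8, so it doubles as a cheap validity check.
$isUtf8 = static function ( string $s ): bool {
	return preg_match( '//u', $s ) === 1;
};

var_dump( $isUtf8( $onBoundary ) );       // valid UTF-8
var_dump( $isUtf8( $midChar ) );          // not valid UTF-8
var_dump( json_encode( $midChar ) );      // json_encode() gives up on it
```

The JS output above happens to be cut on a character boundary ("인스타"), so it serializes fine even though the value is still wrong.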
Both outputs differ from the legacy parser's, but that may be because just setting $wgLanguageCode on my localhost is apparently not enough to get it to recognize the localized namespace. Something else could be going on, too.
// NOTE: Tokenizing this as src seems little suspect
From https://github.com/wikimedia/parsoid/blob/master/src/Wt2Html/TT/WikiLinkHandler.php#L330-L344
And, indeed, it is.
You should be able to use the TSR to get the appropriate region of the original wikitext and re-tokenize that, instead of trying to reconstruct it.
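A rough sketch of that idea, outside of Parsoid: sourceForRange() is a hypothetical helper, the offsets are computed by hand here rather than read from a real token's TSR, and they are assumed to be byte offsets that fall on character boundaries.

```php
<?php
/**
 * Hypothetical helper: return the slice of $source covered by [$start, $end).
 * Offsets are assumed to be byte offsets on character boundaries.
 */
function sourceForRange( string $source, int $start, int $end ): string {
	if ( $start < 0 || $end < $start || $end > strlen( $source ) ) {
		throw new InvalidArgumentException( "Range [$start, $end) is outside the source" );
	}
	return substr( $source, $start, $end - $start );
}

$wikitext = '[[https://www.instagram.com/asiansoul_jyp|[[파일:인스타그램 아이콘.png|width=24]]]]';

// Stand-in for a TSR: the byte range of the nested [[파일:...]] link.
$prefix = '[[https://www.instagram.com/asiansoul_jyp|';
$inner = '[[파일:인스타그램 아이콘.png|width=24]]';
$start = strlen( $prefix );
$end = $start + strlen( $inner );

// The slice is the author's original text, ready to be re-tokenized as-is.
$slice = sourceForRange( $wikitext, $start, $end );
echo $slice, "\n";
```

Slicing the original source avoids any chance of the reconstructed string drifting from what was actually written, which is the failure mode seen with the stringified tokens.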
Change 551272 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] [WIP] Ko
Change 551272 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Use frame source instead of stringifying tokens