Page MenuHomePhabricator

tokensToString being called on KV->V looks like a signature mismatch
Closed, ResolvedPublic

Description

In kvToHash, we have self::tokensToString( $kv->v ) but

KV->V

/** @var string|Token|Token[]|KV[] */

vs

tokensToString

* @param string|Token|array<Token|string> $tokens

so any KVs found there are probably dropped since there's no condition for them in the loop.

Event Timeline

Arlolra created this task.Jul 16 2019, 10:44 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJul 16 2019, 10:44 PM

It looks like the KV[] scenario is from https://github.com/wikimedia/parsoid/blob/a66aad7d6697447300a08ca24c0c246f9ac84068/src/Wt2Html/Grammar.pegphp#L274 (same on the JS side). The reason why we might not have tripped over this in tokensToString is probably because ext-tokens never hit this code path. But yes, we should probably add asserts to both JS & PHP side to ensure we catch unhandled scenarios early.

In https://github.com/wikimedia/parsoid/blob/a66aad7d6697447300a08ca24c0c246f9ac84068/src/Wt2Html/Grammar.pegphp#L1409, we pass null for $v in some scenarios.
https://github.com/wikimedia/parsoid/blob/a66aad7d6697447300a08ca24c0c246f9ac84068/src/Utils/PipelineUtils.php#L58, we pass a boolean.
Surprisingly, I didn't hit the numeric value that I was expecting to see. But, the Cite code is still reporting a number. So, it is coming from somewhere else. To be continued.

cscott added a subscriber: cscott.Jul 17 2019, 3:45 AM

null should probably be converted to '' in that example.

The booleans should either be explicitly stringified (ie, $inPHPBlock ? '1' : '0') or they should be converted into presence/absence of the attribute (ie, $attributes + ($inPHPBlock ? [ new KV('inPHPBlock', '') ] : [])).

I don't think the absence of KV[] in tokensToString is an issue -- it's not clear how those are supposed to be stringified, KV[] is only used to encapsulate extension options IIRC -- but I noticed that tokensToString takes array<string> which isn't a valid value for KV's value. As far as I know, an array of strings should be joined into a single string. I think these were used in some of the list item tokens, though, where each bullet character was a separate element in the array. That should be fixed to join the strings. (I thought that had actually already been done on the PHP side, although not on the JS side.)

Change 524035 had a related patch set uploaded (by Arlolra; owner: Arlolra):
[mediawiki/services/parsoid@master] Assert KVs aren't found when calling TokenUtils::tokensToString()

https://gerrit.wikimedia.org/r/524035

Change 524035 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Assert KVs aren't found when calling TokenUtils::tokensToString()

https://gerrit.wikimedia.org/r/524035

Arlolra closed this task as Resolved.Jul 17 2019, 10:25 PM
Arlolra claimed this task.

Mentioned in SAL (#wikimedia-operations) [2019-08-05T20:34:08Z] <arlolra> Updated Parsoid to 7232dff (T228223)