ESLint flagged an issue with the Wiktionary definition JSON output for the word 'cat', specifically with LTR marks[1], while I was developing the diff tests[2]:
test/diff/results/page_definition-enwiktionary-cat.json
  112:110  error  This character may get silently deleted by one or more browsers  bad-json
  133:81   error  This character may get silently deleted by one or more browsers  bad-json
@Niedzielski mentioned for PS3 of [2]: Consider encoding the responses as \u200e instead of embedding a raw LTR mark.
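A minimal sketch of what that suggestion could look like when serializing a response or a test fixture. The helper name `stringifyWithEscapedLtr` is hypothetical and not part of MCS, and it only handles U+200E; other invisible characters would need the same treatment if the linter flags them.

```javascript
// Hypothetical helper, not part of MCS: serialize a value to JSON with every
// U+200E (LEFT-TO-RIGHT MARK) written as the escape sequence \u200e.
function stringifyWithEscapedLtr(value, indent) {
  // JSON.stringify emits U+200E as a raw character. In its output the mark can
  // only occur inside a string literal, so swapping in the escape is safe.
  return JSON.stringify(value, null, indent).replace(/\u200e/g, '\\u200e');
}

// The mark is written as an escape in this source file too, so the file
// itself contains no raw control character.
const definition = { examples: ['<span>\u200e</span>'] };
console.log(stringifyWithEscapedLtr(definition));
// prints {"examples":["<span>\u200e</span>"]} with a literal backslash
```

Parsing the escaped output with `JSON.parse` yields the same string as before, so clients should see identical data; only the bytes on the wire and in the fixture files change.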
These control characters come from Parsoid.
https://en.wiktionary.org/api/rest_v1/page/html/cat
<span typeof="mw:Entity" about="#mwt946">‎</span>
The best view I have found so far is in Chrome DevTools. Another way to view it is in Vim, where it shows up as <200E>. I was not able to copy and paste this into Phab without it disappearing; I had to re-type it, since most modern HTML-based editors interpret this control character.
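For sharing snippets without losing the character, a small sketch like the following could make the marks visible in plain text. The function name `revealDirectionalMarks` is hypothetical, and covering U+200F (the RTL mark) as well is an assumption on my part.

```javascript
// Hypothetical debugging helper: render LTR/RTL marks (U+200E, U+200F) in
// Vim's <XXXX> notation so they survive copy and paste.
function revealDirectionalMarks(text) {
  return text.replace(/[\u200e\u200f]/g, (ch) =>
    '<' + ch.codePointAt(0).toString(16).toUpperCase() + '>');
}

console.log(revealDirectionalMarks('<span>\u200e</span>'));
// prints <span><200E></span>
```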
I'm not sure why they are in otherwise empty span tags and what the effect is. The location of these is a bit mysterious to me since I see no RTL characters nearby (there are some, but in distant parts of the same document).
Here's the corresponding MCS endpoint with a bit more context: https://en.wiktionary.org/api/rest_v1/page/definition/cat
"examples": [ "<i><span class=\"Latn\">a carrier's bow <b>cats</b></span></i><span>‎</span>" ]
Should we add special handling to all span[typeof=mw:Entity] in MCS when building the JSON?
We should test the Android app to make sure that there are no negative side effects if we choose to change this behavior.
As part of this task, the .eslintignore entry for **/results should be removed.
[1] https://en.wikipedia.org/wiki/Left-to-right_mark
[2] https://gerrit.wikimedia.org/r/#/c/335092/3