
VisualEditor: [regression] misalignment of multibyte parameter names on templates with a TemplateData custom format
Closed, Resolved | Public | Bug Report

Description

Hello,

There is a problem with the visual editor when you add or modify a template that has a custom format: the equal signs of some parameters are no longer aligned.

It seems that when the visual editor calculates the length of the parameter names, it uses the size in bytes instead of the number of characters.

See the example on frwiki. The number of spaces after certain parameter names is mistakenly reduced.

It seems to me that this bug did not exist in July 2019 (I had not contributed between July 2019 and February 2020), so it's probably a regression.

The problem appears to come from parameter names that contain multibyte characters.

Test:

The custom format used for this test was as follows: {{_\n | ___________ = _\n}}\n (diff)

The following test with the visual editor gives this: (see test diff)

{{Bac à sable
 | abc         = 123
 | abc abc     = 123
 | abc abc abc = 123
 | abc ébc    = 123
 | abc ébé   = 123
 | abc €bc   = 123
 | abc €b€ = 123
}}

Instead of:

{{Bac à sable
 | abc         = 123
 | abc abc     = 123
 | abc abc abc = 123
 | abc ébc     = 123
 | abc ébé     = 123
 | abc €bc     = 123
 | abc €b€     = 123
}}

In the example, the character é (two bytes) seems to count as two characters, and the character € (three bytes) counts as three characters.

Character    Bytes
a, b, c      1
é            2
€            3
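
The misalignment can be reproduced outside the editor with a minimal sketch, assuming the padding is computed as "target width minus measured length". The helper names (byteLength, charLength, padName) are hypothetical and are not taken from the Parsoid code; the width of 11 corresponds to the eleven underscores in the custom format used for the test.

```javascript
// Hypothetical helpers for illustration only; this is not the Parsoid code.

// Number of UTF-8 bytes (comparable to what PHP's strlen() reports).
const byteLength = (s) => new TextEncoder().encode(s).length;

// Number of Unicode code points (comparable to PHP's mb_strlen() with UTF-8).
const charLength = (s) => Array.from(s).length;

// Pad a parameter name with spaces up to `width`, measured with `lengthOf`.
function padName(name, width, lengthOf) {
  const missing = Math.max(0, width - lengthOf(name));
  return name + ' '.repeat(missing);
}

const width = 11; // eleven underscores in the custom format
const names = ['abc', 'abc abc abc', 'abc ébc', 'abc ébé', 'abc €bc', 'abc €b€'];

console.log('Byte-based padding (misaligned):');
for (const name of names) {
  console.log(` | ${padName(name, width, byteLength)} = 123`);
}

console.log('Character-based padding (aligned):');
for (const name of names) {
  console.log(` | ${padName(name, width, charLength)} = 123`);
}
```

With byte-based measurement each é removes one space of padding and each € removes two, which matches the shortened lines in the test above.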

The problem is not present with TemplateWizard (diff).

Thank you.

Sorry for my bad English.

Event Timeline

ssastry subscribed.

Ah, this might actually be a Parsoid/JS -> Parsoid/PHP regression caused by use of strlen instead of mb_strlen.

ssastry changed the subtype of this task from "Task" to "Bug Report". Feb 19 2020, 10:43 PM
ssastry edited projects, added Parsoid-PHP; removed Parsoid, VisualEditor, TemplateData.
ssastry moved this task from Backlog to Bugs, Notices, Crashers on the Parsoid-PHP board.

The issue is probably in WikitextSerializer::formatStringSubst().

Looking at the code, it seems like the same problem would affect Parsoid/JS, but only for characters that take more than one 16-bit code unit in UTF-16 encoding, since that's what JavaScript uses internally, e.g. '💩'. That code should have used something like codePointLength() (https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+/master/resources/src/mediawiki.String.js#45) to measure the strings. I'm not sure if you care about Parsoid/JS bugs any more though :)
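
To illustrate the JavaScript side of this, here is a rough sketch comparing String.prototype.length (UTF-16 code units) with a code point count. The function codePointCount is a hypothetical stand-in written for this example; the actual codePointLength() in mediawiki.String.js may be implemented differently.

```javascript
// Hypothetical helper for illustration; not the mediawiki.String.js implementation.
function codePointCount(str) {
  // The string iterator yields whole code points, so a surrogate pair counts once.
  let count = 0;
  for (const _ of str) {
    count++;
  }
  return count;
}

const samples = ['abc', 'é', '€', '💩'];
for (const s of samples) {
  console.log(`${s}: code units = ${s.length}, code points = ${codePointCount(s)}`);
}
```

Only the last sample differs: é and € each fit in a single UTF-16 code unit, while '💩' needs a surrogate pair, so its .length is 2 even though it is one character.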

Change 575625 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] Templatedata Formatting: Handle multibyte unicode chars correctly

https://gerrit.wikimedia.org/r/575625

Change 575625 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Templatedata Formatting: Handle multibyte unicode chars correctly

https://gerrit.wikimedia.org/r/575625

This is fixed but not yet deployed.