
Unicode Character U+1037 Shifts One Character Forward When Saved
Closed, Resolved, Public


Author: ravi.chhabra

Image showing parser error when saved and rendered.

When we write an article and save it, Wikipedia does not store the text exactly as written: it moves the character U+1037 forward by exactly one position, so that it swaps places with the character before it. This is reproducible. An image showing the problem is attached.

Reproducible: Yes
Steps to Reproduce:
Enter this Unicode sequence in a new article and save: U+1014 U+103E U+1004 U+103A U+1037
Actual Result: U+1014 U+103E U+1004 U+1037 U+103A
Expected Result: U+1014 U+103E U+1004 U+103A U+1037

The saved data should be identical to what was entered. This happens on every occurrence of U+1037 and is becoming a big problem on Myanmar Wikipedia. Some fonts render an incorrectly ordered sequence with a dotted circle, so Wikipedia pages look ugly. It needs to be fixed before adoption of Myanmar Wikipedia picks up. I have attached an image showing the problem.

Version: unspecified
Severity: major


1037.jpg (385×464 px, 22 KB)



Event Timeline

bzimport raised the priority of this task from to Medium. (Nov 21 2014, 10:09 PM)
bzimport set Reference to bz14834.
bzimport added a subscriber: Unknown Object (MLST).

That is expected, because we do Unicode normalisation for all input. See bug 2399 for more technical details.
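For anyone who wants to check the reordering outside the wiki, here is a minimal sketch using Python's standard `unicodedata` module. It assumes the normalisation form applied on save is NFC, which the comment above does not state explicitly; the helper name `code_points` is made up for this illustration. Canonical ordering sorts adjacent combining marks by their canonical combining class, and U+1037 has a lower class than U+103A, so it is placed first.

```python
import unicodedata


def code_points(text):
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Sequence from the report, in the order it was typed.
typed = "\u1014\u103E\u1004\u103A\u1037"

# Assumption: the normalisation applied on save is NFC.
saved = unicodedata.normalize("NFC", typed)

print("typed:", code_points(typed))
print("saved:", code_points(saved))

# Canonical combining classes of the two marks that swap places.
# Marks with a lower non-zero class are ordered before marks with a higher one.
for ch in ("\u1037", "\u103A"):
    print(code_points(ch), "combining class", unicodedata.combining(ch))
```

Under that assumption, the `saved` line should match the "Actual Result" in the report, which is consistent with the behaviour being a consequence of normalisation rather than a storage error.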

*** This bug has been marked as a duplicate of bug 2399 ***