I'm now even more convinced that the problem is in the code that replaces the byte 0x85 (incorrectly treated as NEL) with 0x0D 0x0A (CR+LF), because 0xD1 0x0D (or 0xD1 0x0A), 0xD3 0x0D (or 0xD3 0x0A), etc. are indeed malformed UTF-8 sequences.
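
To illustrate the suspected failure, here is a minimal Python sketch. The function `buggy_normalize` is hypothetical: it only imitates the kind of byte-level replacement described above and is not taken from any Wikidata or MediaWiki code.

```python
def buggy_normalize(raw: bytes) -> bytes:
    # Hypothetical bug: treats every raw 0x85 byte as NEL and
    # rewrites it to CR+LF, without checking the UTF-8 context.
    return raw.replace(b"\x85", b"\r\n")


def is_valid_utf8(raw: bytes) -> bool:
    # True if the byte string decodes cleanly as UTF-8.
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


original = "х".encode("utf-8")         # Cyrillic х (U+0445) is D1 85
corrupted = buggy_normalize(original)  # becomes D1 0D 0A

print(original.hex(" "), is_valid_utf8(original))
print(corrupted.hex(" "), is_valid_utf8(corrupted))
```

The lead byte 0xD1 must be followed by a continuation byte in the range 0x80 to 0xBF, so once 0x85 is replaced by 0x0D the sequence can no longer be decoded.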
Sep 8 2017
Mar 24 2017
• Drbug added a comment to T161263: Wikidata does not accept characters ending in \x85 (Cyrillic х, Armenian Յ, Arabic م etc.) in labels/aliases/descriptions.
Could this be related to the fact that the Unicode NEL (Next Line) character is U+0085?
In UTF-8 it should be encoded as 0xC2 0x85, but some code that checks for newlines might mistakenly test for the bare byte 0x85 instead.
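
A short Python sketch of the distinction, assuming the input is UTF-8. The helper `normalize_newlines` is a hypothetical name used only for illustration; it shows one way to avoid the problem by decoding first and matching the code point U+0085 rather than the byte 0x85.

```python
NEL = "\u0085"  # Unicode Next Line


def normalize_newlines(raw: bytes) -> str:
    # Decode first, then replace the NEL code point, so that a trailing
    # 0x85 byte inside another character is never touched.
    text = raw.decode("utf-8")
    return text.replace(NEL, "\r\n")


# NEL itself is two bytes in UTF-8: C2 85.
print(NEL.encode("utf-8").hex(" "))

# Characters from the task title whose UTF-8 encoding ends in 0x85.
for ch in "хՅم":
    print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex(" "))
```

With this approach a real NEL in the input is still converted to CR+LF, while х, Յ and م pass through unchanged.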