How to correctly read multibyte characters from text fields on DB replicas on Toolforge?
Open, Needs Triage, Public

Description

When my bot, running on Toolforge, reads data from the DB replicas, every character outside 7-bit ASCII is replaced by "?" characters, one "?" per byte of the multibyte character. To work around this, I cast the field:
select cast(fieldname as char) alias
This works, but not for symbols like those in [1] or [2]. It looks like this approach doesn't work with 4-byte characters, i.e. characters outside the Basic Multilingual Plane. How can this problem be solved?

[1]: https://ru.wikipedia.org/wiki/special:log?type=create&user=Иван+Булкин
[2]: https://ru.wikipedia.org/w/index.php?diff=103627950
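One possible approach, sketched here under assumptions rather than confirmed for this bot: if the replica text fields arrive as raw bytes holding UTF-8 (an assumption about the column type and the client library), the bot could skip the SQL cast and decode the bytes itself, which also handles 4-byte characters. The helper names `decode_field` and `needs_four_bytes` are hypothetical and used only for illustration. On the SQL side, `CONVERT(fieldname USING utf8mb4)` is the analogous conversion, but whether it behaves correctly with this bot's connection settings is untested here.

```python
def decode_field(raw):
    """Decode a value fetched from a binary column.

    Assumption: the column holds UTF-8 encoded bytes. Values that are
    already str (or None) are returned unchanged.
    """
    if isinstance(raw, (bytes, bytearray)):
        return bytes(raw).decode("utf-8")
    return raw


def needs_four_bytes(text):
    """Return True if text contains a character outside the Basic
    Multilingual Plane, i.e. one that takes 4 bytes in UTF-8 and does
    not fit in a 3-byte-per-character charset."""
    return any(ord(ch) > 0xFFFF for ch in text)


if __name__ == "__main__":
    # Simulated raw values, standing in for what a DB driver might return.
    cyrillic = "Иван Булкин".encode("utf-8")   # 2-byte characters
    emoji = "\U0001F600".encode("utf-8")       # one 4-byte character

    for raw in (cyrillic, emoji):
        text = decode_field(raw)
        print(repr(text), len(raw), needs_four_bytes(text))
```

Decoding on the client keeps the query free of charset-dependent casts; the trade-off is that every text field has to pass through the helper before use.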

Event Timeline

MBH created this task. Jul 4 2020, 10:03 AM
Restricted Application added subscribers: Base, Aklapper. Jul 4 2020, 10:03 AM
Marostegui moved this task from Triage to Backlog on the DBA board. Jul 6 2020, 9:15 AM