How to correctly read multibyte characters from text fields on DB replicas on Toolforge?
Closed, Resolved, Public

Description

When my bot, running on Toolforge, reads data from the DB replicas, every character outside 7-bit ASCII is replaced by "?" characters, one "?" per byte of the multibyte character. To work around this, I cast the field:
select cast(fieldname as char) alias
This works, but not for symbols like those in [1] or [2]. It looks like this approach doesn't work with 4-byte characters, that is, characters outside the Basic Multilingual Plane. How can this problem be solved?

[1]: https://ru.wikipedia.org/wiki/special:log?type=create&user=Иван+Булкин
[2]: https://ru.wikipedia.org/w/index.php?diff=103627950
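
For context on the "4-byte" observation: in UTF-8, every code point above U+FFFF (outside the Basic Multilingual Plane) is encoded as four bytes, while BMP characters take one to three. The short Python sketch below checks this for a few sample characters. The samples and the helper names `utf8_len` and `in_bmp` are picked purely for illustration and are not taken from the linked pages.

```python
# Illustrative only: compare UTF-8 byte lengths for characters inside and
# outside the Basic Multilingual Plane (BMP). Sample characters are arbitrary.
samples = {
    "Latin letter": "a",
    "Cyrillic letter": "\u0418",      # CYRILLIC CAPITAL LETTER I
    "CJK ideograph": "\u4e2d",
    "Emoji": "\U0001F600",            # GRINNING FACE, outside the BMP
}


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def in_bmp(ch: str) -> bool:
    """True if the code point lies in the BMP (U+0000..U+FFFF)."""
    return ord(ch) <= 0xFFFF


for label, ch in samples.items():
    print(f"{label}: U+{ord(ch):04X}, {utf8_len(ch)} byte(s), in BMP: {in_bmp(ch)}")
```

A connection character set limited to three bytes per character (such as MySQL's legacy `utf8`, also known as `utf8mb3`) cannot represent the last sample, which would match the symptom described above.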

Event Timeline

taavi subscribed.

You might need to explicitly tell your tool to use the utf8mb4 character set when creating the MySQL connection to the replicas.
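
As a rough illustration of that suggestion, here is a minimal Python sketch. It assumes the third-party `pymysql` driver, the usual Toolforge replica host naming (`<wiki>.analytics.db.svc.wikimedia.cloud`), and credentials stored in `~/replica.my.cnf`. The task does not say which language or driver the bot uses, so all of these are assumptions, and the helper names `replica_connection_kwargs` and `fetch_recent_titles` are hypothetical.

```python
import os


def replica_connection_kwargs(wiki: str = "ruwiki") -> dict:
    """Build connection settings for a wiki replica (hypothetical helper).

    The host pattern and credentials path are assumed Toolforge conventions;
    the part this sketch is about is charset="utf8mb4".
    """
    return {
        "host": f"{wiki}.analytics.db.svc.wikimedia.cloud",  # assumed host naming
        "database": f"{wiki}_p",
        "read_default_file": os.path.expanduser("~/replica.my.cnf"),  # assumed path
        "charset": "utf8mb4",  # allow 4-byte UTF-8 characters on the connection
    }


def fetch_recent_titles(wiki: str = "ruwiki", limit: int = 5) -> list:
    """Read a few page titles from recentchanges (hypothetical example query)."""
    import pymysql  # third-party driver, imported lazily; an assumption

    conn = pymysql.connect(**replica_connection_kwargs(wiki))
    try:
        with conn.cursor() as cur:
            cur.execute(
                "SELECT rc_title FROM recentchanges ORDER BY rc_id DESC LIMIT %s",
                (limit,),
            )
            # Binary columns come back as bytes; decode them as UTF-8 text.
            return [
                value.decode("utf-8") if isinstance(value, bytes) else value
                for (value,) in cur.fetchall()
            ]
    finally:
        conn.close()
```

Other drivers usually expose an equivalent setting, for example a `charset=utf8mb4` connection-string option or running `SET NAMES utf8mb4` right after connecting.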

MBH claimed this task.

Looks like it works, thanks.