
Encode 4-byte Unicode properly
Closed, Resolved (Public)

Description

While working on T207645 I discovered that 4-byte Unicode characters such as 𠝹 aren't being saved to the database correctly. They're stored as a literal ?.

As far as I can tell, the server, table, and columns are all correctly set to the utf8mb4 character set, which should support these characters, and my connection through db.py uses the same encoding. I checked the output of db.py just before saving to the DB and it had the correct characters, so the corruption must happen at or after the point where the data is written to the DB.
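To narrow this down, it may help to confirm which characters in a string actually need 4 bytes in UTF-8, and to see how a ? can appear when some step uses a narrower character set. The sketch below is only an illustration: `needs_four_bytes` is a hypothetical helper, not part of db.py, and latin-1 is used purely as a stand-in for "a charset that cannot represent the character". It does not identify which step is at fault here.

```python
def needs_four_bytes(text: str) -> list[str]:
    """Return the characters in text that take 4 bytes in UTF-8.

    Code points above U+FFFF lie outside the Basic Multilingual Plane
    and are encoded as 4 bytes in UTF-8.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


sample = "abc𠝹"

# Only the last character is outside the Basic Multilingual Plane.
four_byte_chars = needs_four_bytes(sample)

# Length in bytes of the problem character once encoded as UTF-8.
encoded_len = len("𠝹".encode("utf-8"))

# Stand-in for a lossy conversion: with errors="replace", Python substitutes
# "?" for any character the target charset cannot represent.
lossy = "𠝹".encode("latin-1", errors="replace")

print(four_byte_chars, encoded_len, lossy)
```

If a string passes through `needs_four_bytes` with a non-empty result and still comes back from the DB as ?, the lossy conversion is probably happening somewhere between the client connection and the stored column rather than in the Python code that builds the string.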