
MariaDB database fails on Chinese characters
Open, High, Public

Description

Description: At step 4, if the list of items contains Chinese characters, creating a share link fails.

Replicate :

Suspects :

  • MariaDB tables and/or columns require Unicode support?
  • /upload_batches/models.py
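
To test the first suspect, the table collations can be listed and, if needed, a table converted to utf8mb4. The following is a minimal sketch, not the project's code: the helper names and the table name `upload_batches_batch` are hypothetical, and the helpers only build SQL strings to be run with a MariaDB client. The `information_schema.TABLES` columns and the `ALTER TABLE ... CONVERT TO CHARACTER SET` statement are standard MariaDB.

```python
import re

# Query listing the collation of every table in one schema.
# The %s placeholder is meant to be bound to the schema name by the DB driver.
LIST_COLLATIONS_SQL = (
    "SELECT TABLE_NAME, TABLE_COLLATION "
    "FROM information_schema.TABLES "
    "WHERE TABLE_SCHEMA = %s"
)

# Conservative pattern for unquoted-style identifiers (letters, digits, _, $).
_IDENTIFIER = re.compile(r"[A-Za-z0-9_$]+")


def convert_table_sql(table: str) -> str:
    """Build the statement converting one table, and its text columns, to utf8mb4."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return (
        f"ALTER TABLE `{table}` "
        "CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
    )


if __name__ == "__main__":
    print(LIST_COLLATIONS_SQL)
    # Hypothetical table name, for illustration only.
    print(convert_table_sql("upload_batches_batch"))
```

If the listed collation starts with `utf8mb4_`, the table itself is probably not the cause and the connection settings are the next thing to check.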

#TODO, UTF: Make sure it is compatible with UTF-encoded texts

  1. UTF: for some reason, MariaDB on Toolforge does not handle Unicode properly without the following two lines
  2. Ref: https://wikitech.wikimedia.org/wiki/Help:Toolforge/My_first_Django_OAuth_tool#:~:text=connection%20as%20follows%3A-,DATABASES,-%3D%20%7B%0A%20%20%20%20%27default
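
The "two lines" mentioned in the comment are not quoted in this task. A plausible reading, assuming they set the connection character set in Django's `DATABASES` setting as the referenced Toolforge help page does, is sketched below. The database name is a placeholder, and the host and credentials-file path are assumptions about a typical Toolforge setup; `charset` and `read_default_file` are options that Django's MySQL backend passes through to the database driver.

```python
import os

# Sketch of a Django settings fragment (settings.py); values are illustrative.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": "s12345__example",  # hypothetical Toolforge database name
        "HOST": "tools.db.svc.wikimedia.cloud",  # assumed ToolsDB host
        "OPTIONS": {
            # Assumed location of the tool's database credentials file.
            "read_default_file": os.path.expanduser("~/replica.my.cnf"),
            # Ask the driver to use 4-byte UTF-8 on the connection.
            "charset": "utf8mb4",
        },
    }
}
```

The connection character set and the table character set are separate settings, so a utf8mb4 connection does not help if the tables were created with latin1 or 3-byte utf8.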

Ask ChatGPT :

Hello Mentor,
My MariaDB database has a field that must store characters.
When my input includes only ASCII characters, everything works.
When my input includes Chinese characters, it fails.
What should I do?
Keep your answer concise.

Event Timeline

Yug triaged this task as High priority. Jan 27 2025, 4:11 PM

I'm not a Toolforge expert, but this page suggests, in a roundabout way, that the default database collation might be utf8_general_ci rather than utf8mb4_general_ci. If so, it stores at most three bytes per character, which covers the Basic Multilingual Plane but not supplementary-plane characters such as some Chinese characters or emoji. It might even be code-page based, like the old MySQL default of latin1_swedish_ci, which stores only characters that fit into a single byte in the selected code page and so precludes all Chinese characters.
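
The byte-length argument in this comment can be checked locally without a database. The sketch below uses hypothetical helper names and approximates MariaDB's 3-byte `utf8` character set as "Basic Multilingual Plane only" and `latin1` with Python's `latin-1` codec (MariaDB's `latin1` is closer to cp1252, but neither contains Chinese characters).

```python
def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(text: str) -> bool:
    """True if every character is in the Basic Multilingual Plane,
    i.e. needs at most 3 bytes in UTF-8."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def fits_latin1(text: str) -> bool:
    """True if every character fits a single-byte Latin-1 code page."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    samples = {
        "ASCII letter": "a",
        "common Chinese character (U+4E2D)": "\u4e2d",
        "CJK Extension B character (U+20000)": "\U00020000",
        "emoji (U+1F600)": "\U0001F600",
    }
    for label, ch in samples.items():
        print(label, utf8_len(ch), fits_utf8mb3(ch), fits_latin1(ch))
```

Common Chinese characters sit in the Basic Multilingual Plane and need only three bytes. So if the failure occurs with everyday Chinese text, a single-byte character set such as latin1 is the more likely culprit than 3-byte utf8.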