Page MenuHomePhabricator

the bounce_records table in x1 wikishared has latin1 encoding
Closed, ResolvedPublic

Description

We discovered while creating tables on the wikishared database of the x1 cluster that the default encoding was set to latin1. Probably as a consequence, the tables has the latin1 enconding:

MariaDB [wikishared]> SHOW CREATE TABLE bounce_records\G
*************************** 1. row ***************************
       Table: bounce_records
Create Table: CREATE TABLE `bounce_records` (
  `br_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `br_user_email` varchar(255) NOT NULL,
  `br_timestamp` varbinary(14) NOT NULL,
  `br_reason` varchar(255) NOT NULL,
  PRIMARY KEY (`br_id`),
  KEY `br_mail_timestamp` (`br_user_email`(50),`br_timestamp`),
  KEY `br_timestamp` (`br_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=230015 DEFAULT CHARSET=latin1

This means that the table will not be able to hold utf8-only characters, or worse, fail silently. Wiki standar for WMF tables is to set them in binary collation, and let the software handle the encoding.

We changed the configuration of wikishared database to be binary, but we didn't touch the table: T111333

Let me know what to do with it, convert it? leave it as is?

Event Timeline

jcrespo raised the priority of this task from to Needs Triage.
jcrespo updated the task description. (Show Details)
jcrespo subscribed.

I dont think it will affect the tables, but let me add in people who where in the development to figure this one out

It does affect the table, because it has two varchar fields (br_reason and br_user_email), plus it could affect extra fields added. It does not affect the br_timestamp field.

Lets convert it to binary. It definitely wasn't intentional that it was created using latin1.

From my quick read of Sanitizer::validateEmail(), we should not have any unicode characters in the br_user_email field. It might be possible for br_reason to have them.

Lets convert it to binary. It definitely wasn't intentional that it was created using latin1.

@jcrespo Can you add this to your TODO list to convert this table please?

Presumably:

ALTER TABLE bounce_records CONVERT TO CHARACTER SET binary;
jcrespo triaged this task as Medium priority.
jcrespo moved this task from Blocked external/Not db team to Backlog on the DBA board.
jcrespo set Security to None.

@Ready I hope you understand that it is not as easy as executing 1 command.

I've performed the schema change:

MariaDB [wikishared]> SHOW CREATE TABLE bounce_records\G
*************************** 1. row ***************************
       Table: bounce_records
Create Table: CREATE TABLE `bounce_records` (
  `br_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `br_user_email` varbinary(255) NOT NULL,
  `br_timestamp` varbinary(14) NOT NULL,
  `br_reason` varbinary(255) NOT NULL,
  PRIMARY KEY (`br_id`),
  KEY `br_mail_timestamp` (`br_user_email`(50),`br_timestamp`),
  KEY `br_timestamp` (`br_timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=236735 DEFAULT CHARSET=binary
1 row in set (0.00 sec)

Please check the new definition of wikishared.bounce_records, update your code and validate that it is still compatible.

I made some heuristics so that no previous records would be affected:

MariaDB [wikishared]> SELECT max(length(br_user_email)) FROM bounce_records;
+----------------------------+
| max(length(br_user_email)) |
+----------------------------+
|                         62 |
+----------------------------+
1 row in set (0.04 sec)

MariaDB [wikishared]> SELECT max(char_length(br_user_email)) FROM bounce_records;
+---------------------------------+
| max(char_length(br_user_email)) |
+---------------------------------+
|                              62 |
+---------------------------------+
1 row in set (0.01 sec)

MariaDB [wikishared]> SELECT max(length(br_reason)) FROM bounce_records;
+------------------------+
| max(length(br_reason)) |
+------------------------+
|                     84 |
+------------------------+
1 row in set (0.01 sec)

MariaDB [wikishared]> SELECT max(char_length(br_reason)) FROM bounce_records;
+-----------------------------+
| max(char_length(br_reason)) |
+-----------------------------+
|                          84 |
+-----------------------------+
1 row in set (0.00 sec)

But the application is the one resposible for future records. Close when ok with it.

Resolved as per lack of feedback after work was done.