Page MenuHomePhabricator

Gerrit: Convert gerrit's db caractor encoding from utf8 to utf8mb4 to prevent truncation of astral characters
Closed, DuplicatePublic

Description

Also convert collation to utf8mb4_unicode_ci

  • Reason I put this under security is when I was looking into T145885 I came across a warning that using utf8 (not full Unicode could present security issues)

https://mathiasbynens.be/notes/mysql-utf8mb4

"The content got truncated at the first astral Unicode symbol, in this case 𝌆 — so, attempting to insert foo𝌆bar actually inserted foo instead, resulting in data loss (and possibly introducing security issues; see below). MySQL returned a warning message, too:"

using utf8 can trunicate results.

we should switch to utf8mb4

Anyways utf8mb4 is recommended for MySQL 5.5.3+ which I presume gerrit is running a more recent version.

http://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci

Event Timeline

Paladox created this task.Dec 21 2016, 8:48 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 21 2016, 8:48 PM
Paladox triaged this task as High priority.Dec 21 2016, 8:49 PM
Paladox added a project: Operations.
Paladox added subscribers: demon, hashar, Dzahn.

As this is a security issue and affects gerrit setting to high priority.

Paladox updated the task description. (Show Details)Dec 21 2016, 8:51 PM
Paladox added a subscriber: Reedy.Dec 21 2016, 8:55 PM
Paladox raised the priority of this task from High to Unbreak Now!.Dec 21 2016, 9:04 PM

As it also says can lead to loss of data, setting to unbreak.

Under section MySQL’s utf8

and under the image you will get the security advisory.

of https://mathiasbynens.be/notes/mysql-utf8mb4

Uploaded the patch here https://gerrit.wikimedia.org/r/#/c/328571/ which requires DBA to first switch gerrit's db over to the new format then needs https://gerrit.wikimedia.org/r/#/c/328571/ merged straight after otherwise things will fail.

Bawolff lowered the priority of this task from Unbreak Now! to Low.Dec 21 2016, 9:21 PM
Bawolff added a subscriber: Bawolff.

Are you aware of any circumstances where data truncation in gerrit would cause a security issue? If not, I don't think this should be in Security.

Dzahn added a comment.EditedDec 21 2016, 9:22 PM

gerrit uses the m2 db cluster. (m2-master.eqiad.wmnet)

m2 = db1020 as master and db2011 as slave per dbtree

on db1020 the wmf mariadb package 10.0.15-1 is installed, the Ubuntu distro package is "rc"

ii  wmf-mariadb10                        10.0.15-1                            amd64        MariaDB plus patches.
Bawolff renamed this task from Gerrit: Convert gerrit's db caractor encoding from utf8 to utf8mb4 to Gerrit: Convert gerrit's db caractor encoding from utf8 to utf8mb4 to prevent truncation of astral characters.Dec 21 2016, 9:23 PM

@Bawolff I was only going by the security advisory from that website.

"
The content got truncated at the first astral Unicode symbol, in this case 𝌆 — so, attempting to insert foo𝌆bar actually inserted foo instead, resulting in data loss (and possibly introducing security issues; see below). MySQL returned a warning message, too:

As shown above, this behavior can lead to data loss, but it gets worse — it can result in security vulnerabilities. Here are some examples, all of which were discovered after publishing this write-up:
•PHP object injection vulnerability in WordPress < 3.6.1, leading to remote code execution in combination with certain WordPress plugins
•Email authentication bypass in Phabricator
•Stored XSS in WordPress 4.1.2
•Remote command execution in the Joomla! CMS

TL;DR MySQL’s utf8 encoding is awkwardly named, as it’s different from proper UTF-8 encoding. It doesn’t offer full Unicode support, which can lead to data loss or security vulnerabilities.
"

that's from the website

@Bawolff also one example of data loss is T145885

So the above does not say all the security issues, only some.

Aklapper added a comment.EditedDec 21 2016, 10:18 PM

@Paladox: To clarify, does that mean that you are not aware of a specific type of Security issue with specific regard to Gerrit?

@Aklapper nope. But if that site is suggesting there could be security problems we should not just ignore it.

gerrit uses the m2 db cluster. (m2-master.eqiad.wmnet)

m2 = db1020 as master and db2011 as slave per dbtree

on db1020 the wmf mariadb package 10.0.15-1 is installed, the Ubuntu distro package is "rc"

ii  wmf-mariadb10                        10.0.15-1                            amd64        MariaDB plus patches.

Is this the reviewdb on m2?

Is this the reviewdb on m2?

Yes, gerrit database is named reviewdb and is on m2:

/var/lib/gerrit2/review_site/etc/gerrit.config
[database]
    type = mysql
    hostname = m2-master.eqiad.wmnet
    database = reviewdb
    username = gerrit
    url = jdbc:mysql://m2-master.eqiad.wmnet/reviewdb?characterSetResults=utf8&characterEncoding=utf8&connectionCollation=utf8_unicode_ci

These are the utf8 tables present in that db:

root@db1020:~# for i in `mysql --skip-ssl reviewdb -e "show tables;";`; do echo "====$i===="; mysql --skip-ssl reviewdb -e "show create table $i\G" | grep "CHARSET=utf8";done
====account_external_ids====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====account_group_by_id====
====account_group_by_id_aud====
====account_group_id====
) ENGINE=InnoDB AUTO_INCREMENT=1290 DEFAULT CHARSET=utf8
====account_group_members====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====account_group_members_audit====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====account_group_names====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====account_groups====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====account_id====
) ENGINE=InnoDB AUTO_INCREMENT=4393 DEFAULT CHARSET=utf8
====account_project_watches====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====accounts====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====change_id====
) ENGINE=InnoDB AUTO_INCREMENT=328640 DEFAULT CHARSET=utf8
====change_message_id====
) ENGINE=InnoDB AUTO_INCREMENT=1177 DEFAULT CHARSET=utf8
====change_messages====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====changes====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====patch_comments====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====patch_set_approvals====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====patch_sets====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====schema_version====
) ENGINE=InnoDB DEFAULT CHARSET=utf8
====system_config====
) ENGINE=InnoDB DEFAULT CHARSET=utf8

This is not a security vulnerability unless the code is vulnerable. MySQL's utf8 is known to not support all characters, that is why we use binary on our mediawiki installations. It is the application responsibility to use utf8mb4, available 6 years ago. We do not need to give this any kind of high priority- things will just be cut on insertion, or, if gerrit did things right, it would set T108255: Enable MariaDB/MySQL's Strict Mode and writes would fail to be inserted.

This is not a new issue, it is a very old one.

Paladox added a comment.EditedDec 22 2016, 10:39 AM

Oh so if you doint think this is a security problem should we open the task to everyone and remove the security project please?

This would need to be done at some point to fix T145885

Well we can probably just close this task and follow up on T145885 and maybe switch to utf8mb4.

Dzahn closed this task as Resolved.Dec 22 2016, 4:14 PM
Dzahn claimed this task.

Sounds like we can close it (declined?), but also make it public. Then continue on the existing ticket. Unless there are any objections.

Dzahn changed the task status from Resolved to Invalid.Dec 22 2016, 4:14 PM
Dzahn removed Dzahn as the assignee of this task.

hmm. using status "invalid" as duplicate of T145885, correct me if i'm wrong.

Bawolff changed the visibility from "Custom Policy" to "Public (No Login Required)".

Change 328571 abandoned by Paladox:
Gerrit: Convert from utf8 to utf8mb4

Reason:
We will go with https://gerrit.wikimedia.org/r/#/c/330455/ instead.

https://gerrit.wikimedia.org/r/328571