Page MenuHomePhabricator

Gerrit shows HTTP 500 error when pasting extended unicode characters
Closed, ResolvedPublic

Description

To convert the db do the following

$ mysql -BN reviewdb -e "SHOW TABLES" | while read table; do mysql reviewdb -e "ALTER TABLE $table ROW_FORMAT=DYNAMIC"; done
$ mysql -e "ALTER DATABASE reviewdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"
$ mysql -BN reviewdb -e "SHOW TABLES" | while read table; do mysql reviewdb -e "ALTER TABLE $table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"; done

We will need to add these mysql variables to my.cnf

innodb_file_format=BARRACUDA
innodb_large_prefix=ON

This also requires this patch https://gerrit.wikimedia.org/r/#/c/328571/


Steps to reproduce:

  • Go to a gerrit patch to comment
  • Paste "🤔" to determine that you're thinking
  • Click on save

I get 500 internal error.

Caused by: java.sql.BatchUpdateException: Data truncation: Incorrect string value: '\xF0\x9F\xA4\x94' for column 'message' at row 1
Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect string value: '\xF0\x9F\xA4\x94' for column 'message' at row 1
[2016-09-16 17:24:34,065] [HTTP-34105] ERROR com.google.gerrit.httpd.restapi.RestApiServlet : Error in POST /r/changes/145575/revisions/1b08fd395b14d9c00fff3c29630ba7e66b3f2ccf/review
com.google.gerrit.server.git.UpdateException: com.google.gwtorm.server.OrmException: insert failure on change_messages
        at com.google.gerrit.server.git.BatchUpdate.executeChangeOps(BatchUpdate.java:363)
        at com.google.gerrit.server.git.BatchUpdate.execute(BatchUpdate.java:291)
        at com.google.gerrit.server.change.PostReview.apply(PostReview.java:164)
        at com.google.gerrit.server.change.PostReview.apply(PostReview.java:137)
        at com.google.gerrit.server.change.PostReview.apply(PostReview.java:89)
        at com.google.gerrit.httpd.restapi.RestApiServlet.service(RestApiServlet.java:329)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)
        at com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)
        at com.google.gerrit.httpd.GetUserFilter.doFilter(GetUserFilter.java:82)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
        at com.google.gwtexpui.server.CacheControlFilter.doFilter(CacheControlFilter.java:73)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
        at com.google.gerrit.httpd.RunAsFilter.doFilter(RunAsFilter.java:117)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
        at com.google.gerrit.httpd.RequireSslFilter.doFilter(RequireSslFilter.java:68)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
        at com.google.gerrit.httpd.AllRequestFilter$FilterProxy$1.doFilter(AllRequestFilter.java:136)
        at com.google.gerrit.httpd.AllRequestFilter$FilterProxy.doFilter(AllRequestFilter.java:138)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
        at com.google.gerrit.httpd.RequestContextFilter.doFilter(RequestContextFilter.java:75)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)
        at com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)
        at com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)
        at com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:221)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
        at org.eclipse.jetty.server.Server.handle(Server.java:499)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
        at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
        at java.lang.Thread.run(Thread.java:745)
Caused by: com.google.gwtorm.server.OrmException: insert failure on change_messages
        at com.google.gwtorm.schema.sql.SqlDialect.convertError(SqlDialect.java:160)
        at com.google.gwtorm.schema.sql.DialectMySQL.convertError(DialectMySQL.java:232)
        at com.google.gwtorm.jdbc.JdbcAccess.convertError(JdbcAccess.java:466)
        at com.google.gwtorm.jdbc.JdbcAccess.insert(JdbcAccess.java:171)
        at com.google.gerrit.server.ChangeMessagesUtil.addChangeMessage(ChangeMessagesUtil.java:72)
        at com.google.gerrit.server.change.PostReview$Op.insertMessage(PostReview.java:642)
        at com.google.gerrit.server.change.PostReview$Op.updateChange(PostReview.java:363)
        at com.google.gerrit.server.git.BatchUpdate.executeChangeOps(BatchUpdate.java:352)
        ... 45 more
Caused by: java.sql.BatchUpdateException: Data truncation: Incorrect string value: '\xF0\x9F\xA4\x94' for column 'message' at row 1
        at com.mysql.jdbc.SQLError.createBatchUpdateException(SQLError.java:1166)
        at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1773)
        at com.mysql.jdbc.PreparedStatement.executeBatchInternal(PreparedStatement.java:1257)
        at com.mysql.jdbc.StatementImpl.executeBatch(StatementImpl.java:958)
        at com.google.gwtorm.schema.sql.SqlDialect.executeBatch(SqlDialect.java:446)
        at com.google.gwtorm.jdbc.JdbcAccess.execute(JdbcAccess.java:450)
        at com.google.gwtorm.jdbc.JdbcAccess.insertAsBatch(JdbcAccess.java:213)
        at com.google.gwtorm.jdbc.JdbcAccess.insert(JdbcAccess.java:166)
        ... 49 more
Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Incorrect string value: '\xF0\x9F\xA4\x94' for column 'message' at row 1
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3964)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3902)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2526)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2673)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
        at com.mysql.jdbc.PreparedStatement.executeUpdateInternal(PreparedStatement.java:2073)
        at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1751)
        ... 55 more

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
Paladox added a comment.EditedJan 4 2017, 5:08 PM

@demon Oh, continuing to use utf-8 works but pasting an emoji still fails with 500 error.

so the db can be converted anytime just we will need to try and fix this client side to get it to correctly to use utf8mb4.

I did try its with utf8 and converted the tables to utf8mb4 but it seems it still uses utf8 and not utf8mb4.

Dzahn added a comment.Jan 4 2017, 5:18 PM

@Paladox What was the solution that resulted in it _not_ showing a 500 error but just a ? instead. Afair there was one? That would seem like a good compromise to me.

@Dzahn i if we do just &connectionCollation=utf8mb4_unicode_ci that should at least fix the 500 errors but emojis won't show.

@Dzahn and @demon setting ?useUnicode=true should stop the 500 errors but emojis won't show for now (db conversion required too) :)

Change 330455 had a related patch set (by Paladox) published:
Gerrit: Set useUnicode=true

https://gerrit.wikimedia.org/r/330455

hashar removed a subscriber: hashar.Jan 4 2017, 8:23 PM

this was filed in context of emojis, but really it's about all extended unicode that mysql/maria's "utf8" does not support.

Somebody (not me) should fix the task summary.

Dzahn renamed this task from Gerrit shows HTTP 500 error when pasting an emoji to Gerrit shows HTTP 500 error when pasting extended unicode characters.Jan 4 2017, 9:46 PM
Paladox added a comment.EditedJan 5 2017, 5:57 PM

@jcrespo Hi, this https://gerrit.wikimedia.org/r/#/c/330455/ patch should at least stop the 500 error's, the patch requires the db be converted to utf8mb4 anyways.

@jcrespo or @Marostegui Hi, i have tested this https://gerrit.wikimedia.org/r/#/c/330455/ locally. Could you convert the db to utf8mb4 please? Even though we will still be connecting with a utf8 connection, we will have a utf8mb4 collation. But also this will still not get emoji's to work but at least it will fix the 500 errors. When someone posts an emoji they will be converted into ??? which is better then a 500 error :)

Dzahn awarded a token.Jan 7 2017, 3:15 AM

Hi,

Which tables would you need converted in the end? looking at: T145885#2896928 looks like you tried to convert all of them.
Is that really what you'd need or can we just narrow it to the tables that really need it?

@Marostegui hi, the main table we want converted is patch_comment. I haven't really tested if converted that breaks anything else but it likely will since we need to merge https://gerrit.wikimedia.org/r/#/c/330455/ too once the tables are converted which would be that the collation will be utf8mb4 and not utf8. but the connection will stay utf8.

Change 328571 abandoned by Paladox:
Gerrit: Convert from utf8 to utf8mb4

Reason:
We will go with https://gerrit.wikimedia.org/r/#/c/330455/ instead.

https://gerrit.wikimedia.org/r/328571

@jcrespo hi, apparently you can set mutiple mysqld

for example

  1. [mysqld2]
  2. port = 3307
  3. datadir = /var/lib/mysql-databases/mysqld2
  4. pid-file = /var/lib/mysql-databases/mysqld2/mysql.pid
  5. socket = /var/lib/mysql-databases/mysqld2/mysql.sock
  6. user = mysql

See http://www.simplemachines.org/community/index.php?topic=299317.0

so we could do this without affecting other db's.

@jcrespo hi, apparently you can set mutiple mysqld

for example

  1. [mysqld2]
  2. port = 3307
  3. datadir = /var/lib/mysql-databases/mysqld2
  4. pid-file = /var/lib/mysql-databases/mysqld2/mysql.pid
  5. socket = /var/lib/mysql-databases/mysqld2/mysql.sock
  6. user = mysql

    See http://www.simplemachines.org/community/index.php?topic=299317.0

    so we could do this without affecting other db's.

Hey @Paladox

That is indeed possible but it is a pain to maintain from an operational point of view. In fact, we had that set up for our labs sanitarium host and we just finished setting up a new sanitarium host without that because it was really difficult to manage it.
I am afraid that setting up mysqld_multi just for this is way overkill and would bite us again in the future (like it did with the sanitarium example)

Paladox added a comment.EditedJan 18 2017, 3:00 PM

oh.

@Marostegui i guess we should do the conversion of db, it will at least stop gerrit making error's. It will just replace emoji's with ???? which i think is better then nothing for now. I have tested it. It really only requires the patch_comment table to be converted but i also tested with all the db being converted which we should do that one. The connection to it will be utf8, but it will also use utf8mb4_unicode_ci.

I wonder can this utf8mb4 problem be reported upstream please?

I can't do it as it ask for too much personal information but this should be fixed in jdbc to allow us to do this.

Paladox added a comment.EditedJan 18 2017, 5:11 PM

@Marostegui hi, i managed to do mysql_multi, it took a while to setup as i have never done it. but in the end it isn't a pain to maintain since it will allow you to define certain configs for certain db's that you can't do for all db's.

i used mariadb.

@Marostegui hi, i managed to do mysql_multi, it took a while to setup as i have never done it. but in the end it isn't a pain to maintain since it will allow you to define certain configs for certain db's that you can't do for all db's.

i used mariadb.

Hey!

Setting it up isn't a big deal, what is a big deal is to adapt everything around it to manage that host. As I said we already have (and trying to deprecate soon) a host with that configuration.
Some examples:
Monitoring: it needs a different configuration as you have multiple (even if there are only two) mysqld instances.
Firewalling: as you'd need to handle an extra rule to allow connections to 3307 or whichever port you decide.
Puppet: we would need to adapt puppet for this special host.

At the end of the day, it is a pain because it is a non standard configuration, within our architecture, and that make that host a snowflake, which is what we are trying to avoid at all costs.

Trying to handle this issue by adding (more) complexity and special hosts to our systems isn't a good solution I believe.

oh, sorry i didn't realise that the firewall / puppet and monitoring would need to be changed.

Is there any other way we could do this without adding more complexity?

This is a bug in jdbc and should really be fixed.

I wonder are there any other hosts that use utf8mb and unicode? if so we could move those databases onto the same host.

oh, sorry i didn't realise that the firewall / puppet and monitoring would need to be changed.

no worries!! I should have given the full picture earlier, but I hope you enjoyed setting it up at least :-)

Is there any other way we could do this without adding more complexity?

I guess if there is no other way to fix it in code, altering the tables for only the gerrit database could be the way to go?

oh, sorry i didn't realise that the firewall / puppet and monitoring would need to be changed.

no worries!! I should have given the full picture earlier, but I hope you enjoyed setting it up at least :-)

Is there any other way we could do this without adding more complexity?

I guess if there is no other way to fix it in code, altering the tables for only the gerrit database could be the way to go?

Yep, I've done the testing for that one. Also requires this https://gerrit.wikimedia.org/r/#/c/328571/ to be merged at the same time as the db is being converted.

@Marostegui doing the alter gerrit database and applying the patch ^^ will fix the error's preventing a leak some where but it will not fix the emoji's problem entirely but at least it will stop erroring out which is better then nothing + it doesn't add complexity to maintaining it that way :)

Hi,

I have been doing some tests with the gerritdb to sum up all the stuff that have been discussed here in order to convert the tables to utf8mb4. This is what I have needed to do on a database level:

ALTER DATABASE reviewdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Then you suggested to run this:

mysql -BN reviewdb -e "SHOW TABLES" | while read table; do echo $table; mysql reviewdb -e "ALTER TABLE $table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"; done

That will convert ALL tables - ie: tables with charset binary to utf8mb4, will that break stuff?
These are the non utf8 tables:

account_group_by_id
) ENGINE=InnoDB DEFAULT CHARSET=binary
account_group_by_id_aud
) ENGINE=InnoDB DEFAULT CHARSET=binary
working
) ENGINE=InnoDB DEFAULT CHARSET=binary
`

Do they really need to be converted? What can break?

Anyways, continuing with the test:

That fails for the following tables: ERROR 1071 (42000) at line 1: Specified key was too long; max key length is 767 bytes

account_external_ids
account_group_names
account_project_watches
patch_comments
patch_set_approvals

In order to get those changed we need to change two global variables (that will affect all the databases in the host):

set global innodb_large_prefix = 1;
set global innodb_file_format = Barracuda;

After that we need to alter the affected tables to get them with ROW_FORMAT=DYNAMIC and change them to utf8mb4:

for i in account_external_ids account_group_names account_project_watches patch_comments patch_set_approvals; do echo "***$i***"; mysql reviewdb -e "show create table $i\G" | grep CHARSET;done
***account_external_ids***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
***account_group_names***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
***account_project_watches***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
***patch_comments***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
***patch_set_approvals***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC

All these changes would requiere coordination between master and slave(s) and also we need to make sure we impact of changing the two global variables mentioned above.

demon added a comment.Jan 19 2017, 6:42 PM

That will convert ALL tables - ie: tables with charset binary to utf8mb4, will that break stuff?
These are the non utf8 tables:

account_group_by_id
) ENGINE=InnoDB DEFAULT CHARSET=binary
account_group_by_id_aud
) ENGINE=InnoDB DEFAULT CHARSET=binary
working
) ENGINE=InnoDB DEFAULT CHARSET=binary
`

Do they really need to be converted? What can break?

No, these don't need changing. account_group_by_id and account_group_by_id_aud are only given internal-to-gerrit input (timestamps, uuids, ints, etc) so we don't need to worry about non-ASCII here. working was a temporary table I made in the process of fixing T152640, didn't realize I hadn't dropped it (just did so now).

Hi,

I have been doing some tests with the gerritdb to sum up all the stuff that have been discussed here in order to convert the tables to utf8mb4. This is what I have needed to do on a database level:

ALTER DATABASE reviewdb CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Then you suggested to run this:

mysql -BN reviewdb -e "SHOW TABLES" | while read table; do echo $table; mysql reviewdb -e "ALTER TABLE $table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci"; done

That will convert ALL tables - ie: tables with charset binary to utf8mb4, will that break stuff?
These are the non utf8 tables:

account_group_by_id
) ENGINE=InnoDB DEFAULT CHARSET=binary
account_group_by_id_aud
) ENGINE=InnoDB DEFAULT CHARSET=binary
working
) ENGINE=InnoDB DEFAULT CHARSET=binary
`

Do they really need to be converted? What can break?

Anyways, continuing with the test:

That fails for the following tables: ERROR 1071 (42000) at line 1: Specified key was too long; max key length is 767 bytes

account_external_ids
account_group_names
account_project_watches
patch_comments
patch_set_approvals

In order to get those changed we need to change two global variables (that will affect all the databases in the host):

set global innodb_large_prefix = 1;
set global innodb_file_format = Barracuda;

After that we need to alter the affected tables to get them with ROW_FORMAT=DYNAMIC and change them to utf8mb4:

for i in account_external_ids account_group_names account_project_watches patch_comments patch_set_approvals; do echo "***$i***"; mysql reviewdb -e "show create table $i\G" | grep CHARSET;done
***account_external_ids***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
***account_group_names***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
***account_project_watches***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
***patch_comments***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC
***patch_set_approvals***
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC

All these changes would requiere coordination between master and slave(s) and also we need to make sure we impact of changing the two global variables mentioned above.

Oh, sorry didn't see your reply until now, thanks for also testing this, didn't realise some tables were binary, but @demon has now dropped those tables.

I have managed to build mysql connector locally (Built version 6) editing the source code to set character_set_server utf8mb4 worked in changing it but didn't work in getting the connection to use utf8mb4.

So the only way i see here is to convert the db and apply the patch i did.

Im wondering would using a utf8 connection break something if the db is using utf8mb4?

As i have been doing this and haven't noticed really any problems.

demon added a comment.Jan 19 2017, 6:50 PM

Oh, sorry didn't see your reply until now, thanks for also testing this, didn't realise some tables were binary, but @demon has now dropped those tables.

To be clear, I only dropped working, which was my own table to begin with.

Oh, we can ignore converting the tables that were binary as well the connection is already utf8 and we haven't seen a problem.

(i've created this patch against mysql connector, though we won't be using it, just putting it here for others to use, this works too, as long as useUnicode and characterEncoding=utf-8 is set)

1diff --git a/build.xml b/build.xml
2index 8a92c99..81a025d 100644
3--- a/build.xml
4+++ b/build.xml
5@@ -219,7 +219,7 @@ See also com.mysql.cj.core.conf.PropertyDefinitions.SYSP_* variables for other t
6 </condition>
7
8 <!-- The following property are needed for finding the JDK needed for compile and can be passed on the command line to ant via -D switch. -->
9- <property name="com.mysql.cj.build.jdk" value="/usr/lib/jvm/jdk1.8" />
10+ <property name="com.mysql.cj.build.jdk" value="/usr/lib/jvm/java-8-openjdk-amd64" />
11
12 <!-- The folloing property sets whether javac should generate debugging info or not and can be passed on the command line to ant via -D switch. -->
13 <property name="com.mysql.cj.build.addDebugInfo" value="on" />
14@@ -324,7 +324,7 @@ See also com.mysql.cj.core.conf.PropertyDefinitions.SYSP_* variables for other t
15 resultproperty="com.mysql.cj.build.jdk.exitstatus">
16 <arg value="-version" />
17 </exec>
18-
19+<!--
20 <fail message="Java 8 is required. Set the full path to this JDK home with the property 'com.mysql.cj.build.jdk'. Default: '/usr/lib/jvm/jdk1.8').">
21 <condition>
22 <not>
23@@ -334,7 +334,7 @@ See also com.mysql.cj.core.conf.PropertyDefinitions.SYSP_* variables for other t
24 </and>
25 </not>
26 </condition>
27- </fail>
28+ </fail>-->
29 </target>
30
31
32diff --git a/src/main/java/com/mysql/cj/jdbc/ConnectionImpl.java b/src/main/java/com/mysql/cj/jdbc/ConnectionImpl.java
33index d3eb7ed..82e390a 100644
34--- a/src/main/java/com/mysql/cj/jdbc/ConnectionImpl.java
35+++ b/src/main/java/com/mysql/cj/jdbc/ConnectionImpl.java
36@@ -1333,14 +1333,17 @@ public class ConnectionImpl extends AbstractJdbcConnection implements JdbcConnec
37
38 if (!this.useOldUTF8Behavior.getValue()) {
39 if (dontCheckServerMatch || !this.session.characterSetNamesMatches("utf8") || (!this.session.characterSetNamesMatches("utf8mb4"))) {
40- execSQL(null, "SET NAMES " + (useutf8mb4 ? "utf8mb4" : "utf8"), -1, null, false, this.database, null, false);
41- this.session.getServerVariables().put("character_set_client", useutf8mb4 ? "utf8mb4" : "utf8");
42- this.session.getServerVariables().put("character_set_connection", useutf8mb4 ? "utf8mb4" : "utf8");
43+ execSQL(null, "SET NAMES utf8mb4"/* + (useutf8mb4 ? "utf8mb4" : "utf8")*/, -1, null, false, this.database, null, false);
44+ this.session.getServerVariables().put("character_set_client", "utf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
45+ this.session.getServerVariables().put("character_set_connection", "uf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
46+ this.session.getServerVariables().put("character_set_results", "utf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
47+ this.session.getServerVariables().put("character_set_server", "utf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
48 }
49 } else {
50- execSQL(null, "SET NAMES latin1", -1, null, false, this.database, null, false);
51- this.session.getServerVariables().put("character_set_client", "latin1");
52- this.session.getServerVariables().put("character_set_connection", "latin1");
53+ execSQL(null, "SET NAMES utf8mb4"/* latin1"*/, -1, null, false, this.database, null, false);
54+ this.session.getServerVariables().put("character_set_client", "utf8mb4"/* "latin1"*/);
55+ this.session.getServerVariables().put("character_set_connection", "utf8mb4"/* "latin1"*/);
56+ this.session.getServerVariables().put("character_set_server", "utf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
57 }
58
59 this.characterEncoding.setValue(realJavaEncoding);
60@@ -1350,9 +1353,13 @@ public class ConnectionImpl extends AbstractJdbcConnection implements JdbcConnec
61 if (mysqlCharsetName != null) {
62
63 if (dontCheckServerMatch || !this.session.characterSetNamesMatches(mysqlCharsetName)) {
64- execSQL(null, "SET NAMES " + mysqlCharsetName, -1, null, false, this.database, null, false);
65- this.session.getServerVariables().put("character_set_client", mysqlCharsetName);
66- this.session.getServerVariables().put("character_set_connection", mysqlCharsetName);
67+ execSQL(null, "SET NAMES " + "utf8mb4"/*mysqlCharsetName*/, -1, null, false, this.database, null, false);
68+ /* this.session.getServerVariables().put("character_set_client", mysqlCharsetName);
69+ this.session.getServerVariables().put("character_set_connection", mysqlCharsetName);*/
70+ this.session.getServerVariables().put("character_set_client", "utf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
71+ this.session.getServerVariables().put("character_set_connection", "utf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
72+ this.session.getServerVariables().put("character_set_results", "utf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
73+ this.session.getServerVariables().put("character_set_server", "utf8mb4" /*useutf8mb4 ? "utf8mb4" : "utf8"*/);
74 }
75 }
76
77@@ -1365,7 +1372,8 @@ public class ConnectionImpl extends AbstractJdbcConnection implements JdbcConnec
78 String mysqlCharsetName = getSession().getServerCharset();
79
80 if (this.useOldUTF8Behavior.getValue()) {
81- mysqlCharsetName = "latin1";
82+ // mysqlCharsetName = "latin1";
83+ mysqlCharsetName = "utf8mb4";
84 }
85
86 boolean ucs2 = false;
87@@ -1380,9 +1388,11 @@ public class ConnectionImpl extends AbstractJdbcConnection implements JdbcConnec
88
89 if (dontCheckServerMatch || !this.session.characterSetNamesMatches(mysqlCharsetName) || ucs2) {
90 try {
91- execSQL(null, "SET NAMES " + mysqlCharsetName, -1, null, false, this.database, null, false);
92- this.session.getServerVariables().put("character_set_client", mysqlCharsetName);
93- this.session.getServerVariables().put("character_set_connection", mysqlCharsetName);
94+ execSQL(null, "SET NAMES utf8mb4" /* + mysqlCharsetName*/, -1, null, false, this.database, null, false);
95+ this.session.getServerVariables().put("character_set_client", "utf8mb4"/*, mysqlCharsetName*/);
96+ this.session.getServerVariables().put("character_set_connection", "utf8mb4"/*mysqlCharsetName*/);
97+ this.session.getServerVariables().put("character_set_results", "utf8mb4"/*mysqlCharsetName*/);
98+ this.session.getServerVariables().put("character_set_server", "utf8mb4"/*mysqlCharsetName*/);
99 } catch (PasswordExpiredException ex) {
100 if (this.disconnectOnExpiredPasswords.getValue()) {
101 throw ex;
102@@ -1436,9 +1446,10 @@ public class ConnectionImpl extends AbstractJdbcConnection implements JdbcConnec
103
104 if (this.useOldUTF8Behavior.getValue()) {
105 try {
106- execSQL(null, "SET NAMES latin1", -1, null, false, this.database, null, false);
107- this.session.getServerVariables().put("character_set_client", "latin1");
108- this.session.getServerVariables().put("character_set_connection", "latin1");
109+ execSQL(null, "SET NAMES utf8mb4", /*latin1",*/ -1, null, false, this.database, null, false);
110+ this.session.getServerVariables().put("character_set_client", "utf8mb4" /*"latin1"*/);
111+ this.session.getServerVariables().put("character_set_connection", "utf8mb4" /*"latin1"*/);
112+ this.session.getServerVariables().put("character_set_server", "utf8mb4" /*"latin1"*/);
113 } catch (PasswordExpiredException ex) {
114 if (this.disconnectOnExpiredPasswords.getValue()) {
115 throw ex;
116@@ -1470,11 +1481,11 @@ public class ConnectionImpl extends AbstractJdbcConnection implements JdbcConnec
117 }
118
119 if (!mysqlEncodingName.equalsIgnoreCase(this.session.getServerVariable("character_set_results"))) {
120- StringBuilder setBuf = new StringBuilder("SET character_set_results = ".length() + mysqlEncodingName.length());
121- setBuf.append("SET character_set_results = ").append(mysqlEncodingName);
122+ StringBuilder setBuf = new StringBuilder("SET character_set_results = utf8mb4"/* ".length() + mysqlEncodingName.length()*/);
123+ setBuf.append("SET character_set_results = ").append("utf8mb4");
124
125 try {
126- execSQL(null, setBuf.toString(), -1, null, false, this.database, null, false);
127+ execSQL(null, "set character_set_results = utf8mb4" /*setBuf.toString()*/, -1, null, false, this.database, null, false);
128 } catch (PasswordExpiredException ex) {
129 if (this.disconnectOnExpiredPasswords.getValue()) {
130 throw ex;
131@@ -1485,7 +1496,7 @@ public class ConnectionImpl extends AbstractJdbcConnection implements JdbcConnec
132 }
133 }
134
135- this.session.getServerVariables().put(ServerSession.JDBC_LOCAL_CHARACTER_SET_RESULTS, mysqlEncodingName);
136+ this.session.getServerVariables().put(ServerSession.JDBC_LOCAL_CHARACTER_SET_RESULTS,"utf8mb4" /* mysqlEncodingName*/);
137
138 // We have to set errorMessageEncoding according to new value of charsetResults for server version 5.5 and higher
139 this.session.setErrorMessageEncoding(charsetResults);
140@@ -1497,11 +1508,13 @@ public class ConnectionImpl extends AbstractJdbcConnection implements JdbcConnec
141
142 String connectionCollation = getPropertySet().getStringReadableProperty(PropertyDefinitions.PNAME_connectionCollation).getStringValue();
143 if (connectionCollation != null) {
144- StringBuilder setBuf = new StringBuilder("SET collation_connection = ".length() + connectionCollation.length());
145+ /* StringBuilder setBuf = new StringBuilder("SET collation_connection = ".length() + connectionCollation.length());
146 setBuf.append("SET collation_connection = ").append(connectionCollation);
147-
148+*/
149 try {
150- execSQL(null, setBuf.toString(), -1, null, false, this.database, null, false);
151+ execSQL(null, "set collation_connection = utf8mb4_unicode_ci", -1, null, false, this.database, null, false);
152+ execSQL(null, "set collation_database = utf8mb4_unicode_ci", -1, null, false, this.database, null, false);
153+ execSQL(null, "set collation_server = utf8mb4_unicode_ci", -1, null, false, this.database, null, false);
154 } catch (PasswordExpiredException ex) {
155 if (this.disconnectOnExpiredPasswords.getValue()) {
156 throw ex;
157diff --git a/src/main/java/com/mysql/jdbc/Driver.java b/src/main/java/com/mysql/jdbc/Driver.java
158index 7a548a6..9ba536f 100644
159--- a/src/main/java/com/mysql/jdbc/Driver.java
160+++ b/src/main/java/com/mysql/jdbc/Driver.java
161@@ -30,11 +30,11 @@ import java.sql.SQLException;
162 */
163 public class Driver extends com.mysql.cj.jdbc.Driver {
164 public Driver() throws SQLException {
165- super();
166+ /* super();*/
167 }
168
169- static {
170+ /* static {
171 System.err.println("Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. "
172 + "The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.");
173- }
174+ }*/
175 }

That will convert ALL tables - ie: tables with charset binary to utf8mb4, will that break stuff?
These are the non utf8 tables:

account_group_by_id
) ENGINE=InnoDB DEFAULT CHARSET=binary
account_group_by_id_aud
) ENGINE=InnoDB DEFAULT CHARSET=binary
working
) ENGINE=InnoDB DEFAULT CHARSET=binary
`

Do they really need to be converted? What can break?

No, these don't need changing. account_group_by_id and account_group_by_id_aud are only given internal-to-gerrit input (timestamps, uuids, ints, etc) so we don't need to worry about non-ASCII here. working was a temporary table I made in the process of fixing T152640, didn't realize I hadn't dropped it (just did so now).

Thanks for confirming - it sounded strange but I was confused as @Paladox mentioned here that he altered all of them: T145885#2896928 - so better be safe than sorry :-)

@Marostegui and @jcrespo and @demon and @Dzahn i found a new lib we can use it's https://mariadb.com/kb/en/mariadb/about-mariadb-connector-j/ (Api is a bit different but i managed to get it working by doing this

url = jdbc:mariadb://localhost/reviewdb?sessionVariables=character_set_client=utf8mb4,character_set_results=utf8mb4,character_set_connection=utf8mb4,collation_connection=utf8mb4_unicode_ci,collation_database=utf8mb4_unicode_ci,collation_server=utf8mb4_unicode_ci

This requires the lib i linked too, but it is not that easy to replace. I have this change upstream to add support for it.

Here https://gerrit-review.googlesource.com/#/c/95450/

With out sessionVariables set, emoji's will not get inserted. As soon as i do it, it works :).

Oh, wait, mysql connector supports this too.

Nope doesn't work on mysql's version of the connector. Works on mariadb's one though.

@Marostegui and @jcrespo and @demon and @Dzahn i found a new lib we can use it's https://mariadb.com/kb/en/mariadb/about-mariadb-connector-j/ (Api is a bit different but i managed to get it working by doing this

url = jdbc:mariadb://localhost/reviewdb?sessionVariables=character_set_client=utf8mb4,character_set_results=utf8mb4,character_set_connection=utf8mb4,collation_connection=utf8mb4_unicode_ci,collation_database=utf8mb4_unicode_ci,collation_server=utf8mb4_unicode_ci

This requires the lib i linked too, but it is not that easy to replace. I have this change upstream to add support for it.

Here https://gerrit-review.googlesource.com/#/c/95450/

With out sessionVariables set, emoji's will not get inserted. As soon as i do it, it works :).

Does this mean that we wouldn't need to convert anything to utf8mb4, table-wise?
Because of this:

character_set_client=utf8mb4,character_set_results=utf8mb4,character_set_connection=utf8mb4,
Paladox added a comment.EditedJan 27 2017, 8:10 AM

@Marostegui and @jcrespo and @demon and @Dzahn i found a new lib we can use it's https://mariadb.com/kb/en/mariadb/about-mariadb-connector-j/ (Api is a bit different but i managed to get it working by doing this

url = jdbc:mariadb://localhost/reviewdb?sessionVariables=character_set_client=utf8mb4,character_set_results=utf8mb4,character_set_connection=utf8mb4,collation_connection=utf8mb4_unicode_ci,collation_database=utf8mb4_unicode_ci,collation_server=utf8mb4_unicode_ci

This requires the lib i linked too, but it is not that easy to replace. I have this change upstream to add support for it.

Here https://gerrit-review.googlesource.com/#/c/95450/

With out sessionVariables set, emoji's will not get inserted. As soon as i do it, it works :).

Does this mean that we wouldn't need to convert anything to utf8mb4, table-wise?
Because of this:

character_set_client=utf8mb4,character_set_results=utf8mb4,character_set_connection=utf8mb4,

@Marostegui It would still require the table to be converted. But I found a full fix for fixing this task. I still need to test it once more to make sure I am giving the correct information here.

On the DBA side of things, I am ok with that proposal, asking for @demon opinion, or anyone at gerrit app layer on doing that.

demon added a comment.Jan 27 2017, 2:43 PM

On the DBA side of things, I am ok with that proposal, asking for @demon opinion, or anyone at gerrit app layer on doing that.

If upstream accepts the Maria DB-specific connector, I don't have any objections. I don't want to diverge for it though.

On the DBA side of things, I am ok with that proposal, asking for @demon opinion, or anyone at gerrit app layer on doing that.

If upstream accepts the Maria DB-specific connector, I don't have any objections. I don't want to diverge for it though.

Ok, we could still do the conversion. Just emojis won't work which I think is fine for the moment as that will lead to no data loss when writing comments with emojis.

Paladox added a comment.EditedJan 27 2017, 4:11 PM

yep the mariadb plugin fixes it.

sudo mysql -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 1049
Server version: 10.1.21-MariaDB-1~jessie mariadb.org binary distribution

Copyright (c) 2000, 2016, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+

Variable_nameValue

+--------------------------+----------------------------+

character_set_clientutf8
character_set_connectionutf8
character_set_databaselatin1
character_set_filesystembinary
character_set_resultsutf8
character_set_serverlatin1
character_set_systemutf8
character_sets_dir/usr/share/mysql/charsets/

+--------------------------+----------------------------+
8 rows in set (0.02 sec)

MariaDB [(none)]> SHOW VARIABLES LIKE 'coll%';
+----------------------+-------------------+

Variable_nameValue

+----------------------+-------------------+

collation_connectionutf8_general_ci
collation_databaselatin1_swedish_ci
collation_serverlatin1_swedish_ci

+----------------------+-------------------+
3 rows in set (0.01 sec)

that just proofs that the db is not actually utf8mb4. So the fix in jdbc works :)

You can also have emoji's in your commit msg too :)

http://gerrit-new.wmflabs.org/#/c/24/

It turns out we can support this right now. By doing this

[database]
driver = org.mariadb.jdbc.Driver
hostname = localhost
database = reviewdb
username = root
url = jdbc:mariadb://localhost/reviewdb?sessionVariables=character_set_client=utf8mb4,character_set_results=utf8mb4,character_set_connection=utf8mb4,collation_connection=utf8mb4_unicode_ci,collation_database=utf8mb4_unicode_ci,collation_server=utf8mb4_unicode_ci
type = jdbc

notice type is jdbc and driver is added to. Per comments at https://gerrit-review.googlesource.com/#/c/95673/

We can add the mariadb connector in the gerrit deb we have.

I tested this too locally.

Change 336002 had a related patch set uploaded (by Paladox):
Add mariadb-java-client

https://gerrit.wikimedia.org/r/336002

Change 336003 had a related patch set (by Paladox) published:
Gerrit: Use the mariadb plugin instead of mysql

https://gerrit.wikimedia.org/r/336003

@demon @jcrespo @Marostegui Upstream helped me find a way we can support the mariadb plugin without needing to do an upgrade to gerrit to support it :). All the patches above should be all that's necessary to upgrade (Still requires the db tables to be converted to utf8mb4 and utf8mb4_unicode_gi :)

Paladox raised the priority of this task from Low to Normal.Feb 4 2017, 5:55 PM

@Paladox: Please do not add random projects to tasks.

@Aklapper gerrit is maintained by releng. Which the task i added is not random.

demon added a comment.Feb 6 2017, 6:01 PM

@Paladox: Please do not add random projects to tasks.

Please do not remove appropriate tags or add random comments to tasks.

We can still do https://gerrit.wikimedia.org/r/#/c/336002/ since I doint see an urgency to have it in a separate repo and I doint see any other software at wikimedia using the plugin.

We can always put it in a separate repo once DBA have time.

Upstream gerrit has added in built support for the MariaDB connector. So we won't need it packaged separately. but we would need to package it in gerrit like we do for the other libs.

See https://gerrit-review.googlesource.com/#/c/95450/

Anyways we can do this either in gerrit 2.13 or 2.14 ( using the inbuilt support)

Not sure if we should bother doing this as I found problems when upgrading a gerrit install from 2.13 to 2.14 with this as it kept complaining about key length since utf8m4 has 4 and UTF8 has 3.

Change 330455 abandoned by Dzahn:
Gerrit: Set useUnicode=true, also change connectionCollation to utf8mb4_unicode_ci

https://gerrit.wikimedia.org/r/330455

Change 336002 abandoned by Paladox:
Add mariadb-java-client

https://gerrit.wikimedia.org/r/336002

I doint think this is worth it now since reviewdb is being removed from gerrit soon starting with 3.x and 2.15 being the last release to support reviewdb.

T174034

Emojis will be supported with notedb I think.

Dzahn lowered the priority of this task from Normal to Low.Aug 26 2017, 5:20 PM
Joe removed a subscriber: Joe.Aug 28 2017, 8:24 AM

I doint think this is worth it now since reviewdb is being removed from gerrit soon starting with 3.x and 2.15 being the last release to support reviewdb.

T174034

Emojis will be supported with notedb I think.

Can this be closed then? I don't think anyone will be working on this maintenance any time soon.

I doint think this is worth it now since reviewdb is being removed from gerrit soon starting with 3.x and 2.15 being the last release to support reviewdb.

T174034

Emojis will be supported with notedb I think.

Can this be closed then? I don't think anyone will be working on this maintenance any time soon.

We will be doing it when we upgrade to 2.15 which is still a while a way. :)

I guess we can remove DBA now as there is nothing here now for the DBA to do?

Dzahn removed a project: DBA.Oct 6 2017, 2:50 PM

Change 336003 abandoned by Paladox:
Gerrit: Use the mariadb plugin instead of mysql

https://gerrit.wikimedia.org/r/336003

Dzahn closed this task as Declined.Oct 19 2017, 7:49 AM

Can this be closed then? I don't think anyone will be working on this maintenance any time soon.

I doint think this is worth it now since reviewdb is being removed from gerrit soon starting with 3.x and 2.15 being the last release to support reviewdb.

But it will be fixed when we migrate to notedb.

Legoktm reopened this task as Stalled.Nov 21 2017, 1:22 AM
Legoktm added a subscriber: Legoktm.

Re-opening and setting as stalled. This isn't declined, it's just waiting on a new upstream upgrade.

The gerrit upgrade will be happening soon so as soon as the upgrade is done notedb will quick in and start migrating to notedb.

Paladox closed this task as Resolved.Jun 9 2018, 1:08 PM

We are now on 2.15 and just tested and emoji's work now!

zeljkofilipin awarded a token.