
Extended characters show up as "?" in Gerrit user names
Closed, Resolved (Public)

Description

Author: beau

Description:
There is an account 'User:Szymon Świerkosz' on labsconsole wiki, however gerrit shows it as 'Szymon ?wierkosz'. I have provided the URL for an example page.


Version: unspecified
Severity: critical
URL: https://gerrit.wikimedia.org/r/4040

Details

Reference
bz35626

Event Timeline

bzimport raised the priority of this task to High. Nov 22 2014, 12:15 AM
bzimport added a project: Gerrit.
bzimport set Reference to bz35626.

Adjusting bug summary... I assume this is upstream, but don't really know for sure.

Probably dupe of the other gerrit unicode bug.

This is very likely an upstream problem, but it seems to be specific to user names. For example, in https://gerrit.wikimedia.org/r/4040 , Szymon's name is shown correctly in the "committer" field, but incorrectly in the "reviewer" and "owner" fields.

(In reply to comment #3)

This is very likely an upstream problem, but it seems to be specific to user
names.

What about this issue suggests it's an upstream problem?

Well pretty much everything with gerrit is an upstream problem ;-)

Like the other unicode bugs, we can probably work around this though.

Here's an interesting one:

http://code.google.com/p/gerrit/issues/detail?id=1082

They say UTF-8 won't work with MySQL :/

MediaWiki works absolutely fine with MySQL and Unicode.

The correct phrasing would be: Gerrit does not support Unicode when using MySQL as a backend.

(In reply to comment #7)

MediaWiki works absolutely fine with MySQL and Unicode.

The correct phrasing would be Gerrit does not support Unicode when using MySQL
as backend.

Was about to say, this definitely sounds like a Gerrit problem.

(In reply to comment #8)

(In reply to comment #7)

MediaWiki works absolutely fine with MySQL and Unicode.

The correct phrasing would be Gerrit does not support Unicode when using MySQL
as backend.

Was about to say, this definitely sounds like a Gerrit problem.

As I said upstream, Gerrit claiming this doesn't work is just silly. I've already theorized that we can just change the collations and this will work, but I haven't tested yet.

If someone wants to test this theory, we can set you up with access to the gerrit project on labs (which is already running 2.3).

Nope - tested with 2.3-rc0-158-g34ab429 - I have utf8_unicode_ci on all MySQL tables and I get question marks.

A bit newer Gerrit deployed on PostgreSQL is fine.

(In reply to comment #10)

Nope - tested with 2.3-rc0-158-g34ab429 - I have utf8_unicode_ci on all MySQL
tables and I get question marks.

We've got 2.3 final on gerrit-dev on labs so we can test there. Want me to add you? I'm wondering if making the fields binary like we do in MediaWiki would work...but that's a bigger change than just the collations on the tables.

A bit newer Gerrit deployed on PostgreSQL is fine.

I really don't see us moving to PG or H2, so we need to find a fix. I *refuse* to believe Gerrit that this is unfixable on MySQL.

Created attachment 10411
Tell gerrit to use UTF-8 with MySQL

My MySQL database is in UTF-8 and it seems that gerrit stores the values properly.

A patch attached forces gerrit to use UTF-8 when connecting to MySQL.
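As background on why the connection encoding matters, here is a minimal Python sketch of how a character that has no slot in a single-byte charset degrades to "?". It assumes the lossy step is a conversion to Latin-1; the thread does not establish which charset the JDBC connection actually negotiated, and `lossy_roundtrip` is a hypothetical helper, not part of Gerrit or the MySQL driver.

```python
def lossy_roundtrip(text, charset="latin-1"):
    """Encode text into `charset`, substituting '?' for anything it cannot
    represent, then decode it back. This mimics a lossy connection charset."""
    return text.encode(charset, errors="replace").decode(charset)


if __name__ == "__main__":
    name = "Szymon Świerkosz"
    # Assumed Latin-1 connection: the S-acute has no Latin-1 code point.
    print(lossy_roundtrip(name))            # Szymon ?wierkosz
    # UTF-8 connection: nothing is lost.
    print(lossy_roundtrip(name, "utf-8"))   # Szymon Świerkosz
```

Once a character has been replaced this way the original cannot be recovered from the stored value, which is why the data has to be right at the point where it crosses the connection.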


^demon, can you try this change in the configuration (assuming we can have tables in UTF-8):

[database]

type = JDBC
driver = com.mysql.jdbc.Driver
url = jdbc:mysql://localhost/reviewdb?characterSetResults=utf8&characterEncoding=utf8&connectionCollation=utf8_unicode_ci
username = gerrit2

"database" and "hostname" entries should be removed. "username" should stay.

*** Bug 35455 has been marked as a duplicate of this bug. ***

I don't think that a dataloss bug should be Low/Normal.

The following tables are definitely affected and need some sort of fix:

account_external_ids
accounts
changes
patch_comments

These tables aren't currently affected, but could be if we put non-ASCII data into them.

account_group_names
account_groups
approval_categories
approval_category_values
change_messages
tracking_ids

Ok, collation has been updated on all tables, and https://gerrit.wikimedia.org/r/#change,6439 has been submitted to change the connection url.
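The exact statements used for the collation change are not recorded in this thread. As a rough sketch, assuming the conversion targets `utf8` with `utf8_unicode_ci` through MySQL's `ALTER TABLE ... CONVERT TO CHARACTER SET`, a small Python script could generate one statement per table named above so they can be reviewed before anything is run. The function name and the choice to generate SQL rather than execute it are illustrative assumptions.

```python
# Table names are taken from the lists earlier in this thread.
AFFECTED_TABLES = [
    "account_external_ids", "accounts", "changes", "patch_comments",
]
POTENTIALLY_AFFECTED_TABLES = [
    "account_group_names", "account_groups", "approval_categories",
    "approval_category_values", "change_messages", "tracking_ids",
]


def conversion_statements(tables, charset="utf8", collation="utf8_unicode_ci"):
    """Return one ALTER TABLE statement per table (nothing is executed)."""
    return [
        f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} "
        f"COLLATE {collation};"
        for table in tables
    ]


if __name__ == "__main__":
    for stmt in conversion_statements(AFFECTED_TABLES + POTENTIALLY_AFFECTED_TABLES):
        print(stmt)
```

`CONVERT TO` re-encodes the existing column data, so taking a backup and checking a few rows first is advisable, particularly if some values may already be mis-encoded.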

(In reply to comment #13)

^demon, can you try this change in the configuration (assuming we can have
tables in UTF-8):

[database]

type = JDBC
driver = com.mysql.jdbc.Driver
url = jdbc:mysql://localhost/reviewdb?characterSetResults=utf8&characterEncoding=utf8&connectionCollation=utf8_unicode_ci
username = gerrit2

"database" and "hostname" entries should be removed. "username" should stay.

Ok, I changed the collation/charset on all the tables, and we updated the connection string. The database is now showing the correct data (yay!), but we're still not getting the right data to the UI.

See the owner on https://gerrit.wikimedia.org/r/#change,6388 which is an improvement although still not correct.

Looks "better" now.

I would then try connecting to MySQL via JDBC directly and see if it's okay.
You can try https://gerrit-review.googlesource.com/#/c/34670/ to play live with data obtained from the SQL database via Gerrit's ORM or play directly.

You can try this code:

http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/60187/focus=60206

to check what JDBC really sees.

I hope you didn't end up with a double-encoded UTF-8 in the database (quite easy to do with MySQL, harder to recover) - so that Ś is not 0xC5 0x9A but 0xC3 0x85 0xC2 0x9A instead.
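The byte values above can be reproduced with a short Python sketch. It assumes the double encoding happens by reading UTF-8 bytes as ISO-8859-1 (Python's `latin-1` codec) and encoding the result as UTF-8 again; `double_encode` and `repair` are hypothetical helper names used only for illustration.

```python
def double_encode(text):
    """UTF-8 bytes misread as Latin-1 and then encoded as UTF-8 again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def repair(raw):
    """Undo one round of double encoding (only valid if it really happened)."""
    return raw.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    correct = "Ś".encode("utf-8")
    mangled = double_encode("Ś")
    print(correct.hex())    # c59a
    print(mangled.hex())    # c385c29a
    print(repair(mangled))  # Ś
```

The repair step raises an error on text that was never double-encoded and contains characters outside Latin-1, so it should only be applied to rows known to be affected.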

Some data from my MySQL instance:

$ mysql -u root -p reviewdb --default-character-set=utf8
Enter password:
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1212
Server version: 5.0.92 FreeBSD port: mysql-server-5.0.92

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> \s

mysql Ver 14.12 Distrib 5.0.92, for portbld-freebsd8.2 (amd64) using 5.2

Connection id: 1212
Current database: reviewdb
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.0.92 FreeBSD port: mysql-server-5.0.92
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: utf8
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /tmp/mysql.sock
Uptime: 21 days 8 hours 24 min 17 sec

Threads: 6 Questions: 339557 Slow queries: 0 Opens: 85 Flush tables: 1 Open tables: 64 Queries per second avg: 0.184

mysql> show full columns from accounts;
+----------------------------------------+--------------+-----------------+------+-----+-------------------+-------+---------------------------------+---------+
| Field                                  | Type         | Collation       | Null | Key | Default           | Extra | Privileges                      | Comment |
+----------------------------------------+--------------+-----------------+------+-----+-------------------+-------+---------------------------------+---------+
| registered_on                          | timestamp    | NULL            | NO   |     | CURRENT_TIMESTAMP |       | select,insert,update,references |         |
| full_name                              | varchar(255) | utf8_bin        | YES  | MUL | NULL              |       | select,insert,update,references |         |
| preferred_email                        | varchar(255) | utf8_bin        | YES  | MUL | NULL              |       | select,insert,update,references |         |
| contact_filed_on                       | timestamp    | NULL            | YES  |     | NULL              |       | select,insert,update,references |         |
| maximum_page_size                      | smallint(6)  | NULL            | NO   |     | 0                 |       | select,insert,update,references |         |
| show_site_header                       | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         |
| use_flash_clipboard                    | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         |
| download_url                           | varchar(20)  | utf8_bin        | YES  |     | NULL              |       | select,insert,update,references |         |
| download_command                       | varchar(20)  | utf8_bin        | YES  |     | NULL              |       | select,insert,update,references |         |
| copy_self_on_email                     | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         |
| date_format                            | varchar(10)  | utf8_bin        | YES  |     | NULL              |       | select,insert,update,references |         |
| time_format                            | varchar(10)  | utf8_bin        | YES  |     | NULL              |       | select,insert,update,references |         |
| display_patch_sets_in_reverse_order    | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         |
| display_person_name_in_review_category | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         |
| inactive                               | char(1)      | utf8_unicode_ci | NO   |     | N                 |       | select,insert,update,references |         |
| account_id                             | int(11)      | NULL            | NO   | PRI | 0                 |       | select,insert,update,references |         |
+----------------------------------------+--------------+-----------------+------+-----+-------------------+-------+---------------------------------+---------+
16 rows in set (0.02 sec)

mysql> select full_name from accounts where preferred_email like 'saper%' \G

*************************** 1. row ***************************

full_name: Marcin Cieślak
1 row in set (0.00 sec)

Additionally, here's the output of my sane MySQL Gerrit instance via the Gerrit Inspector feature (patching your gerrit with https://gerrit-review.googlesource.com/#/c/34670/ should be mostly harmless :):

(lots of startup messages on Gerrit console)
"jettyserver" is "com.google.gerrit.pgm.http.jetty.JettyServer@1fdac8a5"
"db" is "com.google.gerrit.reviewdb.server.ReviewDb_Schema_GwtOrm$$25@1c8aeedc"

Welcome to the Gerrit Inspector
Enter help() to see the above again, EOF to quit and stop Gerrit
Jython 2.5.2 (Release_2_5_2:7206, Mar 2 2011, 23:12:06)
[OpenJDK 64-Bit Server VM (Sun Microsystems Inc.)] on java1.6.0 running for Gerrit 2.4-rc0-78-g8ed6c15

>>> for z in db.accounts().iterateAllEntities():
...     print z.fullName
...
Marcin Cieślak
Marcin Cieslak (via gmail)

beau wrote:

Um... I am unable to log in to gerrit right now.

Application Error
Server Error
Cannot assign user name

Ok, everything should be squared away now. Usernames are now showing up properly[0], cover comments[1] and inline comments[2]. We also tested IRC--which works. E-mail notifs are working.

Only thing left to test is new user creation and login. Then we can mark this fixed.

[0] https://gerrit.wikimedia.org/r/#change,6008
[1] https://gerrit.wikimedia.org/r/#change,3962 (last comment)
[2] https://gerrit.wikimedia.org/r/#patch,sidebyside,3962,4,RELEASE-NOTES-1.20

beau wrote:

I can confirm logging in - works.

sumanah wrote:

I've now created a user account via https://labsconsole.wikimedia.org/wiki/Special:CreateAccount for Paweł Sadowski and am waiting for Paweł to confirm that login for Labs & Gerrit works.

I went ahead and made myself a testing account so I can use it in the future. It worked.

https://gerrit.wikimedia.org/r/#dashboard,240

Marking this FIXED.

As of now, the IRC bot says:

Lastlog:
04:42 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/extensions/ProofreadPage] (master) C: 1; - https://gerrit.wikimedia.org/r/6345
04:49 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/extensions/ProofreadPage] (master) C: 1; - https://gerrit.wikimedia.org/r/6340
13:39 < gerrit-wm> New patchset: Szymon ?wierkosz; "Convert a JS variable for horizontal layout to a preference." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6388
13:39 < gerrit-wm> New patchset: Szymon ?wierkosz; "Bug fixed : the proofreadpage_default_layout='horizontal' option doesn't work because of a change in the html generated by wikieditor." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6003
13:41 < gerrit-wm> New review: Szymon ?wierkosz; "Nothing changed between Patch Set 1 and Patch Set 2. It is one of my another failed attempts at usin..." [mediawiki/extensions/ProofreadPage] (master) C: 1;  - https://gerrit.wikimedia.org/r/6003
20:27 < gerrit-wm> New patchset: Szymon ?wierkosz; "Convert a JS variable for horizontal layout to a preference." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6388
20:27 < gerrit-wm> New patchset: Szymon ?wierkosz; "Bug fixed : the proofreadpage_default_layout='horizontal' option doesn't work because of a change in the html generated by wikieditor." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6003
13:08 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/core] (master) C: 0; - https://gerrit.wikimedia.org/r/6596

Fortunately, the HTML output seems fine - but something might have changed (is it because of 2.3)?

Can you have a look at 2.3 database again? Maybe it's just some interface to the IRC bot?

Did a simple test:

Added UTF-8 comment to:

https://gerrit.wikimedia.org/r/#/c/3289/

results:

$ ssh wikimedia gerrit stream-events
{"type":"comment-added","change":{"project":"test/mediawiki/core","branch":"master","topic":"master","id":"Icdc8f7e26c4cba920eda69a042702b8358797554","number":"3289","subject":"Testing git review...","owner":{"name":"IAlex","email":"ialex.wiki@gmail.com"},"url":"https://gerrit.wikimedia.org/r/3289"},"patchSet":{"number":"1","revision":"e5e3aafbce66df1b0a1094be7aa62c34a617c181","ref":"refs/changes/89/3289/1","uploader":{"name":"IAlex","email":"ialex.wiki@gmail.com"},"createdOn":1332230770},"author":{"name":"saper","email":"saper@saper.info"},"comment":"ąćęłńóśźć comment utf-8"}

But:

20:44 < gerrit-wm> New review: saper; "????????? comment utf-8" [test/mediawiki/core] (master) - https://gerrit.wikimedia.org/r/3289
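The two outputs above can be compared with a small Python sketch. It parses a trimmed copy of the stream-events JSON (only the fields needed here are kept) and shows that forcing the comment through an ASCII-only conversion produces exactly the nine question marks the bot printed. That the bot's path really uses ASCII is an assumption; the thread only shows the symptom.

```python
import json

# Trimmed copy of the event shown above; the other fields are omitted.
EVENT_JSON = (
    '{"type":"comment-added",'
    '"author":{"name":"saper"},'
    '"comment":"ąćęłńóśźć comment utf-8"}'
)


def to_ascii_lossy(text):
    """Replace every non-ASCII character with '?', as a lossy sink would."""
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    event = json.loads(EVENT_JSON)
    print(event["comment"])                  # intact in the event stream
    print(to_ascii_lossy(event["comment"]))  # ????????? comment utf-8
```

A Latin-1 conversion would have kept the "ó", so the all-"?" output is at least consistent with an ASCII-only step somewhere between the event stream and IRC.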

(In reply to comment #27)

As of now, the IRC bot says:

Lastlog:
04:42 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/extensions/ProofreadPage] (master) C: 1; - https://gerrit.wikimedia.org/r/6345
04:49 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/extensions/ProofreadPage] (master) C: 1; - https://gerrit.wikimedia.org/r/6340
13:39 < gerrit-wm> New patchset: Szymon ?wierkosz; "Convert a JS variable for horizontal layout to a preference." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6388
13:39 < gerrit-wm> New patchset: Szymon ?wierkosz; "Bug fixed : the proofreadpage_default_layout='horizontal' option doesn't work because of a change in the html generated by wikieditor." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6003
13:41 < gerrit-wm> New review: Szymon ?wierkosz; "Nothing changed between Patch Set 1 and Patch Set 2. It is one of my another failed attempts at usin..." [mediawiki/extensions/ProofreadPage] (master) C: 1;  - https://gerrit.wikimedia.org/r/6003
20:27 < gerrit-wm> New patchset: Szymon ?wierkosz; "Convert a JS variable for horizontal layout to a preference." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6388
20:27 < gerrit-wm> New patchset: Szymon ?wierkosz; "Bug fixed : the proofreadpage_default_layout='horizontal' option doesn't work because of a change in the html generated by wikieditor." [mediawiki/extensions/ProofreadPage] (master) - https://gerrit.wikimedia.org/r/6003
13:08 < gerrit-wm> New review: Szymon ?wierkosz; "(no comment)" [mediawiki/core] (master) C: 0; - https://gerrit.wikimedia.org/r/6596

Fortunately, the HTML output seems fine - but something might have changed (is
it because of 2.3)?

Can you have a look at 2.3 database again? Maybe it's just some interface to
the IRC bot?

Could this be bug 36487?