
Use case-sensitive collation in wb_items_by_site
Open, Low, Public

Description

In wb_items_by_site, the ips_page_title field is declared as a VARCHAR. By default, MySQL applies a case-insensitive collation to fields of that type (for the utf8 character set, the default is utf8_general_ci), so Foo, FOO and FoO compare as equal. That distinction is important, however: we may have distinct links to each of these pages.
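
The effect can be reproduced without a MySQL server. This is only an approximation: it uses Python's sqlite3 module with SQLite's NOCASE collation as a stand-in for MySQL's _ci collations (NOCASE folds only ASCII letters), and the table name and the ips_site_id/ips_item_id columns are hypothetical simplifications. With a case-insensitive unique key, only the first of the three titles can be stored, and a lookup for FOO returns the row stored for Foo.

```python
import sqlite3

# Stand-in for MySQL's default *_ci collation: SQLite's NOCASE collation
# on a hypothetical, simplified version of the table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items_by_site_ci (
           ips_site_id    TEXT NOT NULL,
           ips_page_title VARCHAR(255) COLLATE NOCASE NOT NULL,
           ips_item_id    INTEGER NOT NULL,
           UNIQUE (ips_site_id, ips_page_title)
       )"""
)


def try_insert(title, item_id):
    """Insert a link; return False if the unique key rejects it."""
    try:
        conn.execute(
            "INSERT INTO items_by_site_ci VALUES (?, ?, ?)",
            ("enwiki", title, item_id),
        )
        return True
    except sqlite3.IntegrityError:
        # The unique key uses the column's NOCASE collation, so "FOO"
        # collides with the already stored "Foo".
        return False


results = [try_insert(t, i) for i, t in enumerate(["Foo", "FOO", "FoO"], start=1)]
print(results)  # [True, False, False]

# The comparison also uses NOCASE, so a lookup for "FOO" finds "Foo".
row = conn.execute(
    "SELECT ips_page_title, ips_item_id FROM items_by_site_ci "
    "WHERE ips_site_id = ? AND ips_page_title = ?",
    ("enwiki", "FOO"),
).fetchone()
print(row)  # ('Foo', 1)
```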

Note that this doesn't happen when MediaWiki is set up in "binary" database mode, since VARCHAR then gets changed to VARBINARY automatically. But the schema should still work correctly for people using UTF-8 mode. So:

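We can either declare this field to use a binary collation, as we already do for term_text in the wb_terms table: ips_page_title VARCHAR(255) BINARY NOT NULL. Or we could declare an explicit case-*sensitive* UTF-8 collation, e.g. ips_page_title VARCHAR(255) COLLATE utf8_bin NOT NULL, or on newer MySQL versions with utf8mb4 a _cs collation such as utf8mb4_0900_as_cs. A _ci collation such as utf8_unicode_520_ci would not help, since the _ci suffix marks it as case-insensitive.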
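
For comparison, a sketch of the binary option, again using SQLite through Python's sqlite3 as an assumed stand-in. The MySQL declaration in the comment is the one proposed above; the table name and the other columns are hypothetical. SQLite's BINARY collation compares the encoded text byte by byte, so all three titles get their own row and an exact lookup never matches a title that differs only in case.

```python
import sqlite3

# Hypothetical simplified table. In MySQL the proposal is
#   ips_page_title VARCHAR(255) BINARY NOT NULL
# SQLite's equivalent is COLLATE BINARY (also its default collation).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items_by_site_bin (
           ips_site_id    TEXT NOT NULL,
           ips_page_title VARCHAR(255) COLLATE BINARY NOT NULL,
           ips_item_id    INTEGER NOT NULL,
           UNIQUE (ips_site_id, ips_page_title)
       )"""
)

titles = ["Foo", "FOO", "FoO"]
# All three inserts succeed: the unique key treats them as different values.
conn.executemany(
    "INSERT INTO items_by_site_bin VALUES ('enwiki', ?, ?)",
    [(t, i) for i, t in enumerate(titles, start=1)],
)


def lookup(title):
    """Return the item id linked to `title` on enwiki, or None."""
    row = conn.execute(
        "SELECT ips_item_id FROM items_by_site_bin "
        "WHERE ips_site_id = 'enwiki' AND ips_page_title = ?",
        (title,),
    ).fetchone()
    return row[0] if row else None


print([lookup(t) for t in titles])  # [1, 2, 3]
print(lookup("foo"))                # None
```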

However, we need to test how well schema conversion handles either option, for the different MySQL modes as well as for SQLite, PostgreSQL, etc.
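
One possible shape for such a test is to run the same check against whatever DDL each backend ends up with after conversion. The helper keeps_titles_distinct is hypothetical and is only exercised here against sqlite3; for other DB-API drivers, the placeholder (many MySQL and PostgreSQL drivers use %s) and the driver's IntegrityError class would have to be passed in.

```python
import sqlite3

# The case variants the test expects to be stored as separate rows.
VARIANTS = ("Foo", "FOO", "FoO")


def keeps_titles_distinct(conn, ddl, placeholder="?",
                          integrity_error=sqlite3.IntegrityError):
    """Hypothetical helper: create a table `t` with one text column `title`
    from `ddl`, insert every case variant, and report whether each one is
    stored and found as its own row."""
    cur = conn.cursor()
    cur.execute(ddl)
    try:
        for title in VARIANTS:
            cur.execute(f"INSERT INTO t (title) VALUES ({placeholder})", (title,))
    except integrity_error:
        # A unique key that treats the variants as equal rejects the insert.
        conn.rollback()
        return False
    for title in VARIANTS:
        cur.execute(f"SELECT COUNT(*) FROM t WHERE title = {placeholder}", (title,))
        if cur.fetchone()[0] != 1:
            # The comparison matched other variants too (or nothing at all).
            return False
    return True


# SQLite stand-ins for the two behaviours under discussion.
SCHEMAS = {
    "nocase": "CREATE TABLE t (title VARCHAR(255) COLLATE NOCASE NOT NULL UNIQUE)",
    "binary": "CREATE TABLE t (title VARCHAR(255) COLLATE BINARY NOT NULL UNIQUE)",
}

# Each schema gets a fresh in-memory database.
results = {
    name: keeps_titles_distinct(sqlite3.connect(":memory:"), ddl)
    for name, ddl in SCHEMAS.items()
}
print(results)  # {'nocase': False, 'binary': True}
```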

Details

Reference
bz47632

Event Timeline

bzimport raised the priority of this task from to Low. (Nov 22 2014, 1:35 AM)
bzimport set Reference to bz47632.
bzimport added a subscriber: Unknown Object (MLST).
Lydia_Pintscher removed a subscriber: Unknown Object (MLST).