Page MenuHomePhabricator

Provide a way in our DBAL to convert a binary string to lowercase
Open, Needs TriagePublic

Description

There seems to be no method in our DBAL (specifically SQLPlatform) that allows you to apply LOWER to a binary string. This is actually a combination of two things:

  • There's no method for building a LOWER() call. This is a minor issue, because LOWER(X) works in all the DBMSs we support.
  • LOWER doesn't work on binary strings, as they must be converted/collated first. For instance, in MySQL you could do LOWER(CONVERT(my_bin_string using utf8mb4)) (and I'm sure there are other ways, each with their pros and cons; I'm not saying the implementation should be exactly this one). However, there's no method in SQLPLatform that allows you to do this in a cross-compatible fashion. There's a buildStringCast() method but that's not what I need here.

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

The last LOWER was removed from core due to such issues and non-functional on binary types.

Sometimes it is better to store the value twice to search on one of them and the other is the display name. Even for index support that could be needed for that column.
The next problem for a filter is that LIKEs with % at begin not very good for the database, both parts are often used together.

The last LOWER was removed from core due to such issues and non-functional on binary types.

Hah, thanks, I didn't know that.

Sometimes it is better to store the value twice to search on one of them and the other is the display name. Even for index support that could be needed for that column.

That feels a bit redundant... It wouldn't help with performance though, because as you already noted, indexes cannot be used when there's a wildcard at the beginning of a LIKE pattern.

The next problem for a filter is that LIKEs with % at begin not very good for the database, both parts are often used together.

Is there a link to the code that generates the queries or example queries? How many rows are in the table? The linked tasks seems to suggest that the idea is to be able to search for a string anywhere in the event name or location name. The table is highly sortable too.

We usually store utf8 strings in binary columns, so LOWER would not be meaningful. One option would be to duplicate the event name/event in another column, stored as lowercase, e.g. during $dbw->insert(). Using LIKE is fine if "your events" is a limited of records (e.g. < 5K-10K, easily narrowed down from the whole tables though existing indexes). In that case, the LIKE portion of the WHERE would not need to make use of an index.

Is there a link to the code that generates the queries or example queries? How many rows are in the table? The linked tasks seems to suggest that the idea is to be able to search for a string anywhere in the event name or location name. The table is highly sortable too.

That is correct, and for the CampaignEvents use case we're still sorting out the details; the relevant code would be EventsPager, although I guess there could be more use cases elsewhere.

We usually store utf8 strings in binary columns, so LOWER would not be meaningful. One option would be to duplicate the event name/event in another column, stored as lowercase, e.g. during $dbw->insert().

Yeah, this was also proposed above, although I'd rather not do that if possible.

Using LIKE is fine if "your events" is a limited of records (e.g. < 5000K, easily narrowed down from the whole tables though existing indexes). In that case, the LIKE portion of the WHERE would not need to make use of an index.

In theory, nothing prevents people from creating a huge number of records. I can't provide realistic estimates though.