Page MenuHomePhabricator

Updating documentation to mention errors due to Django + MySQL + utf8mb4 index limitations/workarounds on ToolsDB
Open, Needs TriagePublic

Description

I've been using social_django for OAuth in my Outreachy Round-15 project WikiCV , and in GSoC project as well, as explained in My first Django OAuth Tool. Things were fine as long as I was using SQLite. But then I decided to move from SQLite to MySQL. If you're planning to do the same, it's not going to be easy. Why? Here's the answer.

Firstly during installation you'll be able to install mysqlclient but since libmysqlclient-dev is not installed in Toolforge when using Kubernetes, it won't work. For workaround for that refer to - this phabricator ticket and stackoverflow link.

Secondly, while migrating if you come across an error like this:
django.db.utils.InternalError: (1709, 'Index column size too large. The maximum column size is 767 bytes.')

Then this is because Toolforge is using 10.0.34 version of MySQL in which limitations are imposed on index columns. I encountered this error because I was using social_django app which had index columns with length > 767 bytes.
Solution for the this is:

  1. Go through this documentation which says that you need to tweak the lengths of some columns in order to make it work with databases like MySQL InnoDB. Set all the 4 variables mentioned here to 767/<<LENGTH_OF_DATA_TYPE_OF_INDEXED_COLUMN>> in the settings of your project (in settings.py file).
  2. Your .cnf file should look something like this file
  3. As I had already migrated with SQLite, changes done to the variables in social_django weren't getting reflected even on migration. So I had to delete the migrations folder in social_django and then migrate it again.

All of this took a lot of google-searching and help from my mentor.
I feel that if it was there in the documentation, it could have saved a lot of effort. I'm penning this down so that someone else's efforts could be saved.

Hope it would help.
Thanks,
Megha

Event Timeline

Indeed I ran into this when moving PAWS to toolsdb as well. The solution I used there was to set each row format as 'DYNAMIC'.
@bd808 has a good post on hacking that into django migrations

Another possible workaround is to use utf8 instead of utf8mb4, however that means partial Unicode support.

I have added a note to the documentation about these issues. But I hope they can be fixed (by upgrading mariadb) so that no workarounds are needed.

Your problem is not mariadb (10.0 is fully supported) your problem is trying to use utf8mb4- while mediawiki developers used to support utf8 (3-byte utf) they decided to support only binary for new installation. See: T194125

In other words- upgrading the server, while it is certainly going to happen soon, it will only make the problem worse, not better :-)

Alernatively, I can check if innodb_large_prefix is enabled, and you could rebuild your tables in DYNAMIC/COMPRESSED format (Barracuda).

Your problem is not mariadb (10.0 is fully supported) your problem is trying to use utf8mb4- while mediawiki developers used to support utf8 (3-byte utf) they decided to support only binary for new installation. See: T194125

In other words- upgrading the server, while it is certainly going to happen soon, it will only make the problem worse, not better :-)

Could you clarify in which way it will make the problem worse?

The linked task talks about the choice of charset for mediawiki, but how does that relate to python and django?

Is enabled, so it could be done like this:

root@labsdb1005.eqiad.wmnet[deleteme]> create table test (i varchar(300), UNIQUE KEY(i));
ERROR 1709 (HY000): Index column size too large. The maximum column size is 767 bytes.
root@labsdb1005.eqiad.wmnet[deleteme]> create table test (i varchar(300), UNIQUE KEY(i)) row_format=DYNAMIC;
Query OK, 0 rows affected (0.07 sec)

Sadly, MariaDB defaults the the older row format, and that cannot be changes until 10.2 or 10.2.

The linked task talks about the choice of charset for mediawiki

Sorry, I thought first this was about mediawiki compatibility- it isn't, but mediawiki has the same issues.

Aside from above, another workaround is force binary, as I said for mediawiki:

root@labsdb1005.eqiad.wmnet[(none)]> create database deleteme charset=binary;
Query OK, 1 row affected (0.08 sec)

root@labsdb1005.eqiad.wmnet[(none)]> use deleteme
Database changed
root@labsdb1005.eqiad.wmnet[deleteme]> create table test (i varchar(300), UNIQUE KEY(i));
Query OK, 0 rows affected (0.05 sec)

This is all running on toolsdb.

Could you clarify in which way it will make the problem worse?

Technically not worse, but utf8mb4 is now the default, so developers will have to be forced to support it, or it will fail everywhere by default (I just adjusted 10.0 to the newer version default). :-(

or it will fail everywhere by default (I just adjusted 10.0 to the newer version default). :-(

Do you mean specifically the indexes on utf8mb4 will fail under these conditions, until mariadb >= 10.2.2 is deployed? Or something else?

Do you mean specifically the indexes on utf8mb4 will fail under these conditions, until mariadb >= 10.2.2 is deployed? Or something else?

No, 10.2 will fail, too, because bad client configuration is being used. Both newer versions of MySQL and MariaDB use utf8mb4 as a default, and so will do toolsdb. Change the database default as I suggest at T198508#4447768 as a fix -this is not an infrastructure bug.

You said

10.0.34 version of MySQL in which limitations are imposed on index columns

But in reality is bad support from the python-social developers, it works if they decided to support properly utf8mb4 + innodb_large_prefix, which is now the default on newer versions.

Do you mean specifically the indexes on utf8mb4 will fail under these conditions, until mariadb >= 10.2.2 is deployed? Or something else?

No, 10.2 will fail, too, because bad client configuration is being used. Both newer versions of MySQL and MariaDB use utf8mb4 as a default, and so will do toolsdb. Change the database default as I suggest at T198508#4447768 as a fix -this is not an infrastructure bug.

You said

10.0.34 version of MySQL in which limitations are imposed on index columns

But in reality is bad support from the python-social developers, it works if they decided to support properly utf8mb4 + innodb_large_prefix, which is now the default on newer versions.

Hm, I believe there are no issues with either python-social or my own models that have unique CharField indexes on 10.2 - the indexes work fine with utf8mb4 without the need of any workarounds. What exactly do you mean by "bad client configuration" in this case?

bd808 renamed this task from Updating documentation to mention errors due to old MySQL version on Toolforge to Updating documentation to mention errors due to Django + MySQL + utf8mb4 index limitations/workarounds on ToolsDB.Jul 26 2018, 6:17 PM

MariaDB 10.2.2 makes ROW_FORMAT = DYNAMIC the system default which would magically make Django's default migrations work assuming that innodb_large_prefix is also enabled. MariaDB 10.3 apparently makes innodb_large_prefix the default as well. We will get newer versions of MariaDB/MySQL at some point, but that point is not soon. The ToolsDB servers share packages and configuration with the much larger pool of database servers in use on the Wikimedia Foundation networks. Making them be "special" rather than following the same testing and upgrade cycle as the other server clusters is not something that we have the staff to support.

I agree that getting Django apps to run on ToolsDB is challenging. I would also like this process to be more simple, and I have hope that in the future it will be. Today the best thing we can do is spread the word by updating documents and writing blog posts. Thank you @Yury_Bulka for the notes you have already added to the Toolforge help page.

Thank you, @bd808, for the clarification and explanation, and to @jcrespo for providing all the nitty-gritty details and more workarounds.

Definitely this discussion will be useful to anyone hitting the issue.

Is this all still true on Toolsdb? We are on version 10.1.39, which doesn't change a lot, I don't think.