Page MenuHomePhabricator

Error: 1071 Specified key was too long; max key length is 767 bytes
Closed, ResolvedPublic

Description

This is merely a support request. I am migrating MediaWiki CI jobs to a Debian Stretch container and a lot of them are failing when creating tables:

CREATE TABLE `abuse_filter_action` (
 afa_filter BIGINT unsigned NOT NULL,
 afa_consequence varchar(255) NOT NULL,  /* <----- */
 afa_parameters TINYBLOB NOT NULL,
 PRIMARY KEY (afa_filter,afa_consequence),
 KEY (afa_consequence)
 ) ENGINE=InnoDB

Function: Wikimedia\Rdbms\Database::sourceFile( /workspace/src/extensions/AbuseFilter/abusefilter.tables.sql )
Error: 1071 Specified key was too long; max key length is 767 bytes (localhost:/tmp/quibble-mysql-abvt891k/socket)
NOTE: VARCHAR(255) fields with utf8mb4 charset have a size of up to 1020 bytes. That exceed the 767 bytes limit for an index. The easy workaround is to limit the size of the key/index to 191 VARCHAR (764 bytes).

I had the issue with a fake test schema in mediawiki/core which I solved by lowering the varchar to 127 https://gerrit.wikimedia.org/r/#/c/419858/ . Given the schema is never used for a deployment, that is an acceptable fix/workaround. But for a schema meant to be deployed we can't just change the varchar size.

From the previous commit the explanation I came up with is:

MediaWikiTestCaseSchemaTest.sql comes with two varchar(255) combined into a PRIMARY KEY.
That caused the testsuite to fail creating that imagelinks table with:

Error: 1071 Specified key was too long; max key length is 767 bytes

That is the limit for innodb and when using UTF8 it allows up to 3 bytes per characters. Hence the key can be (255+255)*3 = 1530. One can tune MySQL to bump the limit to ~ 3k with: innodb_large_prefix = 1

The mystery is the same works fine on Nodepool disposable instances. So I must be missing a configuration tweak of some sort.

Environment

NodepoolDocker
DistroJessieStretch
MariaDB10.0.32-0+deb8u1MariaDB 10.1.26-0+deb9u1
character-set-serverlatin1utf8mb4
collation-serverlatin1_swedish_ciutf8mb4_general_ci
--help --verboseF17337439F17337441

docker-registry.wikimedia.org/releng/quibble-stretch:0.0.11

Related Objects

StatusSubtypeAssignedTask
Resolvedhashar
DeclinedNone
ResolvedNone
Invalidhashar
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DuplicateNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DeclinedNone
DuplicateNone
ResolvedLegoktm

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Change 430330 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Bump Quibble jobs to 0.0.11-2

https://gerrit.wikimedia.org/r/430330

Change 430330 merged by jenkins-bot:
[integration/config@master] Bump Quibble jobs to 0.0.11-2

https://gerrit.wikimedia.org/r/430330

@hashar The consensus seems to be going towards abandoning utf8/utf8mb4 (the mysql config) and embracing UTF-8 (the standard) through binary collation (the mysql config). If that materializes, as it appears it will be, https://gerrit.wikimedia.org/r/#/c/436608/ the right fix is to force binary charset/collation, as production and the installer will do and eventually only support, and not do any code changes. Basically:

character_set_server     = binary
character_set_filesystem = binary
collation_server         = binary

Change 438152 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[integration/config@master] Configure mariadb to use binary charset/collations

https://gerrit.wikimedia.org/r/438152

Change 438152 merged by jenkins-bot:
[integration/config@master] Configure mariadb to use binary charset/collations

https://gerrit.wikimedia.org/r/438152

Mainframe98 renamed this task from 13daaaaaaa to Error: 1071 Specified key was too long; max key length is 767 bytes.Jul 1 2018, 10:11 AM
Mainframe98 raised the priority of this task from High to Needs Triage.
Mainframe98 updated the task description. (Show Details)
Mainframe98 added a subscriber: gerritbot.

Change 429422 abandoned by Hashar:
Change primary key length to 191

https://gerrit.wikimedia.org/r/429422

hashar claimed this task.

I have filled this task when I have migrated the CI job for MediaWiki toward Docker. MariaDB came with the default configuration from the Debian packages:

DistributionJessieStretch
default-character-setLatin1utf8mb4
character_set_serverLatin1utf8mb4
character_set_filesystemLatin1utf8mb4
collation_serverLatin1utf8mb4

That got solved by providing a configuration file which set those settings to binary.

I don't think MediaWiki enforces anything, hence some extension might fail to install when MySQL uses `utf8mb4, I thus filled sub tasks for each of them.

T194125 is a RFC about potentially forcing the charset/collation.

Meanwhile, for CI that has been fixed by shipping the following configurations:

jessie/mariadb.cnf
[client]
# Jessie defaults to Latin1. T193222
default-character-set = binary

[mysqld]
# Jessie defaults to Latin1. T193222
character_set_server     = binary
character_set_filesystem = binary
collation_server         = binary
stretch/mariadb.cnf
[client]
# Stretch defaults to utf8mb4. T193222
default-character-set = binary

[mysqld]
# Stretch defaults to utf8mb4. T193222
character_set_server     = binary
character_set_filesystem = binary
collation_server         = binary

I don't think MediaWiki enforces anything, hence some extension might fail to install when MySQL uses `utf8mb4, I thus filled sub tasks for each of them.

The subtasks are lacking instructions how to fix, and this ticket doesn't really help. One says that installer now enforces correct settings, other person says it doesn't. Which one is right?

The issue was that MediaWiki did not define a default charset it thus used whatever the MySQL server has. On Debian Jessie it used to be Latin1, on Debian Stretch that got changed to utf8mb4.

Eventually that got addressed by https://gerrit.wikimedia.org/r/c/mediawiki/core/+/461279 which now defaults to a binary charset:

--- a/includes/DefaultSettings.php
+++ b/includes/DefaultSettings.php
@@ -1931,7 +1931,7 @@ $wgDBprefix = '';
 /**
  * MySQL table options to use during installation or update
  */
-$wgDBTableOptions = 'ENGINE=InnoDB';
+$wgDBTableOptions = 'ENGINE=InnoDB, DEFAULT CHARSET=binary';

Released with MediaWiki 1.32. I will close the spam of subtasks.