Error: 1071 Specified key was too long; max key length is 767 bytes
Open, Needs TriagePublic

Description

This is merely a support request. I am migrating MediaWiki CI jobs to a Debian Stretch container and a lot of them are failing when creating tables:

CREATE TABLE `abuse_filter_action` (
 afa_filter BIGINT unsigned NOT NULL,
 afa_consequence varchar(255) NOT NULL,  /* <----- */
 afa_parameters TINYBLOB NOT NULL,
 PRIMARY KEY (afa_filter,afa_consequence),
 KEY (afa_consequence)
 ) ENGINE=InnoDB

Function: Wikimedia\Rdbms\Database::sourceFile( /workspace/src/extensions/AbuseFilter/abusefilter.tables.sql )
Error: 1071 Specified key was too long; max key length is 767 bytes (localhost:/tmp/quibble-mysql-abvt891k/socket)
NOTE: VARCHAR(255) fields with utf8mb4 charset have a size of up to 1020 bytes. That exceed the 767 bytes limit for an index. The easy workaround is to limit the size of the key/index to 191 VARCHAR (764 bytes).

I had the issue with a fake test schema in mediawiki/core which I solved by lowering the varchar to 127 https://gerrit.wikimedia.org/r/#/c/419858/ . Given the schema is never used for a deployment, that is an acceptable fix/workaround. But for a schema meant to be deployed we can't just change the varchar size.

From the previous commit the explanation I came up with is:

MediaWikiTestCaseSchemaTest.sql comes with two varchar(255) combined into a PRIMARY KEY.
That caused the testsuite to fail creating that imagelinks table with:

Error: 1071 Specified key was too long; max key length is 767 bytes

That is the limit for innodb and when using UTF8 it allows up to 3 bytes per characters. Hence the key can be (255+255)*3 = 1530. One can tune MySQL to bump the limit to ~ 3k with: innodb_large_prefix = 1

The mystery is the same works fine on Nodepool disposable instances. So I must be missing a configuration tweak of some sort.

Environment

NodepoolDocker
DistroJessieStretch
MariaDB10.0.32-0+deb8u1MariaDB 10.1.26-0+deb9u1
character-set-serverlatin1utf8mb4
collation-serverlatin1_swedish_ciutf8mb4_general_ci
--help --verboseF17337439F17337441

docker-registry.wikimedia.org/releng/quibble-stretch:0.0.11

Related Objects

StatusAssignedTask
OpenNone
OpenNone
ResolvedNone
Invalidhashar
OpenNone
OpenNone
OpenNone
OpenNone
DuplicateNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
DeclinedNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
OpenNone
ResolvedLegoktm
hashar created this task.Apr 27 2018, 8:24 AM
hashar updated the task description. (Show Details)Apr 27 2018, 8:30 AM

Diff of Jessie F17337439 versus Stretch F17337441 based on mysqld --help --verbose:

--- nodepool.txt	2018-04-27 10:28:05.640537847 +0200
+++ docker.txt	2018-04-27 10:24:35.402246794 +0200
@@ -5 +4,0 @@
-archive                                                    ON
@@ -9,0 +9 @@
+aria-encrypt-tables                                        FALSE
@@ -31 +31 @@
-back-log                                                   150
+back-log                                                   80
@@ -43,0 +44 @@
+binlog-row-image                                           FULL
@@ -45 +45,0 @@
-blackhole                                                  ON
@@ -46,0 +47 @@
+changed-page-bitmaps                                       ON
@@ -49 +50 @@
-character-set-server                                       latin1
+character-set-server                                       utf8mb4
@@ -52 +53 @@
-collation-server                                           latin1_swedish_ci
+collation-server                                           utf8mb4_general_ci
@@ -64 +65 @@
-debug                                                      disabled
+debug                                                      
@@ -74,0 +76 @@
+default-tmp-storage-engine                                 (No default value)
@@ -81,0 +84,4 @@
+encrypt-binlog                                             FALSE
+encrypt-tmp-disk-tables                                    FALSE
+encrypt-tmp-files                                          FALSE
+enforce-storage-engine                                     (No default value)
@@ -84,0 +91 @@
+explicit-defaults-for-timestamp                            FALSE
@@ -88 +94,0 @@
-federated                                                  ON
@@ -104 +110,2 @@
-general-log-file                                           ci-jessie-wikimedia-1042442.log
+general-log-file                                           9c3c9451e0a7.log
+getopt-prefix-matching                                     TRUE
@@ -112 +119 @@
-host-cache-size                                            128
+host-cache-size                                            279
@@ -132,0 +140,5 @@
+innodb-background-scrub-data-check-interval                3600
+innodb-background-scrub-data-compressed                    FALSE
+innodb-background-scrub-data-interval                      604800
+innodb-background-scrub-data-uncompressed                  FALSE
+innodb-buf-dump-status-frequency                           0
@@ -149 +161 @@
-innodb-checksum-algorithm                                  innodb
+innodb-checksum-algorithm                                  INNODB
@@ -151 +163 @@
-innodb-cleaner-lsn-age-factor                              high_checkpoint
+innodb-cleaner-lsn-age-factor                              HIGH_CHECKPOINT
@@ -159,0 +172 @@
+innodb-compression-algorithm                               zlib
@@ -166,0 +180,7 @@
+innodb-default-encryption-key-id                           1
+innodb-defragment                                          FALSE
+innodb-defragment-fill-factor                              0.9
+innodb-defragment-fill-factor-n-recs                       20
+innodb-defragment-frequency                                40
+innodb-defragment-n-pages                                  7
+innodb-defragment-stats-accuracy                           0
@@ -169 +189,6 @@
-innodb-empty-free-list-algorithm                           backoff
+innodb-empty-free-list-algorithm                           BACKOFF
+innodb-encrypt-log                                         FALSE
+innodb-encrypt-tables                                      OFF
+innodb-encryption-rotate-key-age                           1
+innodb-encryption-rotation-iops                            100
+innodb-encryption-threads                                  0
@@ -171,0 +197 @@
+innodb-fatal-semaphore-wait-threshold                      600
@@ -182,0 +209 @@
+innodb-force-primary-key                                   FALSE
@@ -184 +211 @@
-innodb-foreground-preflush                                 exponential_backoff
+innodb-foreground-preflush                                 EXPONENTIAL_BACKOFF
@@ -202,0 +230,3 @@
+innodb-idle-flush-pct                                      100
+innodb-immediate-scrub-data-uncompressed                   FALSE
+innodb-instrument-semaphores                               FALSE
@@ -206,0 +237 @@
+innodb-lock-schedule-algorithm                             fcfs
@@ -216,2 +247,2 @@
-innodb-log-buffer-size                                     8388608
-innodb-log-checksum-algorithm                              innodb
+innodb-log-buffer-size                                     16777216
+innodb-log-checksum-algorithm                              INNODB
@@ -234,0 +266,2 @@
+innodb-mtflush-threads                                     8
+innodb-mutexes                                             ON
@@ -240,0 +274 @@
+innodb-prefix-index-cluster-optimization                   FALSE
@@ -251,0 +286,2 @@
+innodb-scrub-log                                           FALSE
+innodb-scrub-log-speed                                     256
@@ -279,0 +316 @@
+innodb-sys-semaphore-waits                                 ON
@@ -283,0 +321,2 @@
+innodb-tablespaces-encryption                              ON
+innodb-tablespaces-scrubbing                               ON
@@ -294,0 +334 @@
+innodb-use-mtflush                                         FALSE
@@ -297,0 +338 @@
+innodb-use-trim                                            FALSE
@@ -300 +341 @@
-join-buffer-size                                           131072
+join-buffer-size                                           262144
@@ -317 +358 @@
-log-basename                                               ci-jessie-wikimedia-1042442
+log-basename                                               9c3c9451e0a7
@@ -357,0 +399 @@
+max-session-mem-used                                       9223372036854775807
@@ -359,0 +402 @@
+max-statement-time                                         0
@@ -376 +419 @@
-myisam-stats-method                                        nulls_unequal
+myisam-stats-method                                        NULLS_UNEQUAL
@@ -377,0 +421 @@
+mysql56-temporal-format                                    TRUE
@@ -387 +431 @@
-open-files-limit                                           65536
+open-files-limit                                           1048576
@@ -430 +474 @@
-performance-schema-max-statement-classes                   180
+performance-schema-max-statement-classes                   178
@@ -440 +484 @@
-plugin-dir                                                 /usr/lib/mysql/plugin/
+plugin-dir                                                 /usr/lib/x86_64-linux-gnu/mariadb18/plugin/
@@ -448 +492 @@
-query-alloc-block-size                                     8192
+query-alloc-block-size                                     16384
@@ -455 +499 @@
-query-prealloc-size                                        8192
+query-prealloc-size                                        24576
@@ -467 +511 @@
-replicate-events-marked-for-skip                           replicate
+replicate-events-marked-for-skip                           REPLICATE
@@ -474 +517,0 @@
-rpl-recovery-rank                                          0
@@ -476 +519 @@
-secure-auth                                                FALSE
+secure-auth                                                TRUE
@@ -477,0 +521 @@
+sequence                                                   ON
@@ -479,0 +524 @@
+silent-startup                                             FALSE
@@ -492,0 +538 @@
+slave-parallel-mode                                        conservative
@@ -493,0 +540 @@
+slave-run-triggers-for-rbr                                 NO
@@ -500 +547 @@
-slow-query-log-file                                        ci-jessie-wikimedia-1042442-slow.log
+slow-query-log-file                                        9c3c9451e0a7-slow.log
@@ -503 +550 @@
-sql-mode                                                   
+sql-mode                                                   NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
@@ -513,0 +561 @@
+strict-password-validation                                 TRUE
@@ -518,3 +566,3 @@
-sync-master-info                                           0
-sync-relay-log                                             0
-sync-relay-log-info                                        0
+sync-master-info                                           10000
+sync-relay-log                                             10000
+sync-relay-log-info                                        10000
@@ -523 +571 @@
-table-cache                                                400
+table-cache                                                2000
@@ -525 +573 @@
-table-open-cache                                           400
+table-open-cache                                           2000
@@ -532 +580 @@
-thread-pool-max-threads                                    500
+thread-pool-max-threads                                    1000
@@ -534 +582 @@
-thread-pool-size                                           2
+thread-pool-size                                           4
@@ -544,0 +593 @@
+unix-socket                                                ON
@@ -549,2 +598,2 @@
-version                                                    10.0.32-MariaDB-0+deb8u1
-version-comment                                            (Debian)
+version                                                    10.1.26-MariaDB-0+deb9u1
+version-comment                                            Debian 9.1
@@ -553 +602,2 @@
-version-malloc-library                                     bundled jemalloc
+version-malloc-library                                     system jemalloc
+version-ssl-library                                        YaSSL 2.4.2
@@ -554,0 +605,44 @@
+wsrep-OSU-method                                           TOI
+wsrep-auto-increment-control                               TRUE
+wsrep-causal-reads                                         FALSE
+wsrep-certify-nonPK                                        TRUE
+wsrep-cluster-address                                      
+wsrep-cluster-name                                         my_wsrep_cluster
+wsrep-convert-LOCK-to-trx                                  FALSE
+wsrep-data-home-dir                                        /var/lib/mysql/
+wsrep-dbug-option                                          
+wsrep-debug                                                FALSE
+wsrep-desync                                               FALSE
+wsrep-dirty-reads                                          FALSE
+wsrep-drupal-282555-workaround                             FALSE
+wsrep-forced-binlog-format                                 NONE
+wsrep-gtid-domain-id                                       0
+wsrep-gtid-mode                                            FALSE
+wsrep-load-data-splitting                                  TRUE
+wsrep-log-conflicts                                        FALSE
+wsrep-max-ws-rows                                          0
+wsrep-max-ws-size                                          2147483647
+wsrep-mysql-replication-bundle                             0
+wsrep-new-cluster                                          FALSE
+wsrep-node-address                                         
+wsrep-node-incoming-address                                AUTO
+wsrep-node-name                                            9c3c9451e0a7
+wsrep-notify-cmd                                           
+wsrep-on                                                   FALSE
+wsrep-patch-version                                        wsrep_25.19
+wsrep-provider                                             none
+wsrep-provider-options                                     
+wsrep-recover                                              FALSE
+wsrep-replicate-myisam                                     FALSE
+wsrep-restart-slave                                        FALSE
+wsrep-retry-autocommit                                     1
+wsrep-slave-FK-checks                                      TRUE
+wsrep-slave-UK-checks                                      FALSE
+wsrep-slave-threads                                        1
+wsrep-sst-auth                                             (No default value)
+wsrep-sst-donor                                            
+wsrep-sst-donor-rejects-queries                            FALSE
+wsrep-sst-method                                           rsync
+wsrep-sst-receive-address                                  AUTO
+wsrep-start-position                                       00000000-0000-0000-0000-000000000000:-1
+wsrep-sync-wait                                            0

That is similar to T179614. Jaime explained it as:

It is a migration from latin1 to utf8mb4 hence the maximum reacheable size is multiplied by 4. He pointed at a good summary at https://stackoverflow.com/questions/35847015/mysql-change-innodb-large-prefix/35851332#35851332

Additionally we should use a more modern row format. The big warning is tables have to be rebuild.


For the CI job, the database is created at the start of the job, there is no need to migrate any data / rebuild table. The settings to be applied are similar to the one for T179614:

innodb_file_format = Barracuda
innodb_large_prefix = 1

Change 429406 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] docker: tweak quibble MariaDB settings

https://gerrit.wikimedia.org/r/429406

hashar claimed this task.Apr 27 2018, 12:05 PM

Change 429422 had a related patch set uploaded (by Hashar; owner: Hashar):
[mediawiki/extensions/BetaFeatures@master] Change primary key length to 191

https://gerrit.wikimedia.org/r/429422

@jcrespo took extra time and care to fully explain the situation to me over IRC. The summary of the discussion:

On Debian Stretch, MariaDB defaults to use the utf8mb4 character set. Hence a VARCHAR(1) can be up to 4 bytes. When creating a primary key or index on a VARCHAR(255) the index length is thus 4 * 255 = 1020 bytes. That exceeds InnoDB default key length of 767 bytes.

Setting innodb_large_prefix = 1 is the fix, but that also requires the tables to be created with ROW_FORMAT='DYNAMIC'.. But:

  • innodb_large_prefix is transient and will dropped in a future MySQL version. It only existed for back compatibility
  • The MediaWiki installer does not support passing table options to set the ROW_FORMAT

With MariaDB 10.2.2 and 10.1.32, innodb_default_row_format will let us set the format of CREATE TABLE to default to DYNAMIC without any additional code to be added in MediaWiki.

In order to support a default MariaDB configuration on Debian Stretch, the keys in MediaWiki extensions should be limited to 191 varchar (764 bytes), and can probably be made smaller in most cases. An example for BetaFeatures:

create table if not exists /*$wgDBprefix*/betafeatures_user_counts (
-  `feature`          varchar(255) primary key not null,
-  `number`           integer not null default 0
+  `feature`          varchar(255) not null,
+  `number`           integer not null default 0,
+  PRIMARY KEY (feature(191))   /* limited! :) */
) /*$wgDBTableOptions*/;
hashar removed a project: DBA.Apr 27 2018, 2:34 PM
hashar removed a subscriber: jcrespo.

Unsubscribing DBA and Jaime. He nicely provided support, the rest of the work is to adjust all the extensions.

Change 429426 had a related patch set uploaded (by Hashar; owner: Hashar):
[mediawiki/extensions/AbuseFilter@master] Change primary key length to 191

https://gerrit.wikimedia.org/r/429426

hashar updated the task description. (Show Details)Apr 27 2018, 3:00 PM

Change 429434 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] docker: MariaDB on Jessie to utf8mbr for Jessie

https://gerrit.wikimedia.org/r/429434

Change 429443 had a related patch set uploaded (by Hashar; owner: Hashar):
[mediawiki/extensions/AntiSpoof@master] Change primary key length to 191

https://gerrit.wikimedia.org/r/429443

There are too many extensions failing with utf8mb4 so I will make the Stretch MySQL to use latin1 like on Jessie.

hashar removed hashar as the assignee of this task.May 1 2018, 7:01 PM
hashar removed a project: Patch-For-Review.
Bawolff added a subscriber: Bawolff.May 1 2018, 7:27 PM

Note, that the installer will normally set either $wgDBTableOptions = 'ENGINE=InnoDB,DEFAULT CHARSET=binary'; or $wgDBTableOptions = 'ENGINE=InnoDB,DEFAULT CHARSET=utf8; depending on option selection (eww, i think we should kill utf8 from the installer but that's a separate issue),

so this issue should be impossible if you install mediawiki via the installer.

jcrespo added a comment.EditedMay 1 2018, 11:27 PM

I am not sure if creating so many tasks is the right way- in some cases the fields are overprovisioned, but in others, like video, they are tied to the 255 byte size defined on code for titles- and that should probably be supported. I think a better fix should be done on mediawiki database configuration installer rather than forcing meaningless schema changes (either by enforcing binary or large prefix index, or something else).

But even if I am wrong, and a schema change is prefered, I would suggest at least to freeze those bug reports until a decision is taken on which of the 2 ways to go CC MediaWiki-Platform-Team MediaWiki-Database.

Personally, I think we should just make everything varbinary(), or be more explicit about what fields are charset configurable.

Having random db fields potentially be any random charset seems like a recipe for bugs, with very little benefit. What's the actual use case for allowing fields to be other random charsets? Using collations for things other than categories? something else?

Can we close or stall the subtasks?- not necessarily as a final decision, but to not do and later undo work, as clearly this needs more discussion -and people are getting confused and asking questions.

I think we should just make everything varbinary()

I don't agree or disagree, we use binary for almost everything at WMF production and everything works ok (I think), but as a mostly mediawiki-only decision, I am going to guess there is a lot of utf8 (3 byte) installations out there that we have to support or migrate and (they may be already using 3 times the space?)?

Change 429434 merged by jenkins-bot:
[integration/config@master] docker: Force MariaDB to Latin1

https://gerrit.wikimedia.org/r/429434

Change 430330 had a related patch set uploaded (by Hashar; owner: Hashar):
[integration/config@master] Bump Quibble jobs to 0.0.11-2

https://gerrit.wikimedia.org/r/430330

Stashbot added a subscriber: Stashbot.

Mentioned in SAL (#wikimedia-releng) [2018-05-02T10:43:34Z] <hashar> Building quibble docker iamges 0.0.11-2 T118371 T193222

Change 430330 merged by jenkins-bot:
[integration/config@master] Bump Quibble jobs to 0.0.11-2

https://gerrit.wikimedia.org/r/430330

jcrespo added a comment.EditedMay 31 2018, 9:22 PM

@hashar The consensus seems to be going towards abandoning utf8/utf8mb4 (the mysql config) and embracing UTF-8 (the standard) through binary collation (the mysql config). If that materializes, as it appears it will be, https://gerrit.wikimedia.org/r/#/c/436608/ the right fix is to force binary charset/collation, as production and the installer will do and eventually only support, and not do any code changes. Basically:

character_set_server     = binary
character_set_filesystem = binary
collation_server         = binary

Change 438152 had a related patch set uploaded (by Legoktm; owner: Legoktm):
[integration/config@master] Configure mariadb to use binary charset/collations

https://gerrit.wikimedia.org/r/438152

Change 438152 merged by jenkins-bot:
[integration/config@master] Configure mariadb to use binary charset/collations

https://gerrit.wikimedia.org/r/438152

Mainframe98 renamed this task from 13daaaaaaa to Error: 1071 Specified key was too long; max key length is 767 bytes.Jul 1 2018, 10:11 AM
Mainframe98 raised the priority of this task from High to Needs Triage.
Mainframe98 updated the task description. (Show Details)
Mainframe98 added a subscriber: gerritbot.

Change 429422 abandoned by Hashar:
Change primary key length to 191

https://gerrit.wikimedia.org/r/429422