**Introduce the following policies, recommendations, etc:**
* Code that touches the database must to be compatible with the following MySQL SQL modes:
- TRADITIONAL (equivalent to STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER)
- ONLY_FULL_GROUP_BY
* The mediawiki software must not enforce or make any assumption about the SQL mode used, and is and will be compatible with default and reasonable options from the first MySQL/MariaDB version supported to the latest stable releasenforce the SQL mode used (once the review period is complete)to the one mentioned above. ConnectExtensions must not attempt to change the SESSION value of sql_mode.
* Unsupported (old) versions of databases must be compatible as the mediawiki compatibility list indicates, but performance efforts should be concentrated on supported versions of databases and its default or widely recommended configuration parameters
* All tables must have a primary key. When a candidate for primary key could not be created (for example, if all columns can be repeated), an auto_increment or another arbitrary value, depending on the case, has to be added.
* Undeterministic queries and usafe statements for binlog should be avoided as they would return/write different results in a replication environment. The latter can be detected as warnings with the text " [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT". Those include "INSERT ... SELECT" when using and auto_increment key, "UPDATE ... LIMIT" without and ORDER BY, using undeterministic function like SYSDATE(). More info at: https://dev.mysql.com/doc/refman/5.6/en/replication-rbr-safe-unsafe.html
* To help understand which code is and isn't compatible with the above recomendations, Continous Integration checks must run mysql in sql_mode='TRADITIONAL,ONLY_FULL_GROUP_BY'. For the first yearno longer than 6 months, it should be "non-voting"
* In order to enforce the previous policies, new functionalities must be rejected if they are not compatible with the above stricter standards. A process will be created to review existing functionalities, with the goal of enforce them for all mediawiki-related code within 1 year6 months.
**Rationale**
MySQL in the past was a "toy" database, not scalable, non-transactional and very little constraints to input queries. While that allowed fast code development, it also allows very insecure practices. In the past 10 years, MySQL has evolved to be a scalable, fully ACID storage engine, and allows better enforcement of inputs, to the point that that strict mode will be the compiled defaults in the soon to be released MySQL 5.7 (and they where the defaults in configuration in 5.6).
How strict MySQL is with input is controlled by the "sql_mode" variable. I am not suggesting becoming incompatible with the loose old defaults, but making sure we are compatible with the stricter configuration, and, at the same time, make mediawiki code more secure (and WMF infrastucture, too, by switch to the stricter defaults). This is an example of the effects of this change:
```
MariaDB [test]> CREATE TABLE test (
id int NOT NULL PRIMARY KEY,
multivalue enum('wikipedia','wiktionary') NOT NULL,
date datetime NOT NULL,
lang_code CHAR(2) NOT NULL
);
Query OK, 0 rows affected (0.01 sec)
MariaDB [test]> INSERT INTO test (multivalue, date, lang_code) VALUES ('mediawiki', -1, 'this is a very long text');
Query OK, 1 row affected, 4 warnings (0.00 sec)
MariaDB [test]> SELECT * FROM test;
+----+------------+---------------------+-----------+
| id | multivalue | date | lang_code |
+----+------------+---------------------+-----------+
| 0 | | 0000-00-00 00:00:00 | th |
+----+------------+---------------------+-----------+
1 row in set (0.00 sec)
```
Full example: http://www.slideshare.net/jynus/query-optimization-with-mysql-57-and-mariadb-10-even-newer-tricks/41
Silent truncation without error has caused lots of bugs (some of them, security issues) in the few months I have been a DBA for mediawiki, specially due to mistakes on binary length vs character length. In code terms we have the equivalent to a "stack overflow".
If we enforce sql_mode traditional, however we have:
```
MariaDB [test]> SET sql_mode='TRADITIONAL';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> INSERT INTO test (multivalue, date, lang_code) VALUES ('mediawiki', -1, 'this is a very long text');
ERROR 1364 (HY000): Field 'id' doesn't have a default value
MariaDB [test]> INSERT INTO test (id, multivalue, date, lang_code) VALUES (1, 'mediawiki', -1, 'this is a very long text');
ERROR 1265 (01000): Data truncated for column 'multivalue' at row 1
MariaDB [test]> INSERT INTO test (id, multivalue, date, lang_code) VALUES (1, 'wikipedia', -1, 'this is a very long text');
ERROR 1292 (22007): Incorrect datetime value: '-1' for column 'date' at row 1
MariaDB [test]> INSERT INTO test (id, multivalue, date, lang_code) VALUES (1, 'wikipedia', now(), 'this is a very long text');
ERROR 1406 (22001): Data too long for column 'lang_code' at row 1
MariaDB [test]> INSERT INTO test (id, multivalue, date, lang_code) VALUES (1, 'wikipedia', now(), 'es');
Query OK, 1 row affected (0.00 sec)
```
Setting the mode in session will make us potentially incompatible (and it is a waste of a round trip to the database), so the idea is to set them in WMF databases configuration, while rejecting new code that is incompatible and reviewing existing code. Being compatible with stricter mode in SQL code is 100% compatible with looser modes.
A simplified, but larger explanation of TRADITIONAL is:
TRADITIONAL (equivalent to STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER, NO_ENGINE_SUBSTITUTION)
* STRICT_TRANS_TABLES: avoids invalid values (e.g truncations) for transactional tables (InnoDB)
* STRICT_ALL_TABLES: avoids invalid values (e.g truncations) for all tables, including MyISAM
* NO_ZERO_IN_DATE: avoids setting dates with day of the month 0, month 0, etc.
* NO_ZERO_DATE: disallows the '0000-00-00 00:00:00' value
* ERROR_FOR_DIVISION_BY_ZERO: error on division by 0 instead of returning null
* NO_AUTO_CREATE_USER: only affects mysql user creation (DBA task), disallows creating mysql users with GRANT, avoiding creating passwordless users
* NO_ENGINE_SUBSTITUTION: only affects table creation (DBA task), makes CRETE TABLE fail if the engine is unavailabe, instead of changing it to the default engine
Not enforcing this mode has created issues in the past, like the recent: https://phabricator.wikimedia.org/T99941
The above is regarding TRADITIONAL, so affecting insertions and updates. There is also a mode that affects SELECTs that are non-deterministic: creating GROUP BYs and selecting columns that we have not grouped by:
```
MariaDB [test]> CREATE TABLE test2 (id int PRIMARY KEY, name varchar(22), value int);
Query OK, 0 rows affected (0.02 sec)
MariaDB [test]> insert into test2 VALUES (1, 'tree', 45), (2, 'tree', 32), (3, 'rock', 12), (4, 'rock', 67);
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
MariaDB [test]> SELECT name, value FROM test2 GROUP BY name;
+------+-------+
| name | value |
+------+-------+
| rock | 12 |
| tree | 45 |
+------+-------+
2 rows in set (0.00 sec)
```
What 'values' will we get? It is undetermined by SQL standard, not only that, even if engines tend to be deterministic (mysql usually sends the first value found), it could change from slave to slave, as it actually happened on https://phabricator.wikimedia.org/T102915#1538739
This is fixed very easily by doing:
```
MariaDB [test]> SET sql_mode='ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)
MariaDB [test]> SELECT name, value FROM test2 GROUP BY name;
ERROR 1055 (42000): 'test.test2.value' isn't in GROUP BY
```
Other undeterministic behaviours by unsafe SQL queries may actually compromise the data integrity of the slave, as producing lateral effects on slaves. While ROW-based replication will avoid that, there are several blockers that restric that for now.
Regarding primary keys, full rational is at https://phabricator.wikimedia.org/T17441#1420139 and the list of pending tables, just bellow that.
Primary keys are fundamental not only for consistency, but also for proper performance in MySQL's InnDB and row based replication. They are also very important for dba tools to work properly such as pt-table-checksum or pt-online-schema-change. Without them, we cannot check the integrity of the replicas and we cannot do schema changes online without a failover.
It is necessary to point out that InnoDB, MySQL's default storage engine since 5.5, and the one we use at the WMF, does always require a PRIMARY KEY to work internally, and if one it is not provided explicitly, and there are not other candidates (I think unique keys not null), a hidden 6-bytes identifier is created for them, but that is not user-accesible.
In summary, these change need investment, but will save in the long term hundreds of developer hours and it will allow mediawiki to scale better by using those standards.
**Who**
DBAs, with current policy, need to review all schema changes. If those are adopted, I can reject unsafe code, but **I need it in writing**. Of course, code review should be a team effort, and I will be happy to help.
**How**
Leaving this intentionally open, I proposed giving a 1 year period of detecting this on existing code, apply it immediately for new changes. Mediawiki core probably will need no change ("it works for me" under strict configuration), but lots of extensions do.