Page MenuHomePhabricator

Enable MariaDB/MySQL's Strict Mode
Open, MediumPublic

Description

If you are looking for the full RFC, go to T112637

This task's scope is:

  • switching to MariaDB's Strict Mode, first on CI (T119371), then on production
  • Make it so that DB insert/update that would overflow would error out rather than be silently truncated
  • Generally promote better use of MariaDB by making many things fail precommit and/or fail loudly in production rather than failing quietly

Options discussed:

  • sql_mode = 'TRADITIONAL' (equivalent to STRICT_TRANS_TABLES, STRICT_ALL_TABLES, NO_ZERO_IN_DATE, NO_ZERO_DATE, ERROR_FOR_DIVISION_BY_ZERO, NO_AUTO_CREATE_USER)
  • sql_mode = ONLY_FULL_GROUP_BY

Often, our non-strict usage is a fountain of bugs and security issues.

Slide by @jcrespo that explains it: https://www.slideshare.net/jynus/query-optimization-with-mysql-57-and-mariadb-10-even-newer-tricks/43 and https://www.slideshare.net/jynus/query-optimization-with-mysql-57-and-mariadb-10-even-newer-tricks/44

Related Objects

View Standalone Graph
This task is connected to more than 200 other tasks. Only direct parents and subtasks are shown here. Use View Standalone Graph to show more of the graph.

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Re: CI, doesn't our CI only run against sqlite? /cc @Legoktm

Our CI mostly runs against mysql, there are a few things that run mysql and sqlite, and there's a bug somewhere for adding sqlite tests as well.

@RobLa-WMF Yes, there are several (small) issues here. What I actually need more is to approve a policy change and enforce it for new functionalities. I know that reviewing old code is something that will not happen soon, but I want to start with enforcing it on new extensions, and at the same time that other schema changes are requested. That, together with the enforcement at CI side will mean that slowly, but steadily, we will be able to make it work on production.

Let me extend somewhere myself more about why all this expected standards are important- I believe that being database motivated, it may be difficult to understand, but If I translate to code terms, you will be saying "we should be doing this long time ago" :-)

"2. Enable MariaDB/MySQL's Strict Mode", the actual change, it is not something that needs to be discussed at architecture level- it is a change that I can do at any time, provided 1 has been done, and that can be tested at pure operations side (it is a mysql configuration parameter, not a PHP one).

jcrespo renamed this task from RfC: Enable MariaDB/MySQL's Strict Mode to Enable MariaDB/MySQL's Strict Mode.Sep 15 2015, 9:45 AM
jcrespo updated the task description. (Show Details)
jcrespo removed a project: TechCom-RFC.
hashar added a subscriber: hashar.Oct 20 2015, 9:36 AM

Adding Beta-Cluster-Infrastructure to the loop, since we might well want to enable the strictness there before hitting prod.

For CI we have a cherry picked patch on the puppetmaster to have it uses temps as a matador (else it is too slow). https://gerrit.wikimedia.org/r/#/c/204528/12/manifests/role/ci.pp,unified That uses Package['mysql-server']. We run jobs on either Precise or Trusty hosts:

CI PHP Zend jobs

They run on Precise instances (integration-slave-precise-XXXX.integration.eqiad.wmflabs) and have the stock mysql Ubuntu: 5.5.44-0ubuntu0.12.04.1 0

CI HHVM jobs

They run on Trusty instances (integration-slave-trusty-XXXX.integration.eqiad.wmflabs) and have the stock mysql Ubuntu as well: 5.5.44-0ubuntu0.14.04.1.

Beta cluster

There are two databases instances both running Precise with puppet role role::mariadb::beta. Installed via Package['mariadb-server-5.5'] which yields 5.5.34+maria-1~precise

So on CI we have not enforced back compatibility with 5.0.2 for ages and there is no guarantee MediaWiki and its extensions are still properly working. Even on 5.5, depending on the test coverage some edge cases might not be exercised. We probably want to switch CI to MariaDB if that makes sense.

As for strictness, the minute we turn strict mode on on CI, that might well causes some tests to fail and effectively block further development (Jenkins will vote -1 preventing merges even for unrelated patches). We will have to backport any fix to the various branches we maintain (REL1_23 to REL1_26 and the two wmf/ active branches). Not a big deal, it is usually just about cherry picking.

For Beta cluster, I guess it is all about enabling strictness and monitoring what is going on. Potentially via https://logstash-beta.wmflabs.org/

jcrespo added a comment.EditedOct 20 2015, 10:03 AM

This is not about 5.0 compatibility, if any, 5.7 compatibility. But that doesn't matter (we do not use any of these versions).

The change will enforce sql_mode on session level for all versions, and it should mostly work, at least for core without any changes. Extensions, however, may have different results. This is a code change, and can be enabled only on specific versions.

If starting with beta is easier, let's do that. But the idea is "break" mediawiki production as soon as possible (3 months).

Lets get it enabled on beta.

A good way to exercise beta is to pick some of the browser tests hitting it. They are all found on https://integration.wikimedia.org/ci/view/BrowserTests/view/-Dashboard/ and are named with the scheme browsertests-<REPO>-<DOMAIN>....

You can pick jobs from different REPO that target beta.wmflabs.org and are green (i.e. they are known to pass). Then:

  • get the change deployed on beta
  • log in Jenkins with your labs account
  • trigger all the green jobs

There might be unrelated failures though.

All / most of the MediaWiki logs are sent to http://logstash-beta.wmflabs.org/ for inspection.

I don't think the MySQL logs are sent anywhere so one has to dig on deployment-db1/db2 instances.

For production maybe we can enable strict mode on test.wikipedia.org?

Note: I am on leave till Monday. @zeljkofilipin can assist with triggering the browser tests in Jenkins.

hashar triaged this task as Medium priority.Nov 2 2015, 8:35 PM

Poked Jaime about it by email.

Strict mode is really useful, because it helped me to find a data loss issue T117854: Missing updater for rc_ip field: RecentChange::save: 1406 Data too long for column 'rc_ip' (the fact that it has not even been triaged yet annoys me).

But on the other hand, currently trying to do development with strict mode enabled is difficult. Just today when testing a patch I ran into T121489: Field 'ar_page' doesn't have a default value and T67807: Incorrect integer value: '' for column 'af_hidden'.

currently trying to do development with strict mode enabled is difficult

@Nikerabbit I assume you are working locally enabling strict mode, right? As far as I can tell, strict sql mode has not been enabled neither in production, not in beta, nor in CI tests yet. However, those are problems that other instances of mediawiki outside of the WMF could easily find, and we should prioritize fixing them. I think changing beta so that it is stricter than production would be a good way to debug them.

Correct. I am using strict mode locally to make sure we don't introduce any new strict mode errors and catch any such errors in extensions we maintain.

This is currently unassigned. Is that on purpose? I'm not sure if @hashar, @jcrespo , or someone else is the most logical assignee.

Anomie added a subscriber: Anomie.Jun 29 2016, 7:40 PM

One potential issue I see with ONLY_FULL_GROUP_BY is that the optimizer still sucks at dealing with it:

Without ONLY_FULL_GROUP_BY
mysql:wikiadmin@db1083 [enwiki]> explain SELECT page_namespace, page_title, page_id, page_is_redirect FROM page WHERE page_namespace=0 AND page_title>='@' GROUP BY page_title LIMIT 11;
+------+-------------+-------+------+---------------+------------+---------+-------+----------+------------------------------------+
| id   | select_type | table | type | possible_keys | key        | key_len | ref   | rows     | Extra                              |
+------+-------------+-------+------+---------------+------------+---------+-------+----------+------------------------------------+
|    1 | SIMPLE      | page  | ref  | name_title    | name_title | 4       | const | 20324890 | Using index condition; Using where |
+------+-------------+-------+------+---------------+------------+---------+-------+----------+------------------------------------+
1 row in set (0.00 sec)

mysql:wikiadmin@db1083 [enwiki]> SELECT page_namespace, page_title, page_id, page_is_redirect FROM page WHERE page_namespace=0 AND page_title>='@' GROUP BY page_title LIMIT 11;
+----------------+----------------------+----------+------------------+
| page_namespace | page_title           | page_id  | page_is_redirect |
+----------------+----------------------+----------+------------------+
|              0 | @                    |  2214518 |                1 |
|              0 | @!                   | 37696354 |                1 |
|              0 | @$$                  | 21089863 |                1 |
|              0 | @&*!_Smilers         | 20573031 |                1 |
|              0 | @-Bristol            | 23905089 |                1 |
|              0 | @-symbol             | 25176688 |                1 |
|              0 | @440                 | 18159966 |                1 |
|              0 | @ANCAlerts           | 47685646 |                0 |
|              0 | @Andheri             | 42866631 |                0 |
|              0 | @Andheri_(2014_film) | 42914733 |                1 |
|              0 | @BarackObama         | 35944823 |                1 |
+----------------+----------------------+----------+------------------+
11 rows in set (0.09 sec)

mysql:wikiadmin@db1083 [enwiki]> SHOW STATUS LIKE 'Hand%';
+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| Handler_commit             | 2      |
| Handler_delete             | 0      |
| Handler_discover           | 0      |
| Handler_external_lock      | 0      |
| Handler_icp_attempts       | 497545 |
| Handler_icp_match          | 16     |
| Handler_mrr_init           | 0      |
| Handler_mrr_key_refills    | 0      |
| Handler_mrr_rowid_refills  | 0      |
| Handler_prepare            | 0      |
| Handler_read_first         | 0      |
| Handler_read_key           | 1      |
| Handler_read_last          | 0      |
| Handler_read_next          | 11     |
| Handler_read_prev          | 0      |
| Handler_read_rnd           | 0      |
| Handler_read_rnd_deleted   | 0      |
| Handler_read_rnd_next      | 168    |
| Handler_rollback           | 0      |
| Handler_savepoint          | 0      |
| Handler_savepoint_rollback | 0      |
| Handler_tmp_update         | 0      |
| Handler_tmp_write          | 166    |
| Handler_update             | 0      |
| Handler_write              | 0      |
+----------------------------+--------+
25 rows in set (0.00 sec)
With ONLY_FULL_GROUP_BY
mysql:wikiadmin@db1083 [enwiki]> SET SESSION sql_mode = 'TRADITIONAL,ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)

mysql:wikiadmin@db1083 [enwiki]> explain SELECT page_namespace, page_title, page_id, page_is_redirect FROM page WHERE page_namespace=0 AND page_title>='@' GROUP BY page_title LIMIT 11;
ERROR 1055 (42000): 'enwiki.page.page_namespace' isn't in GROUP BY
mysql:wikiadmin@db1083 [enwiki]> explain SELECT page_namespace, page_title, page_id, page_is_redirect FROM page WHERE page_namespace=0 AND page_title>='@' GROUP BY page_namespace, page_title, page_id, page_is_redirect LIMIT 11;
+------+-------------+-------+------+---------------+------------+---------+-------+----------+----------------------------------------------------+
| id   | select_type | table | type | possible_keys | key        | key_len | ref   | rows     | Extra                                              |
+------+-------------+-------+------+---------------+------------+---------+-------+----------+----------------------------------------------------+
|    1 | SIMPLE      | page  | ref  | name_title    | name_title | 4       | const | 20324891 | Using index condition; Using where; Using filesort |
+------+-------------+-------+------+---------------+------------+---------+-------+----------+----------------------------------------------------+
1 row in set (0.00 sec)

mysql:wikiadmin@db1083 [enwiki]> SELECT page_namespace, page_title, page_id, page_is_redirect FROM page WHERE page_namespace=0 AND page_title>='@' GROUP BY page_namespace, page_title, page_id, page_is_redirect LIMIT 11;
+----------------+----------------------+----------+------------------+
| page_namespace | page_title           | page_id  | page_is_redirect |
+----------------+----------------------+----------+------------------+
|              0 | @                    |  2214518 |                1 |
|              0 | @!                   | 37696354 |                1 |
|              0 | @$$                  | 21089863 |                1 |
|              0 | @&*!_Smilers         | 20573031 |                1 |
|              0 | @-Bristol            | 23905089 |                1 |
|              0 | @-symbol             | 25176688 |                1 |
|              0 | @440                 | 18159966 |                1 |
|              0 | @ANCAlerts           | 47685646 |                0 |
|              0 | @Andheri             | 42866631 |                0 |
|              0 | @Andheri_(2014_film) | 42914733 |                1 |
|              0 | @BarackObama         | 35944823 |                1 |
+----------------+----------------------+----------+------------------+
11 rows in set (32.67 sec)

mysql:wikiadmin@db1083 [enwiki]> SHOW STATUS LIKE 'Hand%';
+----------------------------+----------+
| Variable_name              | Value    |
+----------------------------+----------+
| Handler_commit             | 2        |
| Handler_delete             | 0        |
| Handler_discover           | 0        |
| Handler_external_lock      | 0        |
| Handler_icp_attempts       | 12637141 |
| Handler_icp_match          | 12139612 |
| Handler_mrr_init           | 0        |
| Handler_mrr_key_refills    | 0        |
| Handler_mrr_rowid_refills  | 0        |
| Handler_prepare            | 0        |
| Handler_read_first         | 0        |
| Handler_read_key           | 1        |
| Handler_read_last          | 0        |
| Handler_read_next          | 12139611 |
| Handler_read_prev          | 0        |
| Handler_read_rnd           | 0        |
| Handler_read_rnd_deleted   | 0        |
| Handler_read_rnd_next      | 168      |
| Handler_rollback           | 1        |
| Handler_savepoint          | 0        |
| Handler_savepoint_rollback | 0        |
| Handler_tmp_update         | 0        |
| Handler_tmp_write          | 166      |
| Handler_update             | 0        |
| Handler_write              | 0        |
+----------------------------+----------+
25 rows in set (0.00 sec)

That was a silly toy example and it still took a lot longer to return the 11 rows. The optimizer seems to fail in the same way if the GROUP BY contains any fields that aren't part of the index being used for grouping (or that are but are constant in the WHERE clause).

The particularly fun part about this example is that "WHERE page_namespace=0 GROUP BY page_title" already uniquely specifies a result row since name_title is a unique index.

I suppose we could always cheat...

mysql:wikiadmin@db1083 [enwiki]> SELECT 0 AS page_namespace, page_title, MAX(page_id) AS page_id, MAX(page_is_redirect) AS page_is_redirect FROM page WHERE page_namespace=0 AND page_title>='@' GROUP BY page_title LIMIT 11;
+----------------+----------------------+----------+------------------+
| page_namespace | page_title           | page_id  | page_is_redirect |
+----------------+----------------------+----------+------------------+
|              0 | @                    |  2214518 |                1 |
|              0 | @!                   | 37696354 |                1 |
|              0 | @$$                  | 21089863 |                1 |
|              0 | @&*!_Smilers         | 20573031 |                1 |
|              0 | @-Bristol            | 23905089 |                1 |
|              0 | @-symbol             | 25176688 |                1 |
|              0 | @440                 | 18159966 |                1 |
|              0 | @ANCAlerts           | 47685646 |                0 |
|              0 | @Andheri             | 42866631 |                0 |
|              0 | @Andheri_(2014_film) | 42914733 |                1 |
|              0 | @BarackObama         | 35944823 |                1 |
+----------------+----------------------+----------+------------------+
11 rows in set (0.09 sec)

This was fixed on 5.7: http://mysqlserverteam.com/mysql-5-7-only_full_group_by-improved-recognizing-functional-dependencies-enabled-by-default/ which given the other large issue we have with the optimizer we should seriously consider (I would have to check MariaDB 10.1).

My rationale is that 5.7, for external users, has this mode enabled by default, and we should be compatible with it. Enforcing it would be a second thought. But given the past experiences, I think it would fix more issues than it break things.

Let's give a chance to beta, which should be stricter than production, and not the other way round!

I have myself have no idea how to enable strict mode / which part of puppet config files have to be tweaked. The beta cluster database are also still running on Precise instance and MariaDB 5.5 , we might want to migrate them to migrate them to Jessie and MariaDB 5.10 first (that is T138778)

This is not blocked by that- it only needs to be enabled on config (see my comment on the child ticket).

Thanks @Huji for noting the SecurePoll strict mode problems (T147875).

I'll note this task still seems to be unassigned. @hashar, is the Beta cluster running in strict mode?

The beta cluster databases have been migrated to Jessie / MariaDB 5.10 T138778 and their config file is https://github.com/wikimedia/operations-puppet/blob/d3a05afd154/templates/mariadb/beta.my.cnf.erb

On beta I would guess it is about setting $wgSQLMode via operations/mediawiki-config , maybe that already has been done. For CI the task is T119371.

I am lurking on this task but reasonably have zero free time to lead the effort.

Krinkle removed a subscriber: Krinkle.Dec 2 2017, 12:57 AM
jcrespo updated the task description. (Show Details)Apr 27 2018, 9:04 AM

https://gerrit.wikimedia.org/r/#/c/429386/ would enable sql_mode = TRADITIONAL via includes/DevelopmentSettings.php which is loaded by CI. Tracked by sub task T119371.

hashar updated the task description. (Show Details)Apr 27 2018, 8:37 PM
Marostegui updated the task description. (Show Details)Jan 9 2019, 7:17 PM
Marostegui moved this task from Triage to Backlog on the DBA board.Jan 23 2019, 6:39 AM
Restricted Application added a project: Core Platform Team. · View Herald TranscriptJul 23 2019, 5:39 PM