Page MenuHomePhabricator

Considering switching innodb_checksum_algorithm=full_crc32
Closed, ResolvedPublic

Description

As per MariaDB's recommendation on this bug report: https://jira.mariadb.org/browse/MDEV-23653 we might want to go from

+---------------------------+--------+
| Variable_name             | Value  |
+---------------------------+--------+
| innodb_checksum_algorithm | INNODB |
+---------------------------+--------+`

To innodb_checksum_algorithm=full_crc32 which will be default on 10.5+ anyways

From the doc:

The algorithm InnoDB uses for page checksumming. Possible values are FULL_CRC32 for new files, always use CRC-32C; for old, see CRC32 below; STRICT_FULL_CRC32 for new files, always use CRC-32C; for old, see STRICT_CRC32 below; CRC32 write crc32, allow any of the other checksums to match when reading; STRICT_CRC32 write crc32, do not allow other algorithms to match when reading; INNODB write a software calculated checksum, allow any other checksums to match when reading; STRICT_INNODB write a software calculated checksum, do not allow other algorithms to match when reading; NONE write a constant magic number, do not do any checksum verification when reading; STRICT_NONE write a constant magic number, do not allow values other than that magic number when reading; Files updated when this option is set to crc32 or strict_crc32 will not be readable by MariaDB versions older than 10.0.4; new files created with full_crc32 are readable by MariaDB 10.4.3+

From the bug report:

In 10.4, I would recommend

SET GLOBAL innodb_checksum_algorithm=full_crc32;

to get faster and more secure page checksums. The setting is the default starting with the 10.5 release. It only affects tables that have been created or rebuilt while the setting was in effect. Old-format files will be treated as crc32 (or if you use strict_full_crc32, as strict_crc32).

Roles done:

  • parsercache
  • misc_multiinstance (db1117, db2078 - stand by hosts)
  • misc masters
  • phabricator (both master and multi-instance replicas) T287244#7533901
  • analytics_multiinstance
  • dbstore_multiinstance T287244#7565443
  • Sanitarium hosts
  • db-inventory (db1115/db2093)
  • core_multiinstance
  • core

Event Timeline

Marostegui added a project: DBA.
Marostegui moved this task from Triage to Ready on the DBA board.

I am going to turn this ON for now on the new parsercache hosts on both eqiad and codfw, and see how they do (ie: no unexpected crashes and such). If that goes well, I will enable it on all the parsercache hosts (via puppet).

Mentioned in SAL (#wikimedia-operations) [2021-07-23T11:50:20Z] <marostegui> Change innodb_checksum_algorithm to full_crc32 on pc1011-1014 and pc2011-2014 - T287244

Finally I got some time to work on this!

Change 736709 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] parsercache.my.cnf: Add innodb_checksum_algorithm = full_crc32

https://gerrit.wikimedia.org/r/736709

Change 736709 merged by Marostegui:

[operations/puppet@production] parsercache.my.cnf: Add innodb_checksum_algorithm = full_crc32

https://gerrit.wikimedia.org/r/736709

I have committed this change on puppet only for parsercache hosts - this is a NOOP as the change was live since July.

Next I am going to explore this on an enwiki host.

Apart from db1124 and db1125, I have also changed this on db1133, which is a test host but that replicates from s1.
I have changed this only live, as I don't want to make it permanent yet - I have seen no issue with pc* hosts in terms of performance or stability, but it is too early to say. So let's expand this to those 3 hosts and see what happens in the next few days.

I have enabled full_cr32 on db1168 (s1 slave)

Change 742418 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] misc_multiinstance.my.cnf.erb: Change innodb_checksum_algorithm

https://gerrit.wikimedia.org/r/742418

Change 742418 merged by Marostegui:

[operations/puppet@production] misc_multiinstance.my.cnf.erb: Change innodb_checksum_algorithm

https://gerrit.wikimedia.org/r/742418

Applied this to db1117 and db2078 (along with an upgrade)

Change 742469 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] phabricator_instance.my.cnf.erb: Change innodb_checksum_algorithm

https://gerrit.wikimedia.org/r/742469

Change 742469 merged by Marostegui:

[operations/puppet@production] phabricator_instance.my.cnf.erb: Change innodb_checksum_algorithm

https://gerrit.wikimedia.org/r/742469

Change 742471 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] phabricator.my.cnf.erb: Change innodb_checksum_algorithm

https://gerrit.wikimedia.org/r/742471

Change 742471 merged by Marostegui:

[operations/puppet@production] phabricator.my.cnf.erb: Change innodb_checksum_algorithm

https://gerrit.wikimedia.org/r/742471

I have applied the change on puppet to the phabricator-related hosts but for now only live on the replicas and the codfw master.
Going to leave a few days before applying it live on the eqiad active master

  • db2078:3323
  • db1117:3323
  • db2134
  • db1107

Mentioned in SAL (#wikimedia-operations) [2021-12-07T06:14:08Z] <marostegui> Apply SET GLOBAL innodb_checksum_algorithm=full_crc32; on db1107 T287244

I have applied the change on puppet to the phabricator-related hosts but for now only live on the replicas and the codfw master.
Going to leave a few days before applying it live on the eqiad active master

  • db2078:3323
  • db1117:3323
  • db2134
  • db1107

Applied to db1107

Change 746654 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] analytics_multiinstance.my.cnf.erb: Enable full_crc32

https://gerrit.wikimedia.org/r/746654

I have added this to dbstore* my.cnf. So far though, I have changed it live on:

  • dbstore1003:3311
  • dbstore1005:3316
  • dbstore1007:3312

Change 746654 merged by Marostegui:

[operations/puppet@production] analytics_multiinstance.my.cnf.erb: Enable full_crc32

https://gerrit.wikimedia.org/r/746654

Change 746656 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] dbstore_multiinstance.my.cnf.erb: Add full_crc32

https://gerrit.wikimedia.org/r/746656

Change 746656 merged by Marostegui:

[operations/puppet@production] dbstore_multiinstance.my.cnf.erb: Add full_crc32

https://gerrit.wikimedia.org/r/746656

Mentioned in SAL (#wikimedia-operations) [2021-12-15T07:04:25Z] <marostegui> Enable full_crc32 on db2094 (s1, s3, s5 and s8) T287244

Enabled this on 1 sanitarium host in codfw: db2094 (s1, s3, s5 and s8)

Change 747413 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] sanitarium_multiinstance.my.cnf.erb: Enable innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/747413

Change 747413 merged by Marostegui:

[operations/puppet@production] sanitarium_multiinstance.my.cnf.erb: Enable innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/747413

I have added this to dbstore* my.cnf. So far though, I have changed it live on:

  • dbstore1003:3311
  • dbstore1005:3316
  • dbstore1007:3312

Deployed everywhere.

Enabled this on 1 sanitarium host in codfw: db2094 (s1, s3, s5 and s8)

Deployed on all db2094 and db2095

I will deploy this live to eqiad sanitarium after end of year holidays.

Mentioned in SAL (#wikimedia-operations) [2022-01-13T08:06:20Z] <marostegui> Change innodb_checksum_algorithm=full_crc32 on eqiad sanitarium hosts (db1154, db1155) T287244

Deployed on eqiad sanitarium hosts (db1154, db1155)

Change 753678 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db_inventory.my.cnf: innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/753678

Change 753678 merged by Marostegui:

[operations/puppet@production] db_inventory.my.cnf: innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/753678

Deployed this on db_inventory, which is db2093 and db1115. Unfortunately, this cannot be changed on 10.1, but it doesn't really matter as db1115 will be reimaged to 10.4 in less than a month for T297605: Shutdown Tendril and dbtree

Change 755915 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] core_multiinstance.my.cnf.erb: innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/755915

Change 755915 merged by Marostegui:

[operations/puppet@production] core_multiinstance.my.cnf.erb: innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/755915

Deployed this change on core_multiinstance hosts. It will be picked up as we restart mysql service.

Change 755932 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] misc.my.cnf.erb: innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/755932

Change 755932 merged by Marostegui:

[operations/puppet@production] misc.my.cnf.erb: innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/755932

Deployed on misc masters (misc-multi instance was done long ago).

Change 758420 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] production.my.cnf: innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/758420

Change 758420 merged by Marostegui:

[operations/puppet@production] production.my.cnf: innodb_checksum_algorithm=full_crc32

https://gerrit.wikimedia.org/r/758420

Marostegui updated the task description. (Show Details)

Deployed on core single instance hosts, which was the last role pending.