
Evaluate the impact of changing innodb_change_buffering to inserts
Closed, Resolved (Public)

Description

Some of the crashes we've seen in 10.4 (including the almighty series of labsdb1011 crashes T249188) might be related to the fact that we have:

| innodb_change_buffering       | all   |

The following MariaDB bugs mention that setting innodb_change_buffering = inserts, or even none, might prevent this from happening again:
https://jira.mariadb.org/browse/MDEV-12463
https://jira.mariadb.org/browse/MDEV-22373 (filed by us)
https://jira.mariadb.org/browse/MDEV-22497

We should evaluate whether changing this setting from all to inserts has any performance impact; if it doesn't, we might need to switch it permanently.
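A minimal sketch of how the live change could be scripted, assuming it is applied with the same SET GLOBAL statement that appears in the SAL entries of this task. The helper name `change_buffering_statement` is hypothetical; it only builds the SQL text and does not connect to any server. The list of accepted values is the one documented for this variable in MariaDB.

```python
# Values that innodb_change_buffering accepts according to the MariaDB docs.
VALID_MODES = ("none", "inserts", "deletes", "changes", "purges", "all")

# Read-only statement to check the current value on a host.
CHECK_STATEMENT = "SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';"


def change_buffering_statement(mode):
    """Build the SET GLOBAL statement for a live change (hypothetical helper).

    Raises ValueError for anything that is not a documented value, so a typo
    is caught before the statement reaches a server.
    """
    normalized = mode.strip().lower()
    if normalized not in VALID_MODES:
        raise ValueError("unsupported innodb_change_buffering value: %r" % mode)
    return "SET GLOBAL innodb_change_buffering = %s;" % normalized


if __name__ == "__main__":
    print(CHECK_STATEMENT)
    print(change_buffering_statement("inserts"))
```

A live change made this way is lost on restart unless the value is also written to the configuration file.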

So far the following hosts have had it changed live:

  • db2129 s6 master
  • db2116 s1 slave

List of hosts as of 28 Sept 2020: T263443#6497328
More hosts added on 6 Oct on s6: db2087:3316 db2089:3316 db2076 db2097:3316 db2114
More hosts added on 6 Oct on s5: db2075 db2089:3315 db2099:3315 db2111 db2128
More hosts added on 7 Oct on pc3: pc2009

Event Timeline

Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2020-09-21T12:26:17Z] <marostegui> Set innodb_change_buffering = all; on db2129 (s6 master) for performance testing T263443

Mentioned in SAL (#wikimedia-operations) [2020-09-21T12:26:49Z] <marostegui> Set innodb_change_buffering = all; on db2071 (s1 slave) for performance testing T263443

db2129 reverted to all
db2071 s1 slave, set to inserts

Mentioned in SAL (#wikimedia-operations) [2020-09-21T13:21:07Z] <marostegui> Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing T263443

Mentioned in SAL (#wikimedia-operations) [2020-09-21T14:21:35Z] <marostegui> Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing T263443

Mentioned in SAL (#wikimedia-operations) [2020-09-28T06:15:34Z] <marostegui> Set innodb_change_buffering = inserts; on db2089 (s5), db2106 (s4), db2108 (s2), db2085 (s1), db2085 (s8), db2087 (s7), db2087 (s6), db2109 (s3) T263443

Summary of hosts with the setting changed to inserts:

s1:
db2071 db2085 db2116

s2:
db2108

s3:
db2109

s4:
db2106

s5:
db2075 db2089:3315 db2099:3315 db2111 db2128

s6:
db2087 db2089:3316 db2076 db2097:3316 db2114

s7:
db2087

s8:
db2085 db2081

pc3:
pc2009
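The host lists above mix plain hostnames with `host:port` entries for multi-instance hosts. A small sketch of how such a list could be turned into (host, port) pairs for scripting. It assumes that entries without a port listen on 3306, the standard MariaDB port; that default and the function name `parse_instances` are assumptions, not something stated in this task.

```python
from typing import List, Tuple

# Assumption: hosts listed without ":port" use the standard MariaDB port.
DEFAULT_PORT = 3306


def parse_instances(spec: str) -> List[Tuple[str, int]]:
    """Split a whitespace-separated host list into (host, port) tuples."""
    instances = []
    for token in spec.split():
        host, sep, port = token.partition(":")
        instances.append((host, int(port) if sep else DEFAULT_PORT))
    return instances


if __name__ == "__main__":
    # The s5 list from the summary above.
    for host, port in parse_instances("db2075 db2089:3315 db2099:3315 db2111 db2128"):
        print(host, port)
```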

Mentioned in SAL (#wikimedia-operations) [2020-10-06T07:53:13Z] <marostegui> Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 T263443

I haven't found anything weird with this so far, so after deploying it to a bunch of hosts on s6 this morning, I am going to deploy it to more hosts on s5 and s2 for now.

Mentioned in SAL (#wikimedia-operations) [2020-10-06T13:04:36Z] <marostegui> Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 T263443

Mentioned in SAL (#wikimedia-operations) [2020-10-07T10:58:00Z] <marostegui> Set innodb_change_buffering = inserts on pc2009 T263443

I have noticed a slight increase in InnoDB wait time on pc2009. It could be just a coincidence, but I am going to revert to innodb_change_buffering = all and check what the pattern looks like there.

Mentioned in SAL (#wikimedia-operations) [2020-10-08T14:21:41Z] <marostegui> Set global innodb_change_buffering = all; on pc2009 T263443

I am going to leave it that way on pc2009 for the next few days, as the pattern seems to have returned to normal.
I will set it back to inserts on Tuesday and see if it changes again:

[Screenshot: Captura de pantalla 2020-10-09 a las 9.36.49.png]

This issue doesn't show up on normal core hosts. pcXXX hosts have a very different write pattern from the rest of the hosts, as they only receive REPLACE writes.

So far, the values haven't increased again, so I am going to change pc2009 back to inserts to see if they go back to higher values.

Mentioned in SAL (#wikimedia-operations) [2020-10-13T05:35:22Z] <marostegui> Set global innodb_change_buffering = inserts; on pc2009 T263443

This is running by default on all the clouddb hosts.

Change 668669 had a related patch set uploaded (by Marostegui; owner: Marostegui):
[operations/puppet@production] parsercache.my.cnf: innodb_change_buffering = none

https://gerrit.wikimedia.org/r/668669

Mentioned in SAL (#wikimedia-operations) [2021-03-05T11:28:21Z] <marostegui> Temporarily set innodb_change_buffering = none on db1134 (s1) - T263443

I have temporarily set innodb_change_buffering = none on pc1007 and db1134 to check their performance.

Change 668669 merged by Marostegui:
[operations/puppet@production] parsercache.my.cnf: innodb_change_buffering = none

https://gerrit.wikimedia.org/r/668669

Mentioned in SAL (#wikimedia-operations) [2021-03-08T06:44:43Z] <marostegui> Set innodb_change_buffering = none on all parsercache hosts T263443

All parsercache hosts have been changed to innodb_change_buffering = none

Change 677423 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] misc,phabricator,dbstore_multiinstance.my.cnf: Set innodb_change_buffering = none

https://gerrit.wikimedia.org/r/677423

Change 677423 merged by Marostegui:

[operations/puppet@production] mariadb: Set innodb_change_buffering = none on a few roles

https://gerrit.wikimedia.org/r/677423

Changed this on a few roles:

  • Misc
  • Phabricator
  • dbstore_multiinstance

They'll pick up the change once mysql gets restarted.
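Since a change made in the configuration file only takes effect after a restart, it can help to compare the configured value with the running one. The sketch below assumes a simple my.cnf with a `[mysqld]` section and no `!include` directives, which Python's `configparser` does not understand. The function names and the sample text are hypothetical, and the fallback of `all` assumes the 10.4 default seen in the description.

```python
import configparser

# Hypothetical, simplified my.cnf content for illustration only.
SAMPLE_CNF = """
[mysqld]
innodb_change_buffering = none
"""


def configured_change_buffering(my_cnf_text, default="all"):
    """Return the value set under [mysqld], or the assumed default if unset."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # my.cnf allows bare options such as skip-name-resolve
        strict=False,         # tolerate repeated options
        interpolation=None,   # treat '%' literally
    )
    parser.read_string(my_cnf_text)
    return parser.get("mysqld", "innodb_change_buffering", fallback=default)


def pending_restart(my_cnf_text, runtime_value):
    """True if the running value differs from what the config file asks for."""
    return configured_change_buffering(my_cnf_text) != runtime_value.strip().lower()


if __name__ == "__main__":
    # runtime_value would come from SHOW GLOBAL VARIABLES on the host.
    print(pending_restart(SAMPLE_CNF, "all"))
```

A host where this returns True either still needs its restart or needs the value set live with SET GLOBAL.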

Change 679609 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] production.my.cnf: Add innodb_change_buffering = none

https://gerrit.wikimedia.org/r/679609

Change 679609 merged by Marostegui:

[operations/puppet@production] production.my.cnf: Add innodb_change_buffering = none

https://gerrit.wikimedia.org/r/679609

I have enabled this on production; it will be picked up as the hosts get restarted.
However, I am going to manually enable it on a few hosts per section so we can test it further.

This has been enabled everywhere

Change 684672 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] sanitarium_multiinstance.my.cnf: Add innodb_change_buffering

https://gerrit.wikimedia.org/r/684672

Change 684672 merged by Marostegui:

[operations/puppet@production] sanitarium_multiinstance.my.cnf: Add innodb_change_buffering

https://gerrit.wikimedia.org/r/684672

Change 688248 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] mariadb: Clarify that innodb_change_buffering is none

https://gerrit.wikimedia.org/r/688248

Change 688248 merged by Marostegui:

[operations/puppet@production] mariadb: Clarify that innodb_change_buffering is none

https://gerrit.wikimedia.org/r/688248