Evaluate the impact of changing innodb_change_buffering to inserts
Open, Medium, Public

Description

Some of the crashes we've seen in 10.4 (including the almighty series of labsdb1011 crashes T249188) might be related to the fact that we have:

| innodb_change_buffering       | all   |

The following MariaDB bug reports mention that setting innodb_change_buffering to inserts, or even to none, might prevent this from happening again:
https://jira.mariadb.org/browse/MDEV-12463
https://jira.mariadb.org/browse/MDEV-22373 (filed by us)
https://jira.mariadb.org/browse/MDEV-22497

We should evaluate whether changing this setting from all to inserts has any performance impact; if it doesn't, we might need to switch it permanently.
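
For reference, a minimal sketch of how the setting can be inspected and changed live, assuming a session with sufficient privileges on the MariaDB instance. The variable is dynamic and global, so no restart is needed; the statements mirror the ones logged in SAL further down.

```sql
-- Check the current value (shown as "all" in the description above).
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';

-- Change it at runtime for the evaluation.
SET GLOBAL innodb_change_buffering = 'inserts';

-- Revert if the change turns out to hurt performance.
SET GLOBAL innodb_change_buffering = 'all';
```

A live change like this does not survive a restart. Making it permanent would also require setting innodb_change_buffering = inserts in the server configuration; how that file is managed is outside what this task describes.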

So far the following hosts have it live changed:

  • db2129 s6 master
  • db2116 s1 slave

List of hosts as of 28th Sept 2020: T263443#6497328
More hosts added the 6th Oct on s6: db2087:3316 db2089:3316 db2076 db2097:3316 db2114
More hosts added the 6th Oct on s5: db2075 db2089:3315 db2099:3315 db2111 db2128
More hosts added the 7th Oct on pc3: pc2009

Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 21 2020, 11:53 AM
Marostegui triaged this task as Medium priority. Sep 21 2020, 11:53 AM
Marostegui moved this task from Triage to In progress on the DBA board.

Mentioned in SAL (#wikimedia-operations) [2020-09-21T12:26:17Z] <marostegui> Set innodb_change_buffering = inserts; on db2129 (s6 master) for performance testing T263443

Mentioned in SAL (#wikimedia-operations) [2020-09-21T12:26:49Z] <marostegui> Set innodb_change_buffering = inserts; on db2071 (s1 slave) for performance testing T263443

db2129 reverted to all
db2071 s1 slave, set to inserts

Mentioned in SAL (#wikimedia-operations) [2020-09-21T13:21:07Z] <marostegui> Set innodb_change_buffering = inserts; on db2081 (s8 slave) for performance testing T263443

db2081 (wikidata) set to inserts

Mentioned in SAL (#wikimedia-operations) [2020-09-21T14:21:35Z] <marostegui> Set innodb_change_buffering = inserts; on db2125 (s2 slave) for performance testing T263443

db2125 (s2) set to inserts

Mentioned in SAL (#wikimedia-operations) [2020-09-28T06:15:34Z] <marostegui> Set innodb_change_buffering = inserts; on db2089 (s5), db2106 (s4), db2108 (s2), db2085 (s1), db2085 (s8), db2087 (s7), db2087 (s6), db2109 (s3) T263443

Marostegui added a comment (edited). Sep 28 2020, 6:18 AM

Summary of hosts with the setting changed to inserts:

s1:
db2071 db2085 db2116

s2:
db2108

s3:
db2109

s4:
db2106

s5:
db2075 db2089:3315 db2099:3315 db2111 db2128

s6:
db2087 db2089:3316 db2076 db2097:3316 db2114

s7:
db2087

s8:
db2085 db2081

pc3:
pc2009

Marostegui updated the task description. Sep 28 2020, 6:21 AM

Mentioned in SAL (#wikimedia-operations) [2020-10-06T07:53:13Z] <marostegui> Change innodb_change_buffering = inserts on db2087:3316 db2089:3316 db2076 db2097:3316 db2114 T263443

Marostegui updated the task description. Oct 6 2020, 8:05 AM

I haven't found anything unusual so far, so after deploying it to a batch of hosts on s6 this morning, I am going to deploy it to more hosts on s5 and s2 for now.

Mentioned in SAL (#wikimedia-operations) [2020-10-06T13:04:36Z] <marostegui> Change innodb_change_buffering = inserts on db2075 db2089 db2099 db2111 db2128 T263443

Marostegui updated the task description. Oct 6 2020, 1:05 PM

Mentioned in SAL (#wikimedia-operations) [2020-10-07T10:58:00Z] <marostegui> Set innodb_change_buffering = inserts on pc2009 T263443

Changed it on pc2009 too.

Marostegui updated the task description. Oct 7 2020, 10:58 AM
LSobanski moved this task from Next to In Progress on the Data-Persistence board.
LSobanski moved this task from In progress to Ready on the DBA board. Oct 8 2020, 9:54 AM

I have noticed a slight increase in InnoDB wait time on pc2009. It could be just a coincidence, but I am going to revert to innodb_change_buffering = all and check what the pattern looks like there.

Mentioned in SAL (#wikimedia-operations) [2020-10-08T14:21:41Z] <marostegui> Set global innodb_change_buffering = all; on pc2009 T263443

Going to leave pc2009 on all for the next few days, as the pattern seems to have come back to normal.
I will set it back to inserts on Tuesday and see if it changes again.

This issue doesn't show up on normal core hosts. pcXXX hosts have a very different write pattern from the rest of the hosts, as they only receive REPLACE writes.
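
One way to see how differently a host exercises the change buffer is to look at the server's own counters. This is only a sketch: it is an assumption that these counters are useful here, and they are not necessarily the "InnoDB wait time" metric referred to above. The exact set of Innodb_ibuf_* names depends on the MariaDB version.

```sql
-- List the change buffer (insert buffer) counters exposed by the server.
SHOW GLOBAL STATUS LIKE 'Innodb_ibuf%';

-- Confirm which buffering mode the counters were collected under.
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffering';
```

Sampling these on a pc host and on a core host, before and after a change, would give a rough comparison of merge activity between the two write patterns.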

So far, the values haven't increased again, so I am going to change pc2009 back to inserts to see if they go back up.

Mentioned in SAL (#wikimedia-operations) [2020-10-13T05:35:22Z] <marostegui> Set global innodb_change_buffering = inserts; on pc2009 T263443

LSobanski moved this task from Ready to In progress on the DBA board. Oct 26 2020, 2:46 PM
Marostegui moved this task from In progress to Ready on the DBA board. Nov 26 2020, 2:03 PM

This is running by default on all the clouddb hosts.