Test MariaDB 10.11
Open, Medium, Public

Description

MariaDB 10.11 is the LTS release, with support until Feb 2028. It should be the replacement for 10.6.

Event Timeline

Marostegui moved this task from Triage to Ready on the DBA board.

Tests will be on Bookworm

Created mariadb1011 VPS inside of the mariadbtest project.

Change #1036199 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/software@master] control-mariadb-10.11-bookworm: Initial packaging

https://gerrit.wikimedia.org/r/1036199

I've compiled and packaged 10.11 in bookworm, starting some basic tests in the testing env:

Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 21
Server version: 10.11.8-MariaDB-log MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

cumin2024@db1125.eqiad.wmnet[(none)]>

The server starts correctly, but I am finding issues with the systemd unit.

Change #1036199 merged by jenkins-bot:

[operations/software@master] control-mariadb-10.11-bookworm: Initial packaging

https://gerrit.wikimedia.org/r/1036199

I believe I have solved the unit issues. I am going to do some initial testing in the testing environment before moving into our infra.

Change #1036916 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] core_test.pp: Add MariaDB 10.11

https://gerrit.wikimedia.org/r/1036916

Change #1036916 merged by Marostegui:

[operations/puppet@production] core_test.pp: Add MariaDB 10.11

https://gerrit.wikimedia.org/r/1036916

Mentioned in SAL (#wikimedia-operations) [2024-06-11T08:31:52Z] <marostegui> Install 10.11 on db1153 (non used x2 replica) T365805

Installed 10.11 on db1153 (unused x2 replica). I will test here for a few weeks.

Change #1041528 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1153: Install MariaDB 10.11

https://gerrit.wikimedia.org/r/1041528

Change #1041528 merged by Marostegui:

[operations/puppet@production] db1153: Install MariaDB 10.11

https://gerrit.wikimedia.org/r/1041528

Change #1041529 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db1153: Add warning note

https://gerrit.wikimedia.org/r/1041529

Change #1041529 merged by Marostegui:

[operations/puppet@production] db1153: Add warning note

https://gerrit.wikimedia.org/r/1041529

After a couple of days, there are no noticeable differences in replication performance in x2 between a host running 10.6 and a host running 10.11. The number of inserts isn't huge in this section (around 150 writes per second).
These are the config differences for now:

root@cumin1002:~# sudo pt-config-diff --defaults-file /root/.my.cnf h=db1153.eqiad.wmnet h=db1151.eqiad.wmnet
24 config differences
Variable                  db1153                    db1151
========================= ========================= =========================
basedir                   /opt/wmf-mariadb1011      /opt/wmf-mariadb106
character_sets_dir        /opt/wmf-mariadb1011/s... /opt/wmf-mariadb106/sh...
explicit_defaults_for_... ON                        OFF
general_log_file          db1153.log                db1151.log
gtid_binlog_pos           171966470-171966470-14... 171966470-171966470-14...
gtid_binlog_state         171966470-171966470-14... 171966470-171966470-14...
gtid_current_pos          0-171970580-683331037,... 0-171970580-683331037,...
gtid_domain_id            171978800                 171966470
gtid_slave_pos            0-171970580-683331037,... 0-171970580-683331037,...
hostname                  db1153                    db1151
innodb_buffer_pool_chu... 6325010432                134217728
innodb_prefix_index_cl... ON                        OFF
log_bin_basename          /srv/sqldata/db1153-bin   /srv/sqldata/db1151-bin
log_bin_index             /srv/sqldata/db1153-bi... /srv/sqldata/db1151-bi...
optimizer_prune_level     2                         1
pid_file                  /srv/sqldata/db1153.pid   /srv/sqldata/db1151.pid
plugin_dir                /opt/wmf-mariadb1011/l... /opt/wmf-mariadb106/li...
report_host               db1153.eqiad.wmnet        db1151.eqiad.wmnet
server_id                 171978800                 171966470
slave_transaction_retr... 1158,1159,1160,1161,12... 1158,1159,1160,1161,12...
slow_query_log_file       db1153-slow.log           db1151-slow.log
version                   10.11.8-MariaDB-log       10.6.16-MariaDB-log
version_source_revision   3a069644682e336e445039... b83c379420a8846ae4b287...
wsrep_node_name           db1153                    db1151
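
Most of the 24 rows above are expected per-host values (hostnames, server IDs, GTID positions, file paths). As a minimal sketch, and not part of the tooling used in this task, the following Python filters such rows out of saved pt-config-diff output so that only the settings worth reviewing remain. The prefix list and the function name are assumptions made for illustration. Matching is by prefix because pt-config-diff truncates long variable names.

```python
# Hypothetical helper: the choice of "host-specific" prefixes is an assumption.
HOST_SPECIFIC_PREFIXES = (
    "basedir", "character_sets_dir", "general_log_file", "gtid_", "hostname",
    "log_bin_", "pid_file", "plugin_dir", "report_host", "server_id",
    "slow_query_log_file", "version", "wsrep_node_name",
)


def interesting_differences(diff_output):
    """Return (variable, left, right) rows that are not per-host noise."""
    rows = []
    in_table = False
    for line in diff_output.splitlines():
        if line.startswith("====="):
            # Everything after the separator line is a table row.
            in_table = True
            continue
        if not in_table or not line.strip():
            continue
        parts = line.split()
        if len(parts) != 3:
            continue
        name, left, right = parts
        if name.startswith(HOST_SPECIFIC_PREFIXES):
            continue
        rows.append((name, left, right))
    return rows


# A few rows copied from the output above.
SAMPLE = """\
Variable                  db1153                    db1151
========================= ========================= =========================
hostname                  db1153                    db1151
innodb_prefix_index_cl... ON                        OFF
optimizer_prune_level     2                         1
server_id                 171978800                 171966470
"""

for name, left, right in interesting_differences(SAMPLE):
    print(f"{name}: {left} vs {right}")
```

On the sample rows this keeps only innodb_prefix_index_cl... and optimizer_prune_level.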

A big difference in defaults is optimizer_prune_level, which is now set to 2. I am interested in seeing how that affects reads (this will only be checked once we start replaying production reads on 10.11). For the record:

optimizer_prune_level
Description: Controls the heuristic(s) applied during query optimization to prune less-promising partial plans from the optimizer search space.
0: heuristics are disabled and an exhaustive search is performed
1: the optimizer will use heuristics to prune less-promising partial plans from the optimizer search space
2: tables using EQ_REF will be joined together as 'one entity' and the different combinations of these tables will not be considered (from MariaDB 10.10)

Default Value: 2 (>= MariaDB 10.10), 1 (<= MariaDB 10.9)
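
A minimal Python sketch of the rule quoted above, plus one possible way to compare query plans under both levels once reads are being tested. The function names and the example query are hypothetical. The sketch only builds SQL text and assumes optimizer_prune_level can be changed per session; it does not connect to any server.

```python
def default_optimizer_prune_level(version):
    """Default per the documentation quoted above: 2 from 10.10 on, 1 up to 10.9."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return 2 if (major, minor) >= (10, 10) else 1


def explain_under_both_levels(query):
    """Build statements that show a query's plan at level 1 and then at level 2."""
    query = query.rstrip(";")
    statements = []
    for level in (1, 2):
        statements.append(f"SET SESSION optimizer_prune_level = {level};")
        statements.append(f"EXPLAIN {query};")
    return statements


# Version strings taken from the config diff in this task.
print(default_optimizer_prune_level("10.11.8-MariaDB-log"))
print(default_optimizer_prune_level("10.6.16-MariaDB-log"))

# Hypothetical query, used only to show the generated statements.
for statement in explain_under_both_levels("SELECT * FROM t1 JOIN t2 ON t1.id = t2.id"):
    print(statement)
```

Running the generated statements in one session on a 10.11 host would show whether the plan for a given query changes between the old and new default.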

The next step is going to be testing just replication on s1.

Mentioned in SAL (#wikimedia-operations) [2024-06-26T06:31:10Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65446 and previous config saved to /var/cache/conftool/dbconfig/20240626-063109-root.json

Change #1049673 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2136: Migrate to MariaDB 10.11

https://gerrit.wikimedia.org/r/1049673

Change #1049673 merged by Marostegui:

[operations/puppet@production] db2136: Migrate to MariaDB 10.11

https://gerrit.wikimedia.org/r/1049673

Mentioned in SAL (#wikimedia-operations) [2024-06-26T06:39:26Z] <marostegui> Install mariadb 10.11 on s4 db2136 (depooled for now) T365805

Mentioned in SAL (#wikimedia-operations) [2024-06-26T06:52:29Z] <marostegui> Enable slow query log on db2136 running 10.11 T365805

Mentioned in SAL (#wikimedia-operations) [2024-06-26T06:56:37Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Pool db2136 - running 10.11 with minium weight T365805', diff saved to https://phabricator.wikimedia.org/P65447 and previous config saved to /var/cache/conftool/dbconfig/20240626-065636-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2024-06-26T08:55:11Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Depool db2136 T365805', diff saved to https://phabricator.wikimedia.org/P65463 and previous config saved to /var/cache/conftool/dbconfig/20240626-085511-root.json

I've installed 10.11 on db2136 (s4) for now. I pooled it in production for a couple of hours to capture queries that take longer than 10 seconds to run. For now the host is depooled again and will only be pooled during certain working hours.

I am going to analyse what I captured and then, if everything looks good, repool it a bit more.
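
As a rough illustration of that analysis step, and not the method actually used here, the Python sketch below pulls statements above a time threshold out of slow query log text. It assumes the usual "# Query_time: ..." header line before each statement. The sample log is made up for the example and is not data captured from db2136.

```python
import re

QUERY_TIME = re.compile(r"^# Query_time: (\d+(?:\.\d+)?)")


def slow_statements(log_text, threshold_seconds=10.0):
    """Return (query_time, statement) pairs above the threshold, slowest first."""
    entries = []  # each item: (query_time, [statement lines])
    for line in log_text.splitlines():
        stripped = line.strip()
        match = QUERY_TIME.match(line)
        if match:
            # A new log entry starts; its statement lines follow.
            entries.append((float(match.group(1)), []))
        elif not stripped or line.startswith("#") or not entries:
            continue
        elif not stripped.startswith(("SET timestamp=", "use ")):
            entries[-1][1].append(stripped)
    slow = [(t, " ".join(sql)) for t, sql in entries if t > threshold_seconds]
    return sorted(slow, reverse=True)


# Made-up sample in the slow query log layout (illustrative only).
SAMPLE_LOG = """\
# User@Host: exampleuser[exampleuser] @ localhost []
# Query_time: 12.500000  Lock_time: 0.000100  Rows_sent: 1  Rows_examined: 900000
SET timestamp=1719385201;
SELECT COUNT(*) FROM example_table;
# Query_time: 0.250000  Lock_time: 0.000050  Rows_sent: 10  Rows_examined: 10
SET timestamp=1719385202;
SELECT 1;
"""

for seconds, statement in slow_statements(SAMPLE_LOG):
    print(f"{seconds:.1f}s  {statement}")
```

The 10 second default mirrors the threshold mentioned above. In practice a tool such as pt-query-digest gives a more complete summary.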

Replication-wise there seem to be no regressions for now, in either x2 or s4, although for s4 it is too early to tell.

Change #1049858 had a related patch set uploaded (by Marostegui; author: Marostegui):

[operations/puppet@production] db2136: Disable notifications

https://gerrit.wikimedia.org/r/1049858

Change #1049858 merged by Marostegui:

[operations/puppet@production] db2136: Disable notifications

https://gerrit.wikimedia.org/r/1049858

Mentioned in SAL (#wikimedia-operations) [2024-07-03T04:51:09Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65691 and previous config saved to /var/cache/conftool/dbconfig/20240703-045109-marostegui.json

Pooling db2136 with more weight

Mentioned in SAL (#wikimedia-operations) [2024-07-03T08:11:00Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65707 and previous config saved to /var/cache/conftool/dbconfig/20240703-081059-marostegui.json

Mentioned in SAL (#wikimedia-operations) [2024-07-03T09:19:57Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65709 and previous config saved to /var/cache/conftool/dbconfig/20240703-091956-marostegui.json

After a few days running in production, the host is depooled again and I am going to analyze the slow queries I've been able to capture.

Mentioned in SAL (#wikimedia-operations) [2024-07-08T12:43:11Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65939 and previous config saved to /var/cache/conftool/dbconfig/20240708-124310-marostegui.json

Repooled again with a bit of weight

Mentioned in SAL (#wikimedia-operations) [2024-07-10T11:50:47Z] <marostegui@cumin1002> dbctl commit (dc=all): 'Pool db2136 into api with small weight T365805', diff saved to https://phabricator.wikimedia.org/P66116 and previous config saved to /var/cache/conftool/dbconfig/20240710-115046-marostegui.json