
MariaDB: Follow recommended memory suggestions
Closed, Resolved · Public

Description

Following https://mariadb.com/kb/en/mariadb-memory-allocation

We should bump the RAM of both the primary and the secondary to 4GB and allocate 70% of that to innodb_buffer_pool_size.

Let's try this and see if it reduces our unusual problems.
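
For reference, a minimal sketch (Python, not part of the deploy repo) of what the 70% rule works out to, assuming the 4GB figure above is the per-pod memory request:

```python
# Sketch: compute the buffer pool size implied by the 70% rule from the
# MariaDB memory allocation docs, assuming 4GiB requested per pod
# (primary and secondary alike) as proposed in this task.
def suggested_buffer_pool_bytes(total_ram_bytes: int, fraction: float = 0.7) -> int:
    return int(total_ram_bytes * fraction)

RAM = 4 * 1024**3  # 4GiB per pod (assumption from this task)
print(f"innodb_buffer_pool_size ~= {suggested_buffer_pool_bytes(RAM) / 1024**2:.0f}M")
# -> roughly 2867M, i.e. ~2.8GiB of the 4GiB request
```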

Event Timeline

As an initial attempt to understand how these things correlate, we kicked over the secondary pod and will monitor the situation for a bit before making this change.

PR for staging: https://github.com/wmde/wbaas-deploy/pull/433

Note: This probably won't work unless we increase the staging cluster nodes first!

Could we somehow do it on local also?

dang removed dang as the assignee of this task. Jun 21 2022, 3:36 PM
dang subscribed.

There was quite some debate about whether raising these values in this way makes sense.

For example, would 2GB of total requested memory actually be sufficient? And what impact would tweaking this buffer have on the system?

@toan suggested tracking the numbers mentioned in https://mariadb.com/kb/en/innodb-buffer-pool/#innodb_buffer_pool_size, specifically whether, over time, innodb_buffer_pool_reads grows by less than 1% of the growth in innodb_buffer_pool_read_requests. If the outcome of T310697 looks promising, we could monitor these numbers relatively easily. I enabled the metrics sidecar locally by tweaking the chart values; the metrics are then clearly visible by connecting to the primary or secondary sql service on port 9104. For example:

# HELP mysql_global_status_innodb_buffer_pool_read_requests Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_buffer_pool_read_requests untyped
mysql_global_status_innodb_buffer_pool_read_requests 24211
# HELP mysql_global_status_innodb_buffer_pool_reads Generic metric from SHOW GLOBAL STATUS.
# TYPE mysql_global_status_innodb_buffer_pool_reads untyped
mysql_global_status_innodb_buffer_pool_reads 1223
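
A possible way to watch that ratio, sketched in Python against the sidecar metrics above (the localhost URL assumes a port-forward to the sql service, and the five-minute interval is arbitrary):

```python
# Rough sketch of the check suggested above: sample the exporter on port
# 9104 twice and see whether the growth in innodb_buffer_pool_reads
# (disk reads, i.e. buffer pool misses) stays under 1% of the growth in
# innodb_buffer_pool_read_requests.
import time
import urllib.request

EXPORTER_URL = "http://localhost:9104/metrics"  # assumed port-forward to the sql service

def sample(name: str) -> float:
    """Return the current value of a single metric from the exporter."""
    with urllib.request.urlopen(EXPORTER_URL) as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith(name + " "):
                return float(line.split()[1])
    raise KeyError(name)

def miss_growth_ratio(interval_s: int = 300) -> float:
    reads_1 = sample("mysql_global_status_innodb_buffer_pool_reads")
    reqs_1 = sample("mysql_global_status_innodb_buffer_pool_read_requests")
    time.sleep(interval_s)
    reads_2 = sample("mysql_global_status_innodb_buffer_pool_reads")
    reqs_2 = sample("mysql_global_status_innodb_buffer_pool_read_requests")
    return (reads_2 - reads_1) / max(reqs_2 - reqs_1, 1)

if __name__ == "__main__":
    ratio = miss_growth_ratio()
    print(f"miss growth ratio: {ratio:.4%} (buffer pool looks big enough if < 1%)")
```

In practice this would presumably just become a Grafana panel once T310697 is in place; the sketch is only to make the 1% threshold concrete.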

In my opinion it would still make sense to go ahead and merge this patch as is; in a few weeks or months, once the performance of these buffers is more observable, we can think about tweaking the values up or down.

Seems to be happily deployed. It is notable that, right now, every change like this does "cause disruption", i.e. the wikis go down for a short period.