Page MenuHomePhabricator

Enabling automatic buffer pool dumping on start/stop (puppet) for all servers
Closed, ResolvedPublic

Description

It has already been deployed on pc100[123] servers:

innodb_buffer_pool_load_at_startup
innodb_buffer_pool_dump_at_shutdown

Details

Related Gerrit Patches:

Event Timeline

jcrespo created this task.Jun 1 2015, 5:58 PM
jcrespo claimed this task.
jcrespo raised the priority of this task from to Low.
jcrespo updated the task description. (Show Details)
jcrespo added projects: DBA, acl*sre-team.
jcrespo added subscribers: jcrespo, Springle.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJun 1 2015, 5:58 PM

Sounds good? What are the cons?

jcrespo renamed this task from Discuss enabling automatic buffer pool dumping on start/stop (puppet) for all servers to Enabling automatic buffer pool dumping on start/stop (puppet) for all servers.EditedJun 1 2015, 6:18 PM
jcrespo set Security to None.

Description updated :) There are downside, will discuss later.

jcrespo added a comment.EditedJun 2 2015, 9:13 AM

Good things:

  • Dump takes 1 second (it only writes the LRU list, not the actual pages) even for a 200GB buffer pool
  • Load is asynchronous by default, it doesn't block queries on start nor it makes the restart take longer
  • When activated, if combined with a full buffer pool on dump and replication, no further need for warming up the buffer- it will be done automatically within a period of time. This means that doing a restart, even by a non-DBA, is less dangerous, plus avoids mistakes for us.
  • It is enabled by default in 5.7, which for me indicates that it is a mature feature
  • I can be enabled and disabled in a hot way

Bad things:

  • The load process is a disk -> memory intensive read operation. I would not put the server into production while this happens (buffer pool not warm, lots of IO which may affect performance)
  • it takes a long time: Loading a 200GB buffer pool takes 50-90 minutes. In 5.7 they solve this issue by setting innodb_buffer_pool_dump_pct to 25% by default, which means that the dump progress is reduced to 1/4. This is not possible in MariaDB (and probably not desirable due to our data sizes).

I tested this on pc100[123] with no issue, plus on some production servers manually (_dump_now, _load_now). This is how to recognize the process on the error log (emphasis mine):

Dump:

150601 14:42:03 [Note] InnoDB: Starting shutdown...
**2015-06-01 14:42:03 7edd33ff8700 InnoDB: Dumping buffer pool(s) to .//ib_buffer_pool**
**2015-06-01 14:42:06 7edd33ff8700 InnoDB: Buffer pool(s) dump completed at 150601 14:42:06**
150601 14:42:18 [Note] InnoDB: Shutdown completed; log sequence number 79427498760587
150601 14:42:18 [Note] /opt/wmf-mariadb10/bin/mysqld: Shutdown complete

Load:

**2015-06-01 14:42:30 7f2bbfff8700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool**
150601 14:42:30 [Note] Server socket created on IP: '::'.
150601 14:42:30 [Note] Server socket created on IP: '::'.
150601 14:42:30 [Warning] 'user' entry 'root@pc2' ignored in --skip-name-resolve mode.
150601 14:42:30 [Warning] 'user' entry '@pc2' ignored in --skip-name-resolve mode.
150601 14:42:30 [Note] Event Scheduler: scheduler thread started with id 2
150601 14:42:30 [Note] /opt/wmf-mariadb10/bin/mysqld: ready for connections.
Version: '10.0.16-MariaDB-log'  socket: '/tmp/mysql.sock'  port: 3306  Source distribution
**2015-06-01 16:18:44 7f2bbfff8700 InnoDB: Buffer pool(s) load completed at 150601 16:18:44**

If the buffer does not load (for example, the LRU list does not exist), it logs it to the error log, but it is not fatal:

2015-05-28 10:58:36 7edd33ff8700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
2015-05-28 10:58:36 7edd33ff8700 InnoDB: Cannot open './/ib_buffer_pool' for reading: No such file or directory
(boot sequence continues)

This doesn't solve the problem in case of a crash, but given that it would be a special case, I wouldn't mind. It could be done with an event, executing the dump from time to time- but in order to be effective it should be very regular, and it is know to have caused problems in the past (clarification: for the Facebook/Percona implementation, not 5.6) in terms of buffer pool global locks.

References:

Thoughts?

Change 215320 had a related patch set uploaded (by Jcrespo):
Enable buffer pool load at start and dump at stop

https://gerrit.wikimedia.org/r/215320

jcrespo moved this task from Triage to In progress on the DBA board.Jun 2 2015, 2:21 PM

Nice :-) I see it can be easily aborted too, which was my only (vague) concern.

Change 215320 merged by Jcrespo:
Enable buffer pool load at start and dump at stop

https://gerrit.wikimedia.org/r/215320