
Reduce memory commitment on database hosts with many objects, especially s3, dbstore/research and labs
Closed, Resolved · Public

Description

In the time I have been working here, I have seen certain very specific hosts having either swapping issues or OOMs. In most cases, they are not due to bad database configuration, but to over-committing resources in edge cases (see T107072). Those edge cases are not related to how "busy" the server is (enwiki does not have these problems), but probably to one or several of these issues:

  • Mixing more than one engine, so MyISAM, InnoDB and TokuDB compete for resources
  • Unusual workloads (UPDATEs/ALTERs from software deployments touching a large number of tables, reboots, horrible long queries written with no understanding of the underlying hardware on non-core servers)
  • Too many objects, which require more OS-controlled memory (buffers). This has probably grown recently due to thousands of new objects being created in the default shard (a quick per-engine object count is sketched below)
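
A quick, hedged way to gauge how many objects each engine manages on one of these hosts (the schema filter is only illustrative, and on hosts with hundreds of thousands of tables this query is itself expensive, since it may open every table definition):

SELECT engine,
       COUNT(*) AS tables,
       ROUND(SUM(data_length + index_length) / 1024 / 1024 / 1024, 1) AS approx_gb
  FROM information_schema.tables
 WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema')
 GROUP BY engine;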

While there could be better hardware coming for the old production and labs db hosts, I think we should check this.

Proposal:

  • Analyze past OOMs and swap usage globally, especially on the s3 shard, the research and dbstore hosts, and the labsdb hosts
  • Reduce the InnoDB buffer pool (s3), or analyze a better balance between it and the TokuDB, MyISAM and Aria resources (labs, dbstore) and OS buffering (all); see the check sketched below
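
As a sketch of the kind of check this proposal implies (not commands from the task itself), one could first compare the configured buffer pool with how much of it is actually in use before shrinking it:

SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';
-- Innodb_buffer_pool_pages_total * 16KB (the default page size) is the allocated pool;
-- a consistently large Innodb_buffer_pool_pages_free count is memory that could
-- instead be left to the OS page cache on multi-engine hosts.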

Pros

  • If it works, better throughput and fewer performance-drop spikes
  • Potential speed improvements for buffer-related tasks (binlogs, replication lag, etc.)
  • More reliability

Cons

  • Worse latency, and maybe throughput, due to less memory being available to InnoDB/MySQL

Alternatives

  • Just wait for better hardware
  • Reshard s3 or create another default shard (this does not work for labs/dbstore)

Event Timeline

jcrespo created this task. Jul 29 2015, 1:08 PM
jcrespo raised the priority of this task from to Needs Triage.
jcrespo updated the task description. (Show Details)
jcrespo added projects: DBA, acl*sre-team.
jcrespo added subscribers: jcrespo, Springle, yuvipanda, coren.
Restricted Application added subscribers: Matanya, Aklapper. Jul 29 2015, 1:08 PM
jcrespo renamed this task from "Reduce memory commitment on database hosts with many objects, especially s3 and labs" to "Reduce memory commitment on database hosts with many objects, especially s3, dbstore/research and labs". Jul 29 2015, 1:08 PM
jcrespo set Security to None.

This is related to T107070, but a short term change with existing hardware.

Buffer pool sizes:

analytics.my.cnf.erb:innodb_buffer_pool_size        = 4G
beta.my.cnf.erb:innodb_buffer_pool_size = <%= (Float(@memorysize.split[0]) * 0.75).round %>G
dbstore.my.cnf.erb:innodb_buffer_pool_size        = 32G
default.my.cnf.erb:innodb_buffer_pool_size = <%= (Float(@memorysize.split[0]) * 0.75).round %>G
eventlogging.my.cnf.erb:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.25).round %>G
labs.my.cnf.erb:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.25).round %>G
misc.my.cnf.erb:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.75).round %>G
parsercache.my.cnf.erb:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.75).round %>G
phabricator.my.cnf.erb:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.5).round %>G
phabricator.my.cnf.erb~:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.5).round %>G
production.my.cnf.erb:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.75).round %>G
sanitarium.my.cnf.erb:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.05).round %>G
tendril.my.cnf.erb:innodb_buffer_pool_size        = 4G
toolsmaster.my.cnf.erb:innodb_buffer_pool_size        = <%= (Float(@memorysize.split[0]) * 0.5).round %>G
wikitech.my.cnf.erb:innodb_buffer_pool_size        = 4G

TokuDB cache sizes:

analytics.my.cnf.erb:tokudb_cache_size = <%= (Float(@memorysize.split[0]) * 0.50).round %>G
dbstore.my.cnf.erb:tokudb_cache_size = 32G
eventlogging.my.cnf.erb:tokudb_cache_size              = <%= (Float(@memorysize.split[0]) * 0.5).round %>G
labs.my.cnf.erb:tokudb_cache_size              = <%= (Float(@memorysize.split[0]) * 0.50).round %>G
sanitarium.my.cnf.erb:tokudb_cache_size              = <%= (Float(@memorysize.split[0]) * 0.05).round %>G
tendril.my.cnf.erb:tokudb_cache_size              = 32G

Aria pagecache:

analytics.my.cnf.erb:aria_pagecache_buffer_size = 1G
dbstore.my.cnf.erb:aria_pagecache_buffer_size     = 16G
eventlogging.my.cnf.erb:aria_pagecache_buffer_size     = 512M
labs.my.cnf.erb:aria_pagecache_buffer_size     = <%= (Float(@memorysize.split[0]) * 0.05).round %>G
misc.my.cnf.erb:aria_pagecache_buffer_size     = 512M
parsercache.my.cnf.erb:aria_pagecache_buffer_size     = 1G
phabricator.my.cnf.erb:aria_pagecache_buffer_size     = 4G
phabricator.my.cnf.erb~:aria_pagecache_buffer_size     = 4G
production.my.cnf.erb:aria_pagecache_buffer_size     = 1G
tendril.my.cnf.erb:aria_pagecache_buffer_size     = 1G
wikitech.my.cnf.erb:aria_pagecache_buffer_size     = 128M

key_buffer_size is not set, even though MyISAM (not Aria) is heavily used by some tools user-created databases. Other parameters to consider: thread_cache_size, table_open_cache.
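
A hedged example of how those parameters could be inspected before picking values (the thresholds one would act on are a judgment call, not something this task specifies):

SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';
SHOW GLOBAL STATUS LIKE 'Key_blocks%';      -- MyISAM key cache: used vs. unused blocks
SHOW GLOBAL STATUS LIKE 'Opened_tables';    -- steadily growing suggests table_open_cache is too small
SHOW GLOBAL STATUS LIKE 'Threads_created';  -- steadily growing suggests thread_cache_size is too small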

Just to clarify, I don't think we've seen actual OOM killer on s[1-7], right? The only front-line production concern is swapping on s3 db1035?

Based on that (please correct me if not), my $0.02:

  • Reducing the production InnoDB buffer pool size on s[1-7], including s3, is not an attractive idea, I think. Instead, push ahead with the EQIAD upgrades to 160GB slaves, where the 75%/25% split means more headroom for edge cases anyway.
  • Full +1 support for reviewing the memory split and footprint on the non-s[1-7] boxes with multiple engines in use: dbstore, misc, labs, tendril, etc.
  • Full +1 support for splitting s3. In the short term, perhaps we could temporarily set innodb_file_per_table=0 and move 90% of the small tables into the central tablespace (sketched below)? This used to be the case, actually, but has not been done for some time.
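
A minimal sketch of that approach, assuming the database and table names below are purely hypothetical:

SET GLOBAL innodb_file_per_table = 0;
-- While file_per_table is off, rebuilding a table moves it out of its own
-- .ibd file and into the shared system tablespace:
ALTER TABLE somewiki.some_small_table ENGINE=InnoDB;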

Regarding s3, I am unsure whether we would get any significant advantage from reducing the number of physical files, but I do not disagree with doing that. And do you think I disagree with having more memory on better hardware? :-)

s[1-7]: there are some research boxes that concern me, too.

akosiaris assigned this task to jcrespo. Aug 25 2015, 1:04 PM
akosiaris triaged this task as Medium priority.
akosiaris added a subscriber: akosiaris.
jcrespo lowered the priority of this task from Medium to Low. Aug 25 2015, 1:08 PM

Low: springle disagrees (with a proper reason). The s3 issues are mostly gone, and new hardware will probably arrive before we do anything.

I've reduced both the InnoDB buffer pool and the TokuDB cache from 32G each to 24G, leaving >100GB for the filesystem cache (which TokuDB uses extensively) and other per-connection usage. I have not puppetized it, because I am not 100% sure it will be OK; that machine has started swapping several times in the past.
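
Since the change was applied on the host but not puppetized, the values in effect can be cross-checked against the intended 24G each with something like this (query sketch, not part of the original change):

SHOW GLOBAL VARIABLES
  WHERE Variable_name IN ('innodb_buffer_pool_size', 'tokudb_cache_size');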

Change 278238 had a related patch set uploaded (by Jcrespo):
Reduce memory usage by TokuDB and InnoDB main buffers

https://gerrit.wikimedia.org/r/278238

Change 278238 merged by Jcrespo:
Reduce memory usage by TokuDB and InnoDB main buffers

https://gerrit.wikimedia.org/r/278238

jcrespo closed this task as Resolved. Feb 14 2017, 4:02 PM

This was done in the dbstore2 manifest. We have not seen reasons to do it on the others. We will see if we have to reduce it for the new labs hosts.