
ores-redis-02 is out of disk space. AOF file is too big
Closed, Resolved, Public

Description

halfak@ores-redis-02:~$ df -h 
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           799M   83M  717M  11% /run
/dev/vda3        19G   18G     0 100% /
tmpfs           4.0G     0  4.0G   0% /dev/shm
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           4.0G     0  4.0G   0% /sys/fs/cgroup
tmpfs           799M     0  799M   0% /run/user/0
tmpfs           799M     0  799M   0% /run/user/2041
halfak@ores-redis-02:/srv/redis$ ls -alh
total 16G
drwxr-xr-x 2 redis redis 4.0K Aug 29 22:28 .
drwxr-xr-x 3 root  root  4.0K Aug 31  2018 ..
-rw-r--r-- 1 redis redis  16G Sep 25 17:39 ores-redis-02-6380.aof

See also https://serverfault.com/questions/825883/redis-appendonly-aof-file-size-out-of-control

Event Timeline

It looks like we can fix this issue in redis config. Why is this only a problem now? Who knows.

Thanks to @Zppix for reporting. @ACraze, if this sounds exciting, you might take a look.

We probably need to make some changes here: https://github.com/wikimedia/puppet/blob/b347052863d4d2e87b37d6c2d9f44f833cfd9dc2/modules/role/manifests/labs/ores/redis.pp

We could set params for rewriting the aof file. E.g.,

auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb

Alternatively, we could turn off the AOF file in labs entirely. That would mean that we would lose the cache whenever we restarted the redis service. That wouldn't be too bad honestly.

Looks like we already have:

auto-aof-rewrite-min-size 512mb
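
For what it's worth, the live values can also be checked against the running instance (a sketch; 6380 is taken from the AOF filename and may need adjusting):

redis-cli -p 6380 CONFIG GET auto-aof-rewrite-min-size
redis-cli -p 6380 CONFIG GET auto-aof-rewrite-percentage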

@akosiaris, do you have experience with trimming the aof files and limiting their size? I'm starting to realize that I'm getting in deep here and a big reason we chose a redis backend is "ops has experience with redis". So I thought maybe you would have an idea.

Right now, I think I'm going to just delete the AOF file, restart and hope we get another couple of years before this happens again.
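
For the record, that plan boils down to roughly the following (a sketch only; the systemd unit name here is a guess and should be verified on the host, e.g. with systemctl list-units 'redis*', before running anything):

sudo systemctl stop redis-instance-tcp_6380
sudo rm /srv/redis/ores-redis-02-6380.aof
sudo systemctl start redis-instance-tcp_6380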

@Halfak, this is pretty much blog post material. Anyway, I'll try to summarize what I think happened and what we can do below.

Intro

The redis that backs ORES has a persistence mechanism called the append-only file (AOF for short). It's one of redis' two persistence mechanisms (the other one being a point-in-time snapshot to disk, called RDB). It works by continuously appending every write to a file. Since that file would grow enormously large over time, it is periodically rewritten in order to keep its size in check. Have a look at https://redislabs.com/ebook/part-2-core-concepts/chapter-4-keeping-data-safe-and-ensuring-performance/4-1-persistence-options/4-1-3-rewritingcompacting-append-only-files/ for a pretty ok writeup of the entire idea.
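
For completeness, a rewrite can also be kicked off by hand against the running instance (assuming 6380 is the right port, per the AOF filename):

redis-cli -p 6380 BGREWRITEAOF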

When does the AOF get rewritten?

From the docs

# This is how it works: Redis remembers the size of the AOF file after the
# latest rewrite (if no rewrite has happened since the restart, the size of
# the AOF at startup is used).
#
# This base size is compared to the current size. If the current size is
# bigger than the specified percentage, the rewrite is triggered. Also
# you need to specify a minimal size for the AOF file to be rewritten, this
# is useful to avoid rewriting the AOF file even if the percentage increase
# is reached but it is still pretty small.

Aka, in mathematical form, the trigger can be written as

size(n+1) >= size(n) * (1 + auto-aof-rewrite-percentage / 100) AND size(n+1) >= auto-aof-rewrite-min-size

where size(n) is the base size redis remembered at the latest rewrite (or at startup, if no rewrite has happened since) and size(n+1) is the current size.
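
For illustration, with our settings and a hypothetical base size of 512MB:

base size remembered after the last rewrite: 512MB
auto-aof-rewrite-percentage: 100 (the default)
auto-aof-rewrite-min-size: 512MB
=> a rewrite fires once the current AOF reaches 512MB * (1 + 100/100) = 1GB, which also clears the 512MB minimum.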

Configured state

We already have auto-aof-rewrite-min-size 512mb as you pointed out, and the default for auto-aof-rewrite-percentage is 100.

So AOF rewriting is happening periodically just fine, keeping the file size in check. This is evident from https://tools.wmflabs.org/nagf/?project=ores&range=month-day-year#h_ores-redis-02_cpu as well. Looking at the year graph for free space, it's quite constant (the slow decline can be attributed to things like logs, debian packages being left around, etc).

Timeline

From late June (the exact date is difficult to discern and rather irrelevant) up to Sept 14th, something weird happens. There is a saw-like pattern in the free-space graph, and it correlates perfectly with IO wait in the CPU graph of the host. There is increased IOwait starting in late June, which is then exacerbated sometime in mid August. This is probably due to contention on the WMCS host where ores-redis-02 lived. The effect was pronounced enough that there are many, many lines like the following in the redis log file, starting on August 15th:

615:M 05 Aug 15:12:30.092 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.

These keep on happening till Sept 21st.

597:M 13 Sep 21:06:31.093 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.

However, the CPU IOwait graph seems to recover a week before that, on Sept 14th. Yet redis continues to complain about the slow AOF fsync for one more week.

What happened?

While it was taking too long for the rewrite to happen (during the periods of high IOwait), redis kept appending to the AOF, making it larger every time. By Sep 14th, when the IOwait is no longer present, the file is probably well over 7GB, making the rewrite a challenge on its own for the VM even without the IOwait issues. Redis keeps on logging the issue; it's just that during the week from Sep 14th to Sep 21st the sheer size of the AOF is the problem, not the IOwait.

That being said, on Sep 21st redis is restarted. By now it has amassed a pretty big AOF file, probably about 12GB. It loads the file (it takes almost 3 minutes to do so). It now remembers a 12GB AOF, so per auto-aof-rewrite-percentage being 100, it requires an AOF of 24GB before it attempts a rewrite.
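
Rough arithmetic, for illustration: with a ~12GB base, the next automatic rewrite would only trigger at 12GB * (1 + 100/100) = 24GB, but the root partition is only 19GB and already holds the 12GB AOF plus everything else, so the disk fills up long before that threshold can ever be reached.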

For a couple of days there are no issues. Then on Sept 25, disk is full. redis logs:

595:M 25 Sep 00:30:11.087 # Short write while writing to the AOF file: (nwritten=213, expected=243)

Those correlate with the full disk space and are expected.

What we can do

@Halfak nailed it.

Alternatively, we could turn off the AOF file in labs entirely. That would mean that we would lose the cache whenever we restarted the redis service. That wouldn't be too bad honestly.

If we are even asking this question, we probably don't need persistence at all, and definitely not the level of persistence provided by AOF. The only question that remains is whether we care about persisting the cache at all, at which point we could move to the RDB persistence mechanism, which per https://redis.io/topics/persistence does not suffer from these issues, although it does come with a bigger window for data loss. My take is that since we are talking about a cache, and we are even asking ourselves whether we want persistence, even RDB is overkill. Let's just ditch persistence altogether and use redis as an in-memory cache.
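
Concretely, "ditch persistence altogether" would boil down to something like the following redis.conf directives for the instance (a sketch only; the exact shape of the puppet change is what the patch will sort out):

# disable the AOF
appendonly no
# disable RDB snapshots as well
save ""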

As a side note, there is literally nothing we can do to stop the AOF file from gradually increasing in size whenever the host is under IO pressure (which is something we don't control, as WMCS is a shared hosting environment).

@Halfak, I can prepare a patch, are you ok with the above?

Change 540912 had a related patch set uploaded (by Alexandros Kosiaris; owner: Alexandros Kosiaris):
[operations/puppet@production] ORES: Make redis AOF configurable

https://gerrit.wikimedia.org/r/540912

Change 540912 merged by Alexandros Kosiaris:
[operations/puppet@production] ORES: Make redis AOF configurable

https://gerrit.wikimedia.org/r/540912

Halfak claimed this task.
Halfak moved this task from Parked to Completed on the Machine-Learning-Team (Active Tasks) board.