Investigate RDB snapshot issue on ORES
Closed, Resolved. Public

Description

I got this error for every request this morning.

MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error.

It looks like the issue happens when we're saving to the cache. I checked the rdb files on ores-redis-01 and found that the cache rdb file was exactly 1000MB, but the config put the max memory at 3GB. I restarted redis and the service recovered. The 1000MB rdb file grew to 1001MB in a minute and then the same error started happening.

So I ran config set stop-writes-on-bgsave-error no in the redis-cli for 6380. That seems to have recovered ORES, but it means we're probably not persisting to disk anymore.
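With stop-writes-on-bgsave-error turned off, Redis keeps accepting writes even while snapshots fail, so failed saves have to be noticed some other way. Below is a minimal sketch of such a check, assuming the output of the INFO persistence command has already been captured as text (for example with redis-cli). The helper names and the sample values are hypothetical and were not taken from ores-redis-01.

```python
def parse_info(text):
    """Parse the `key:value` lines of Redis INFO output into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Persistence".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def bgsave_is_failing(info_text):
    """Return True when the most recent background save reported an error."""
    return parse_info(info_text).get("rdb_last_bgsave_status") == "err"


# Hypothetical sample of `INFO persistence` output (values are made up).
SAMPLE = """# Persistence
loading:0
rdb_changes_since_last_save:1523
rdb_bgsave_in_progress:0
rdb_last_bgsave_status:err
"""

if __name__ == "__main__":
    print(bgsave_is_failing(SAMPLE))
```

A growing rdb_changes_since_last_save together with an "err" status would suggest that nothing is reaching disk.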

Event Timeline

Halfak created this task. Dec 30 2015, 6:29 PM
Halfak updated the task description.
Halfak raised the priority of this task to Unbreak Now!.
Halfak assigned this task to yuvipanda.
Halfak moved this task to Active on the Scoring-platform-team (Current) board.
Halfak added a subscriber: Halfak.

I just checked the logs, and I'm seeing:

[11924] 30 Dec 18:33:15.078 * 1 changes in 900 seconds. Saving...
[11924] 30 Dec 18:33:15.078 # Can't save in background: fork: Cannot allocate memory
[11924] 30 Dec 18:33:21.098 * 1 changes in 900 seconds. Saving...
[11924] 30 Dec 18:33:21.098 # Can't save in background: fork: Cannot allocate memory
[11924] 30 Dec 18:33:27.026 * 1 changes in 900 seconds. Saving...
[11924] 30 Dec 18:33:27.026 # Can't save in background: fork: Cannot allocate memory
[11924] 30 Dec 18:33:33.055 * 1 changes in 900 seconds. Saving...
[11924] 30 Dec 18:33:33.055 # Can't save in background: fork: Cannot allocate memory
[11924] 30 Dec 18:33:39.087 * 1 changes in 900 seconds. Saving...
[11924] 30 Dec 18:33:39.088 # Can't save in background: fork: Cannot allocate memory

So it looks like it might be a memory issue. Maybe we should cut the maxmemory for the cache server.
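The MISCONF error itself only says that persisting failed. The reason shows up in the server log. A rough sketch of pulling that reason out of the log lines, using two of the entries quoted above as input (the function name and the regular expression are illustrative, not part of any Redis tooling):

```python
import re

# Matches the failure line Redis logged above and captures the reason after "fork:".
FORK_FAILURE = re.compile(r"Can't save in background: fork: (?P<reason>.+)$")


def fork_failures(log_lines):
    """Count background-save fork failures in the given log lines, grouped by reason."""
    reasons = {}
    for line in log_lines:
        match = FORK_FAILURE.search(line)
        if match:
            reason = match.group("reason").strip()
            reasons[reason] = reasons.get(reason, 0) + 1
    return reasons


# Log lines copied from the comment above.
LOG = [
    "[11924] 30 Dec 18:33:15.078 * 1 changes in 900 seconds. Saving...",
    "[11924] 30 Dec 18:33:15.078 # Can't save in background: fork: Cannot allocate memory",
    "[11924] 30 Dec 18:33:21.098 * 1 changes in 900 seconds. Saving...",
    "[11924] 30 Dec 18:33:21.098 # Can't save in background: fork: Cannot allocate memory",
]

if __name__ == "__main__":
    print(fork_failures(LOG))
```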

Halfak added a comment. Edited Jan 8 2016, 6:07 PM

@yuvipanda, where are we on this? Didn't you get another changeset merged that resolved the problem? I seem to remember doing some manual restarts of the uwsgi processes so that we could reboot redis.

yuvipanda closed this task as Resolved. Jan 13 2016, 6:08 PM

Yup, I enabled memory overcommit for all redises and it's all good. Was good we caught it here, would've hit other redises later...
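For anyone checking another host, here is a small sketch that reads the kernel's overcommit setting. It assumes that "memory overcommit" here means vm.overcommit_memory = 1, the value the Redis administration docs recommend so that the fork for a background save is not refused by memory accounting. That assumption is not confirmed in this task. The function names are made up for illustration.

```python
# Meaning of the Linux vm.overcommit_memory values.
OVERCOMMIT_MODES = {
    0: "heuristic: the kernel may refuse allocations it considers excessive",
    1: "always: allocations are never refused by overcommit accounting",
    2: "strict: allocations beyond the commit limit are refused",
}


def read_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Read the current overcommit mode (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


def fork_may_be_refused(mode):
    """Return True when overcommit accounting could reject a large fork.

    Only mode 1 rules that out; fork can still fail for unrelated reasons.
    """
    return mode != 1


if __name__ == "__main__":
    try:
        mode = read_overcommit_mode()
    except (OSError, ValueError):
        print("could not read vm.overcommit_memory on this system")
    else:
        print(mode, OVERCOMMIT_MODES.get(mode, "unknown mode"))
        print("fork may be refused:", fork_may_be_refused(mode))
```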

@yuvipanda, great! Thanks. Please help us manage our progress report and move tasks to the "Done" column before resolving. I'll move this one.

Halfak reopened this task as Open. Jan 13 2016, 7:25 PM
Halfak set Security to None.
Halfak moved this task from Backlog to Done on the Scoring-platform-team (Current) board.
Halfak closed this task as Resolved.

What about https://phabricator.wikimedia.org/T122666? It is still open, but this ticket is closed.

They are the same ticket, no?

Sorry, wrong paste. I meant to ask about https://gerrit.wikimedia.org/r/#/c/261642/, since it links to this ticket and is still open. I found it while looking in Gerrit for open changes to the ops/puppet repo.

Looks like that can be abandoned. I'll do that now.