
GettingStarted on Beta Cluster periodically loses its Redis index
Closed, Resolved (Public)

Description

On Beta Cluster only, GettingStarted periodically loses its (meta)data in Redis. The browser tests then start failing because it can't show suggestions.

This is easily worked around by re-populating Redis, but it shouldn't happen in the first place.

Command to re-populate Redis:

foreachwikiindblist gettingstarted-with-category-suggestions.dblist extensions/GettingStarted/maintenance/populate_categories.php

This has happened at least twice now.
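For readers unfamiliar with the wrapper, foreachwikiindblist runs a maintenance script once for each wiki named in a dblist file. The sketch below only mimics that expansion step and does not execute anything. It assumes that a dblist is plain text with one database name per line plus `#` comments, and, as a simplification, that each per-wiki run looks like `mwscript <script> --wiki=<dbname>`. The helpers `expand_dblist` and `build_commands` and the wiki names `examplewiki`/`otherwiki` are hypothetical, not part of the real tooling.

```python
from typing import List


def expand_dblist(text: str) -> List[str]:
    """Return database names from dblist text (assumed format: one name per line, # comments)."""
    names = []
    for line in text.splitlines():
        # Drop anything after a '#' and surrounding whitespace; skip blank results.
        name = line.split("#", 1)[0].strip()
        if name:
            names.append(name)
    return names


def build_commands(dblist_text: str, script: str) -> List[List[str]]:
    """Build one hypothetical mwscript invocation per wiki listed in the dblist."""
    return [["mwscript", script, f"--wiki={db}"] for db in expand_dblist(dblist_text)]


if __name__ == "__main__":
    sample = "# hypothetical sample dblist\nexamplewiki\notherwiki\n"
    script = "extensions/GettingStarted/maintenance/populate_categories.php"
    for cmd in build_commands(sample, script):
        print(" ".join(cmd))
```

Keeping the expansion separate from execution makes it easy to inspect which wikis would be re-populated before actually running anything.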

Event Timeline

Mattflaschen-WMF added a subscriber: hashar.

@hashar Do you know if Redis is persistent on Labs? We're using $sessionRedis from session-labs.php.

> $sessionRedis from session-labs.php

It seems that Redis server is meant only for session storage, isn't it? I guess in prod you are using a dedicated instance or a different one. You probably want to align the Beta Cluster config with the production one.

You can ask for assistance in creating a new instance on the Beta Cluster and having it added to mediawiki-config via #wikimedia-releng :}

> I guess on prod you are using a dedicated one or another one.

Nope, we're doing the same thing in prod and not having problems. In fact, I changed the Labs config to be more like prod in commit 35056c1c906505e30f4492a771882a54332f5402.

Perhaps the Labs and prod Redis databases are configured differently, and Labs is not persisting to disk, so the data is lost on restart. Or maybe neither is persistent and prod just hasn't been restarted, but that seems less likely.
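To make the "lost on restart" hypothesis concrete, here is a toy model, not Redis itself. It is an in-memory dict that can optionally append every write to a log (standing in for an on-disk AOF file) and rebuild itself from that log when restarted. The class `ToyRedis` and the key name `gs-index` are hypothetical; the actual key names GettingStarted uses are not given in this task.

```python
from typing import Dict, List, Optional, Tuple


class ToyRedis:
    """Toy key-value store: data survives restart() only if writes were logged.

    This is NOT Redis; it only models an append-only log being replayed on startup.
    """

    def __init__(self, aof_log: Optional[List[Tuple[str, str]]] = None) -> None:
        # aof_log stands in for the on-disk AOF file; None means persistence is disabled.
        self.aof_log = aof_log
        self.data: Dict[str, str] = {}
        self._replay()

    def _replay(self) -> None:
        # Rebuild the in-memory dataset from the log (empty if there is no log).
        self.data = {}
        for key, value in self.aof_log or []:
            self.data[key] = value

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.aof_log is not None:
            self.aof_log.append((key, value))

    def restart(self) -> None:
        # Memory is wiped on restart; only what was logged comes back.
        self._replay()
```

With `ToyRedis()` (no log), a `set("gs-index", ...)` followed by `restart()` leaves the store empty, which matches the symptom of the index silently disappearing. With `ToyRedis(aof_log=[])`, the same sequence restores the key.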

We could set up a dedicated Redis database in both prod and Labs, but that seems a bit much for a single feature. I'd prefer to keep it the same in prod for now.

@mattflaschen Thanks. I assumed that in production you used a different Redis instance or one dedicated to GettingStarted. Since Beta and prod use the same setup, that is fine.

Back to the original request: I have no idea whether Redis is persisted to disk. Apparently it is not on the Beta Cluster, and it might not be on production either.

Matt, can you reach out to someone who knows the Redis setup to verify this? There is not much we can do on our side.

It looks like Beta (looking at deployment-redis01.eqiad.wmflabs) uses AOF persistence with the appendfsync everysec directive.

It looks like this instance must have used RDB persistence at some point, but the current directive in /etc/redis/redis.conf is save "", which disables RDB snapshots.
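The check above can be scripted. The sketch below reads redis.conf text and reports the persistence directives discussed here: appendonly, appendfsync, and save. It assumes simple `directive value` lines and is not a full redis.conf parser; for example, it ignores `include` files. It also only reflects the file: settings changed at runtime with CONFIG SET can differ, so `CONFIG GET appendonly` on the live server is the authoritative answer. `Persistence` and `parse_persistence` are hypothetical names.

```python
import shlex
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Persistence:
    appendonly: bool = False       # Redis default: AOF disabled
    appendfsync: str = "everysec"  # Redis default fsync policy for AOF
    # None: no "save" directive in the file, so Redis's built-in default applies.
    # []: RDB snapshots explicitly disabled with save "".
    save_rules: Optional[List[Tuple[int, int]]] = None


def parse_persistence(conf_text: str) -> Persistence:
    p = Persistence()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = shlex.split(line)  # handles quoted values such as save ""
        directive, args = parts[0].lower(), parts[1:]
        if directive == "appendonly" and args:
            p.appendonly = args[0].lower() == "yes"
        elif directive == "appendfsync" and args:
            p.appendfsync = args[0].lower()
        elif directive == "save":
            if args == [""]:
                p.save_rules = []  # save "" clears all snapshot points
            elif args:
                # Each rule is a (seconds, changes) pair; several pairs may share a line.
                nums = [int(a) for a in args]
                pairs = list(zip(nums[0::2], nums[1::2]))
                p.save_rules = (p.save_rules or []) + pairs
    return p
```

For a hypothetical file containing `appendonly yes`, `appendfsync everysec` and `save ""`, this reports AOF enabled with everysec fsync and an empty list of RDB save rules, which is the combination described above.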

Here are the persisted files on disk:

thcipriani@deployment-redis01:~ 7m 26s
❯ ll /srv/redis/
total 66M
-rw-rw---- 1 redis redis  66M Jun 16 00:26 deployment-redis01-6379.aof
-rw-rw---- 1 redis redis 113K May 16 09:16 deployment-redis01-6379.rdb
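The timestamps in this listing support the reading above: the .aof file is current, while the .rdb file stopped changing a month earlier. The following sketch automates that comparison by finding the newest .aof and .rdb file in a directory and flagging any format whose newest file is older than a threshold. The helper names are hypothetical, and the demo uses throwaway files instead of /srv/redis.

```python
import os
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Dict, List

PERSISTENCE_SUFFIXES = (".aof", ".rdb")


def latest_mtimes(directory: str) -> Dict[str, datetime]:
    """Return the newest modification time (UTC) seen for each persistence file suffix."""
    latest: Dict[str, datetime] = {}
    for path in Path(directory).iterdir():
        if path.is_file() and path.suffix in PERSISTENCE_SUFFIXES:
            mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            if path.suffix not in latest or mtime > latest[path.suffix]:
                latest[path.suffix] = mtime
    return latest


def stale_suffixes(directory: str, now: datetime, max_age: timedelta) -> List[str]:
    """List formats whose newest file is older than max_age (likely no longer being written)."""
    return sorted(s for s, t in latest_mtimes(directory).items() if now - t > max_age)


if __name__ == "__main__":
    # Throwaway demo: a fresh .aof next to a month-old .rdb.
    with tempfile.TemporaryDirectory() as tmp:
        now = datetime.now(timezone.utc)
        aof = Path(tmp, "example-6379.aof")
        rdb = Path(tmp, "example-6379.rdb")
        aof.touch()
        rdb.touch()
        old = (now - timedelta(days=31)).timestamp()
        os.utime(rdb, (old, old))
        print(stale_suffixes(tmp, now, timedelta(days=1)))  # ['.rdb']
```

A stale .rdb file alongside a fresh .aof file is consistent with RDB snapshots being switched off while AOF stays on. However, file ages alone cannot show when persistence was changed.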

Indeed, it looks like both Beta Redis instances are using AOF persistence now. Does this still show up, @mattflaschen?

Mattflaschen-WMF claimed this task.

It hasn't happened recently, but I'm not sure it's fixed either. I'll tentatively close this and reopen it if it happens again.

It seems like it happened again, unless it had a different cause.

Any news here? Is this still happening, and is it still "high priority"?

Mattflaschen-WMF claimed this task.

It hasn't happened recently that I know of.

We can always reopen if it happens again...