GettingStarted on Beta Cluster periodically loses its Redis index
Closed, Resolved (Public)

Assigned To

Authored By

	• Mattflaschen-WMF
	May 27 2015, 2:21 PM

Description

On Beta only, GettingStarted loses its (meta)data in Redis. The browser tests then start failing because it can't show suggestions.

This is easily worked around by re-populating Redis, but it shouldn't be happening in the first place.

Command to re-populate Redis:

foreachwikiindblist gettingstarted-with-category-suggestions.dblist extensions/GettingStarted/maintenance/populate_categories.php

This has happened at least twice now.

Related Objects

Mentioned In: T99655: Upgrade GettingStarted browser tests to use mediawiki_selenium 1.x
T94154: Delete or fix failed GettingStarted browsertests Jenkins job
Mentioned Here: T94154: Delete or fix failed GettingStarted browsertests Jenkins job
rOMWC35056c1c9065: Have production and Labs Redis sessions use same structure.

Event Timeline

• Mattflaschen-WMF created this task. May 27 2015, 2:21 PM

• Mattflaschen-WMF raised the priority of this task from to Needs Triage.

• Mattflaschen-WMF updated the task description.

• Mattflaschen-WMF added projects: MediaWiki-extensions-GettingStarted, Beta-Cluster-Infrastructure.

• Mattflaschen-WMF subscribed.

Restricted Application added a subscriber: Aklapper. May 27 2015, 2:21 PM

@hashar Do you know if Redis is persistent on Labs? We're using $sessionRedis from session-labs.php.

• Mattflaschen-WMF added subscribers: rmoen, phuedx. May 27 2015, 2:22 PM

$sessionRedis from session-labs.php

It seems that Redis server should only be used for session storage, shouldn't it? I guess on prod you are using a dedicated one or a different one. You probably want to align the Beta Cluster config with the production one.

You can get assistance with creating a new instance on the Beta Cluster and having it added to mediawiki-config via #wikimedia-releng :}

In T100515#1318782, @hashar wrote:

I guess on prod you are using a dedicated one or another one.

Nope, we're doing the same thing in prod and not having problems. In fact, I changed the Labs config to be more like prod in 35056c1c906505e30f4492a771882a54332f5402.

Perhaps the Labs and prod Redis databases are configured differently and Labs is not persisting to disk, so the data gets lost on restart. Or maybe neither is persistent and prod just hasn't been restarted, but that seems less likely.

We could set up a dedicated Redis database in both prod and Labs, but that seems a bit much for a single feature. I'd prefer to keep it the same in prod for now.

@mattflaschen thanks, I assumed that on production you used a different Redis or one dedicated to GettingStarted. Since beta and prod use the same setup, that is fine.

Back to the original request: I have no idea whether Redis is persisted to disk. Apparently it is not on the Beta Cluster, and it might not be on production either.

Matt, can you reach out to someone who knows the Redis setup to verify this? There is not much we can do on our side.

thcipriani moved this task from To Triage to Next: Maintenance on the Beta-Cluster-Infrastructure board. Jun 15 2015, 7:40 PM

It looks like beta (looking at deployment-redis01.eqiad.wmflabs) uses aof persistence using the appendfsync everysec directive.

Looks like at some point this instance must've used rdb persistence, but the current directive in /etc/redis/redis.conf is save ""

Here are the persisted files on disk:

thcipriani@deployment-redis01:~ 7m 26s
❯ ll /srv/redis/
total 66M
-rw-rw---- 1 redis redis  66M Jun 16 00:26 deployment-redis01-6379.aof
-rw-rw---- 1 redis redis 113K May 16 09:16 deployment-redis01-6379.rdb
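
The check described above (which persistence mode a redis.conf enables) can be sketched in Python. This is only an illustrative sketch: the function names and the sample config text are assumptions, and the sample reconstructs the directives mentioned in this comment rather than quoting the real /etc/redis/redis.conf. It relies on standard Redis behaviour: AOF is on only with "appendonly yes", and save "" clears the RDB snapshot points.

```python
from typing import Dict, List


def parse_redis_conf(text: str) -> Dict[str, List[str]]:
    """Collect directives from redis.conf-style text as {name: [values, ...]}."""
    directives: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        directives.setdefault(parts[0].lower(), []).append(value)
    return directives


def persistence_summary(text: str) -> Dict[str, object]:
    """Summarise AOF and RDB persistence settings found in the config text."""
    conf = parse_redis_conf(text)

    # AOF is disabled unless "appendonly yes" is set; the last occurrence wins.
    aof_enabled = conf.get("appendonly", ["no"])[-1].strip('"').lower() == "yes"
    # Redis defaults to "everysec" when appendfsync is not given.
    appendfsync = conf.get("appendfsync", ["everysec"])[-1]

    # None means no "save" line at all, so Redis keeps its built-in save points.
    # 'save ""' clears any save points configured so far.
    rdb_points = None
    for value in conf.get("save", []):
        if value in ('""', "''"):
            rdb_points = []
        else:
            rdb_points = (rdb_points or []) + [value]
    rdb_snapshots = rdb_points is None or len(rdb_points) > 0

    return {
        "aof_enabled": aof_enabled,
        "appendfsync": appendfsync,
        "rdb_snapshots": rdb_snapshots,
    }


# Hypothetical config text matching the directives described in the comment.
sample_conf = 'appendonly yes\nappendfsync everysec\nsave ""\n'
summary = persistence_summary(sample_conf)
print(summary)
```

With a config like the sample, the summary reports AOF enabled and RDB snapshots disabled, which would match the stale .rdb file next to a recently written .aof file in the listing.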

• Mattflaschen-WMF added a project: acl*sre-team. Jun 22 2015, 5:16 AM

Indeed, it looks like both beta Redis instances are using AOF persistence now. Does this still show up, @mattflaschen?

Restricted Application added subscribers: Luke081515, Matanya. Jul 21 2015, 3:11 PM

It hasn't recently, but I don't know that it's fixed either. I'll mark this closed tentatively and re-open if it happens again.

• Mattflaschen-WMF removed Mattflaschen-WMF as the assignee of this task. Jul 23 2015, 6:26 PM

• Mattflaschen-WMF set Security to None.

greg moved this task from Next: Maintenance to Done on the Beta-Cluster-Infrastructure board. Sep 28 2015, 3:49 PM

Seems like it did, unless it was a different cause.

• Mattflaschen-WMF mentioned this in T94154: Delete or fix failed GettingStarted browsertests Jenkins job. Nov 6 2015, 1:12 AM

• Mattflaschen-WMF mentioned this in T99655: Upgrade GettingStarted browser tests to use mediawiki_selenium 1.x.

Any news here? Still happening, still "high priority"?

It hasn't happened recently that I know of.

We can always reopen if it happens again...

• Mattflaschen-WMF reopened this task as Open. Feb 27 2016, 2:35 AM

Krenair moved this task from Done to To Triage on the Beta-Cluster-Infrastructure board. Apr 10 2016, 3:35 AM

hashar moved this task from To Triage to In-progress on the Beta-Cluster-Infrastructure board. Apr 19 2016, 8:34 AM

I reopened it since it did happen again: T94154#2066037

• Mattflaschen-WMF removed Mattflaschen-WMF as the assignee of this task. Jun 1 2016, 4:21 PM

greg moved this task from In-progress to Backlog on the Beta-Cluster-Infrastructure board. Aug 5 2016, 9:00 PM

I'm not seeing the related test failing frequently: https://integration.wikimedia.org/ci/view/Selenium/job/selenium-GettingStarted/
And when it does fail, the failure doesn't seem related to this, e.g. https://integration.wikimedia.org/ci/job/selenium-GettingStarted/BROWSER=firefox,MEDIAWIKI_ENVIRONMENT=beta,PLATFORM=Linux,label=contintLabsSlave%20&&%20UbuntuTrusty/221/console
