
Warning: Failed connecting to redis server at rdb100X.eqiad.wmnet: Connection timed out in /srv/mediawiki/php-1.28.0-wmf.XX/includes/libs/redis/RedisConnectionPool.php on line 235
Closed, Duplicate · Public · PRODUCTION ERROR

Description

Here's what logstash looks like at the time of writing:

Selection_140.png (345×1 px, 58 KB)

That's a ton of noise about redis timeouts. Way too much. The pairs of 473, 32, and 21 are all from this warning (it emits two error logs for each timeout). That's 2 × (473 + 32 + 21) = 1052 errors in the past hour, just in the first page of fatals (there's one more in the next page of fatals).

All of the form:

Warning: Failed connecting to redis server at {redis_server}.eqiad.wmnet: Connection timed out in /srv/mediawiki/php-1.28.0-wmf.{XX}/includes/libs/redis/RedisConnectionPool.php on line 235
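To see whether the timeouts cluster on particular redis servers, the warnings can be grouped by host. The following is only a rough sketch: it assumes the raw messages are available as plain text lines matching the template above, and the function name `count_timeouts`, the `logs_per_timeout` parameter, and the hostnames in the sample are hypothetical.

```python
import re
from collections import Counter

# Mirrors the warning template quoted above. The group names are illustrative.
WARNING_RE = re.compile(
    r"Failed connecting to redis server at (?P<server>[\w.-]+): "
    r"Connection timed out in (?P<path>\S+) on line (?P<line>\d+)"
)


def count_timeouts(lines, logs_per_timeout=2):
    """Return the number of timeouts per redis server.

    Each timeout is assumed to emit `logs_per_timeout` log lines
    (two, per the description), so raw counts are divided by that.
    """
    raw = Counter()
    for text in lines:
        match = WARNING_RE.search(text)
        if match:
            raw[match.group("server")] += 1
    return {server: n // logs_per_timeout for server, n in raw.items()}


# Hypothetical sample input; real hostnames and versions will differ.
TEMPLATE = (
    "Warning: Failed connecting to redis server at {server}: "
    "Connection timed out in /srv/mediawiki/php-1.28.0-wmf.XX/"
    "includes/libs/redis/RedisConnectionPool.php on line 235"
)
sample = (
    [TEMPLATE.format(server="rdb-example-a.eqiad.wmnet")] * 4
    + [TEMPLATE.format(server="rdb-example-b.eqiad.wmnet")] * 2
    + ["Notice: some unrelated message"]
)
per_server = count_timeouts(sample)
```

If one host dominates the result, that points at a specific redis server. An even spread would suggest something on the client or network side.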

Event Timeline

@elukey do you have any idea what we should do here, or which other people to poke?

@greg @aaron can probably help in figuring out the next steps. I am far from being an expert on this subject, but what I'd really like to have is another log indicating which request triggered each timeout, because it could help in figuring out whether the issue is specific to a subset of requests or not.

We could also check the Redis service metrics, but I am not sure it will lead to anything useful (I will do it as soon as possible if nobody else does).

@Joe might also have already investigated the issue and come up with a good explanation.

mmodell changed the subtype of this task from "Task" to "Production Error". Aug 28 2019, 11:11 PM