
Intermittent redis connection timeouts in Toolforge
Open, Needs Triage, Public

Description

My spi-tools and spi-tools-dev instances both get intermittent connection timeouts trying to connect to 'redis://tools-redis.svc.eqiad.wmflabs:6379/0'

This one's from spi-tools:

2022-09-22 07:05:41,054 [85ed87d139188b1e89459d6256c854f8] ERROR tools_app.redis: Redis ConnectionError: Error while reading from socket: (110, 'Connection timed out')

These are from spi-tools-dev:

2022-09-24 00:30:28,839 [a02bcb67be496785d354f27de37359a1] ERROR tools_app.redis: Redis ConnectionError: Error while reading from socket: (110, 'Connection timed out')
2022-09-15 13:23:26,327 [d1b54ab741b3306abd6265a5adfe711d] ERROR tools_app.redis: Redis ConnectionError: Error while reading from socket: (110, 'Connection timed out')
2022-09-15 13:12:51,447 [d46cdc3ce7e666e86a697d6319fc4753] ERROR tools_app.redis: Redis ConnectionError: Error while reading from socket: (110, 'Connection timed out')
2022-09-19 02:03:30,663 [c3fc2d85621909d117c9d4fd638d1f9e] ERROR tools_app.redis: Redis ConnectionError: Error while reading from socket: (110, 'Connection timed out')
2022-09-22 11:03:18,759 [2adbd2f8af1c3f5910fd4594df6ce2f3] ERROR tools_app.redis: Redis ConnectionError: Error while reading from socket: (110, 'Connection timed out')
2022-09-24 00:43:35,290 [539c3b47867d712a148543369c263d4f] ERROR tools_app.redis: Redis ConnectionError: Error while reading from socket: (110, 'Connection timed out')

Event Timeline

taavi added a subscriber: taavi.

Hmm. I've checked the Redis and Keepalived (which we use to provide HA for Redis) logs for the timeout on 24 September, and there's nothing strange there. The Redis docs say that by default neither the client nor the server has any idle timeout set, and our configuration doesn't set any limits either.

I see that the Redis server has a little under 3k connected clients. Maybe that's causing the server to hit some connection limit?

Here's another one from earlier today:

2022-09-25 15:27:55,879 [32d7f05b0dc0d1ee6387e65d4fcfc429] ERROR tools_app.redis: Redis ConnectionError: Error while reading from socket: (110, 'Connection timed out')

I see I'm a few releases behind on redis-py (I'm on 3.5.3). I'll upgrade to the latest (4.3.4) to see if that makes any difference. There are a few changes mentioned in the redis-py release notes that have to do with handling connection timeouts better, so maybe one of them will fix this. Worth a shot.
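
Independent of the redis-py version, the client can be asked to give up on a stalled read sooner and to check idle connections before reusing them. Here's a minimal sketch of what that might look like. The helper names (`redis_client_kwargs`, `make_client`) and all the numeric values are assumptions for illustration, not anything tested against tools-redis; the keyword arguments themselves are standard redis-py connection options.

```python
# URL taken from the task description.
REDIS_URL = "redis://tools-redis.svc.eqiad.wmflabs:6379/0"


def redis_client_kwargs(read_timeout=5.0, connect_timeout=5.0, health_check=30):
    """Build keyword arguments for redis.Redis.from_url().

    The defaults here are illustrative guesses, not tuned values.
    """
    return {
        # Fail a blocked read/write after this many seconds instead of
        # waiting for the kernel to report ETIMEDOUT (errno 110).
        "socket_timeout": read_timeout,
        # Bound the time spent establishing a new connection.
        "socket_connect_timeout": connect_timeout,
        # Enable TCP keepalive probes on the client socket.
        "socket_keepalive": True,
        # PING a connection before use if it has been idle this long (seconds).
        "health_check_interval": health_check,
    }


def make_client(url=REDIS_URL, **overrides):
    """Create a client; redis-py is imported lazily so the helper above
    can be inspected without the package installed."""
    import redis

    return redis.Redis.from_url(url, **redis_client_kwargs(**overrides))
```

With something like this in place, a dead connection should surface as an error within a few seconds rather than whenever the TCP stack gives up, though it wouldn't explain why the connection went dead in the first place.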

Just a status update on this...

I upgraded spi-tools-dev to the latest redis-py release and haven't seen any more timeouts. On the other hand, spi-tools is still running the older release and I haven't seen any more timeouts there either. I'd like to leave this ticket open for a while and I'll just keep watching to see if anything changes on either instance.

Just got another one:

logs/django/django.log.2022-09-27:2022-09-28 16:43:39,873 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: IndexView()
logs/django/django.log.2022-09-27:2022-09-28 16:59:18,903 [76e999afc82c10fb99b6c9bf76448d1a] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
logs/django/django.log.2022-09-27:2022-09-28 16:59:19,196 [76e999afc82c10fb99b6c9bf76448d1a] INFO tools_app.middleware: request took 0:15:39.323408
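
To make the timing in those three lines explicit, here's a small sketch that computes the gap between the `IndexView()` entry and the Redis error from their timestamps. The function name `elapsed` is just for illustration; the two timestamps are copied from the log lines above.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the Django log lines (milliseconds after the comma).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed(start, end):
    """Return the time between two log timestamps as a timedelta."""
    t0 = datetime.strptime(start, LOG_TIME_FORMAT)
    t1 = datetime.strptime(end, LOG_TIME_FORMAT)
    return t1 - t0


request_started = "2022-09-28 16:43:39,873"  # IndexView() line
redis_error = "2022-09-28 16:59:18,903"      # ConnectionError line

print(elapsed(request_started, redis_error))  # 0:15:39.030000
```

That's within a third of a second of the "request took 0:15:39.323408" figure, so the request apparently spent essentially its whole lifetime blocked on the Redis read before the error came back.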

@RoySmith for clarity, did the newer redis-py fix the issue? Or did this timeout occur with the newer redis-py?

My apologies for being unclear. It's with the instance that I patched to use the newer redis-py.

These are still happening. I got two more in the past few days:

2022-11-07 22:28:34,874 [6eca8aecf8ad23f8a0b80510519a8682] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
2022-11-10 20:05:24,539 [d75e23b8c3f01d022f407bececb10bb1] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
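
Since the errors are intermittent, one possible stopgap on the application side is to retry the failed call a couple of times with a short backoff. This is only a sketch of the idea using the standard library: `call_with_retry` and its parameters are hypothetical names, and the delays are arbitrary. Note that `redis.exceptions.ConnectionError` is not a subclass of Python's built-in `ConnectionError`, so with redis-py it would have to be passed in explicitly via `retry_on`.

```python
import time


def call_with_retry(func, retry_on=(ConnectionError,), attempts=3,
                    base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Re-raises the last exception once `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            # Wait 0.5s, 1s, 2s, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
```

A call site would then look roughly like `call_with_retry(lambda: client.get(key), retry_on=(redis.exceptions.ConnectionError,))`, where `client` and `key` are placeholders. A retry only helps if the failed call returns quickly, so it would need to be paired with a client-side socket timeout; it wouldn't address the underlying cause.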

Could I get an update on progress being made on this? I got another cluster of three of these a few days ago:

2022-11-17 22:31:43,292 [4cfc6e79d1f5bb0bd15d8127c264211c] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
2022-11-17 22:33:01,116 [183775f0b86d1804b10ae044417928db] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')
2022-11-17 22:33:56,411 [509d4b6c7335c6e37f20373dae00fa08] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')

And another one:

2022-11-25 12:52:55,355 [b25354061a92b9a33d0d304740296b87] ERROR tools_app.redis: Redis ConnectionError: Error while reading from tools-redis.svc.eqiad.wmflabs:6379 : (110, 'Connection timed out')