Predis\Connection\ConnectionException: Error while reading line from the server
Closed, Resolved, Public

Description

We've seen a few of these exceptions over the last few months. They usually accompany a different error, which causes them to be overlooked. See, for example, T306194: "Adyen audit is failing by timing out". T266591: "Long-running tasks such as audit parsing can lose Redis connection and drop queue messages" also describes this issue.

In today's case, the PayPal pending transaction resolver failed three times, and I noticed the Redis connection exception again, so I looked into it. What's happening is that the default queue connection, which has a timeout of five minutes, is sitting idle and timing out. After timing out, we try to interact with the queue and encounter the exception.

We initialize the queue objects while setting up and bootstrapping the SmashPig config, then assign them to config objects, much like a service container. A process later retrieves a queue object when it needs a specific data backend, and that is likely when the first connection is made. However, that first connection effectively starts a timer.

This error is discussed here: https://stackoverflow.com/questions/11776029/predis-is-giving-error-while-reading-line-from-server. It looks like the fix is to disable the read/write timeout (the Predis read_write_timeout connection parameter) at the point of instantiating the connection. Let's try this out and see whether these exceptions go away. I can't see a reason why we wouldn't want the queue connection to stay alive for the entire length of a process. So, let's have the application process close the connection rather than the queue connection timeout.
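
For reference, a minimal sketch of what disabling the read/write timeout looks like when a Predis client is constructed directly. This is not the SmashPig configuration itself; the host, port and autoload path are placeholder assumptions. The only part taken from the discussion above is the read_write_timeout parameter, which Predis treats as "no timeout" when it is -1.

```php
<?php
// Sketch only: assumes predis/predis is installed via Composer and that a
// Redis server is reachable at the placeholder host and port below.
require __DIR__ . '/vendor/autoload.php';

$client = new Predis\Client([
    'scheme' => 'tcp',
    'host' => '127.0.0.1', // placeholder, not the real queue host
    'port' => 6379,        // placeholder
    // -1 disables the socket read/write timeout, so an idle connection is
    // not dropped client-side while a long-running job does other work.
    'read_write_timeout' => -1,
]);

// The underlying connection is opened lazily, on the first command.
$client->ping();
```

With the timeout disabled, the connection stays open until the process exits or calls $client->disconnect(). A server-side idle timeout, if one is configured in Redis, would still apply.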

Event Timeline

AKanji-WMF moved this task from Triage to Chaos Crew Backlog on the Fundraising-Backlog board.

Change 980027 had a related patch set uploaded (by Jgleeson; author: Jgleeson):

[wikimedia/fundraising/SmashPig@master] Set Redis default queue connection timeout to 0.

https://gerrit.wikimedia.org/r/980027

Change 980027 merged by jenkins-bot:

[wikimedia/fundraising/SmashPig@master] Set Redis default queue connection timeout to -1

https://gerrit.wikimedia.org/r/980027

XenoRyet set Final Story Points to 1.