
CirrusSearch: No enabled connection error
Closed, Resolved (Public)

Description

I'm seeing these in the log:
2014-07-07 14:05:42 mw1015 commonswiki: Update for doc ids: 15731483; error message was: No enabled connection

I don't imagine we're actually out of connections; we're probably hitting some other HTTP error and eating it. We should not eat it.


Version: unspecified
Severity: normal

Details

Reference
bz67605

Event Timeline

bzimport raised the priority of this task to High. (Nov 22 2014, 3:39 AM)
bzimport added a project: CirrusSearch.
bzimport set Reference to bz67605.
demon added a comment. (Jul 7 2014, 6:55 PM)

This should only happen after we've tried getMaxConnectionAttempts() times. We never call getConnection() directly, so it should be handled by our callback.

Getting a full stacktrace out of this would help.
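
To illustrate how a retry loop can end in "No enabled connection" while hiding the real failure, here is a minimal Python sketch. It is a simplified model under stated assumptions, not the actual Elastica or CirrusSearch code: the `Pool`, `Connection`, and `NoEnabledConnection` names are hypothetical, and it assumes each failed connection is disabled before the next attempt. The one deliberate difference is that the last underlying error is carried into the final exception instead of being eaten.

```python
class NoEnabledConnection(Exception):
    """Raised when every connection in the pool has been disabled."""


class Connection:
    """Hypothetical connection: `send` is a callable that may raise ConnectionError."""

    def __init__(self, name, send):
        self.name = name
        self.send = send
        self.enabled = True


class Pool:
    """Hypothetical pool that disables a connection after it fails."""

    def __init__(self, connections):
        self.connections = connections

    def get_connection(self):
        # Return the first connection that is still enabled.
        for conn in self.connections:
            if conn.enabled:
                return conn
        raise NoEnabledConnection("No enabled connection")

    def request(self, payload):
        last_error = None
        while True:
            try:
                conn = self.get_connection()
            except NoEnabledConnection:
                # Do not eat the real cause: report the last failure we saw.
                raise NoEnabledConnection(
                    f"No enabled connection (last error: {last_error!r})"
                ) from last_error
            try:
                return conn.send(payload)
            except ConnectionError as exc:
                # Mark this connection as busted and try the next one.
                conn.enabled = False
                last_error = exc
```

In this model, one request that times out on every node disables the whole pool, so the caller sees "No enabled connection" even though the connections were healthy before the request.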

Yeah. I've seen this in a maintenance script when I tried to bulk insert too much data. I _think_ it has something to do with commands that are far too big being interpreted as a retryable HTTP error. We actually have logic to resubmit the command as singletons, but I think we don't get to use it because we've marked all the connections as busted due to the error. Might try to reproduce with a stupidly huge page.
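
A rough Python sketch of the "resubmit as singletons" idea, for illustration only. It is not the CirrusSearch implementation: `BulkFailure`, `send_bulk`, and the `reset_connections` hook are hypothetical names, and re-enabling connections before the fallback is an assumption about how the fallback could be made reachable, not something the existing code is known to do.

```python
class BulkFailure(Exception):
    """Hypothetical error raised by `send_bulk` when a request fails."""


def bulk_with_singleton_fallback(send_bulk, docs, reset_connections=None):
    """Send docs in one bulk request; on failure, retry each doc alone.

    Returns the list of docs that still failed when sent individually.
    """
    try:
        send_bulk(docs)
        return []
    except BulkFailure:
        if len(docs) == 1:
            # Already a singleton; retrying it alone would change nothing.
            return list(docs)
        if reset_connections is not None:
            # Assumption: undo "all connections busted" so the fallback can run.
            reset_connections()

    failed = []
    for doc in docs:
        try:
            send_bulk([doc])
        except BulkFailure:
            failed.append(doc)
    return failed
```

Without something like the reset step, the singleton retries in this sketch would fail immediately for the same reason as the bulk request, which matches the behavior described above.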

Change 156803 had a related patch set uploaded by Manybubbles:
Increase timeout on updates

https://gerrit.wikimedia.org/r/156803

https://gerrit.wikimedia.org/r/#/c/156798/
and
https://gerrit.wikimedia.org/r/#/c/156800/
as well.

It's not stupidly huge pages. It's massive influxes of updates all at once, I believe.

Anyway, these commits should make it more stable as well as give us more information when it fails.

Change 156803 merged by jenkins-bot:
Increase timeout on updates

https://gerrit.wikimedia.org/r/156803