
monitor webservice / 504 errors for erwin
Closed, Declined, Public

Description

< akoopal> hi, the webservice for erwin is giving 504 errors again, I can of course restart it myself, but is that something bigbrother can monitor?

< akoopal> https://tools.wmflabs.org/erwin85/
< akoopal> tools that a user 'erwin' wrote, and that were migrated; some are quite popular

Event Timeline

Dzahn raised the priority of this task to Needs Triage.
Dzahn updated the task description.
Dzahn subscribed.

akoopal asked whether this is something that can be monitored by bigbrother. Can it?

Is this something that is part of T90569, then?

The webservice itself is monitored via the .bigbrotherrc, but that appears to check only whether the process exists. If the webservice is running but unresponsive, this does not seem to be detected.

That is correct, bigbrother checks that the job is running (and therefore the process is alive), but doesn't know what it is supposed to be doing and cannot check that.
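Since bigbrother only confirms that the job's process is alive, catching the "running but unresponsive" case needs a check at the HTTP level instead. The following is a minimal Python sketch of such a probe, not a bigbrother feature: the function name `is_responsive` and the 10-second default timeout are hypothetical choices. It treats any HTTP error status (such as the 504s reported in this task), a refused connection, or a timeout as a failure.

```python
import http.client
import urllib.error
import urllib.request


def is_responsive(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` answers with a 2xx/3xx status within `timeout` seconds.

    Hypothetical helper; the 10 s default timeout is an arbitrary assumption.
    Unlike a process-level check, this probes the HTTP layer, so a 504 or a
    server that accepts connections but never answers counts as a failure.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except urllib.error.HTTPError:
        # 4xx/5xx responses (including 504 Gateway Timeout) raise HTTPError.
        return False
    except (OSError, http.client.HTTPException):
        # Connection refused, DNS failure, read timeout, or a malformed response.
        return False
```

For example, `is_responsive("https://tools.wmflabs.org/erwin85/")` could be called periodically by some external scheduler, restarting the webservice after repeated failures; how that restart would be wired up is an assumption, not something this task describes.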

< akoopal> hi, getting timeouts again on erwin85's tools
< akoopal> in the log I see:
< akoopal> 2015-03-05 17:05:40: (server.c.1352) [note] sockets enabled again
< akoopal> 2015-03-05 17:05:40: (server.c.1398) [note] sockets disabled, connection limit reached
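The log excerpt above (which looks like lighttpd's error log) shows the server disabling its sockets whenever the connection limit is hit. A minimal sketch for counting how often that happens per hour follows; it assumes every relevant line starts with the same `YYYY-MM-DD HH:MM:SS: (server.c.NNNN)` prefix as the quoted lines, and the function name is hypothetical.

```python
import re
from collections import Counter

# Assumed line format, taken from the quoted log excerpt:
# 2015-03-05 17:05:40: (server.c.1398) [note] sockets disabled, connection limit reached
LIMIT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}: \(server\.c\.\d+\) .*connection limit reached"
)


def limit_hits_per_hour(lines):
    """Count 'connection limit reached' notes, bucketed by hour ('YYYY-MM-DD HH')."""
    counts = Counter()
    for line in lines:
        match = LIMIT_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Because it accepts any iterable of lines, an open log file can be passed directly; the "sockets enabled again" notes are ignored.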

@Erwin you are the Erwin of erwin85's tools, right? The MediaWiki user page seemed to confirm that. Added you here.

This almost always happens when requests take long enough to answer, on average, that at the tool's normal hit rate they fill up the default number of connections allowed for a tool. If you determine that the requests are unavoidably long, we can increase that limit as the traffic justifies.
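The relationship described here is Little's law: the average number of requests in flight equals the arrival rate times the average time spent answering each one. A back-of-the-envelope sketch follows; the function names, and any request rate or connection limit plugged into them, are illustrative assumptions, since the actual default limit is not stated in this task.

```python
def expected_concurrency(requests_per_second: float, avg_response_seconds: float) -> float:
    """Little's law: average requests in flight L = lambda * W."""
    return requests_per_second * avg_response_seconds


def saturates(requests_per_second: float, avg_response_seconds: float,
              connection_limit: int) -> bool:
    """True if the average number of open connections reaches the (hypothetical) limit."""
    return expected_concurrency(requests_per_second, avg_response_seconds) >= connection_limit
```

For example, at 5 requests per second, a 2-second average response keeps about 10 connections open, while a 4-second average doubles that to 20 and would reach a hypothetical limit of 16. Since this is only an average, bursts can hit the limit sooner.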

All of these tools run some larger queries, but as far as I can see they take on the order of a couple of seconds, not more. I have not gone over all the queries to see whether they can be made more efficient, but knowing Erwin, that has already been looked at. So if this tool could (again) be tuned up, that would be appreciated, by the users of the tools as well.

I'll be looking at the tool's log to determine the actual load and increase the limit to cover it with some elbow room.
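One way to turn a request log into a limit "with some elbow room" is to find the peak number of overlapping requests and scale it by a safety factor. The sketch below is a sweep over request intervals; it assumes the (start time, duration) pairs have already been extracted from the access log, and both the function names and the headroom factor of 1.5 are hypothetical.

```python
import math


def peak_concurrency(requests):
    """Maximum number of requests in flight at the same moment.

    `requests` is an iterable of (start_time, duration) pairs in seconds,
    assumed to have been parsed from the tool's access log beforehand.
    """
    events = []
    for start, duration in requests:
        events.append((start, 1))              # connection opens
        events.append((start + duration, -1))  # connection closes
    # Sorting puts -1 before +1 at equal timestamps, so a request ending
    # exactly when another starts is not counted as overlapping it.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def suggested_limit(requests, headroom=1.5):
    """Peak observed concurrency times a hypothetical safety factor, rounded up."""
    return math.ceil(peak_concurrency(requests) * headroom)
```

Using the observed peak rather than an average keeps short bursts from exhausting the limit; the headroom factor then covers traffic growth beyond what the log captured.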

scfc triaged this task as Medium priority.
scfc moved this task from Backlog to "Ready to be worked on" on the Toolforge board.

As far as I can tell, the tool has been working properly for several months. In addition, bigbrother is deprecated, so no new features will be added to it.

Closing; reopen if needed.

Well, it was just down for two days until I restarted it (after a warning from supernino on IRC).