
monitor webservice / 504 errors for erwin
Closed, Declined · Public

Description

< akoopal> hi, the webservice for erwin is giving 504 errors again, I can of course restart it myself, but is that something bigbrother can monitor?

< akoopal> https://tools.wmflabs.org/erwin85/
< akoopal> tools that a user 'erwin' wrote, and were migrated, some are quite popular

Event Timeline

Dzahn created this task. Feb 25 2015, 10:10 PM
Dzahn raised the priority of this task to Needs Triage.
Dzahn updated the task description.
Dzahn added a subscriber: Dzahn.
Restricted Application added a subscriber: Aklapper. Feb 25 2015, 10:10 PM
Dzahn set Security to None. Feb 25 2015, 10:11 PM

akoopal asked if this is something that can be monitored by bigbrother. Can it?

Is this something that is part of T90569 then?

The webservice itself is monitored via the .bigbrotherrc, but it appears to check only for the process. If the webservice is running but unresponsive, that does not seem to be detected.

coren added a subscriber: coren. Feb 25 2015, 11:29 PM

That is correct, bigbrother checks that the job is running (and therefore the process is alive), but doesn't know what it is supposed to be doing and cannot check that.
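To illustrate the gap described above, here is a minimal sketch of a responsiveness probe that goes beyond "is the process alive". It is not part of bigbrother; the function name `probe`, the 10 second default timeout, and the rule that any 5xx status counts as unhealthy are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10.0, opener=urllib.request.urlopen):
    """Issue one HTTP GET and return (healthy, detail).

    Hypothetical helper: `opener` is injectable so the logic can be
    exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # urlopen raises HTTPError for 4xx/5xx responses. A 504 means the
        # proxy got no timely answer from the webservice behind it.
        return exc.code < 500, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, timeouts and refused or reset connections.
        return False, f"no response: {exc}"
```

A script run periodically could call `probe("https://tools.wmflabs.org/erwin85/")` and alert or restart the webservice when the first element of the result is `False`. A process that is alive but stuck would fail this check even though a process-only check passes.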

Dzahn added a comment. Mar 5 2015, 7:54 PM

< akoopal> hi, getting timeouts again on erwin85's tools
< akoopal> in the log I see:
< akoopal> 2015-03-05 17:05:40: (server.c.1352) [note] sockets enabled again
< akoopal> 2015-03-05 17:05:40: (server.c.1398) [note] sockets disabled, connection limit reached

Dzahn added a subscriber: Erwin. Mar 5 2015, 7:55 PM

@Erwin you are the Erwin of erwin85's tools, right? The MediaWiki user page seemed to confirm that. Added you here.

coren added a comment. Mar 5 2015, 8:44 PM

This almost unfailingly happens when the average time it takes to answer a request gets long enough that average hit rates fill up the default number of allowed connections for a tool. If you determine that the requests are unavoidably long, we can increase that limit as the traffic justifies.
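The relationship described above is essentially Little's law: the average number of requests in flight equals the arrival rate times the average time per request. A rough sketch of that arithmetic follows; the function names, the 1.5x headroom factor, and the example figures in the note afterwards are hypothetical, not measurements from this tool or the actual default limit.

```python
import math


def concurrent_requests(hits_per_second, avg_seconds):
    """Average requests in flight (Little's law: L = rate * time)."""
    return hits_per_second * avg_seconds


def saturates(limit, hits_per_second, avg_seconds):
    """True if average load alone would fill `limit` connections."""
    return concurrent_requests(hits_per_second, avg_seconds) >= limit


def suggested_limit(hits_per_second, avg_seconds, headroom=1.5):
    """Connection limit covering the average load plus some elbow room."""
    return math.ceil(concurrent_requests(hits_per_second, avg_seconds) * headroom)
```

For example, at a hypothetical 10 hits per second with 2 second responses, about 20 connections are busy on average, so a limit of 16 would be exhausted while `suggested_limit(10, 2)` gives 30. Bursts above the average would need more headroom than this simple estimate provides.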

All of these tools run some larger queries, but as far as I can see they take on the order of a couple of seconds, not more. I have not gone over all the queries to see whether they could be more efficient, but knowing Erwin, that has already been looked at. So if we can tune up this tool (again), that would be appreciated, also by the users of the tools.

coren added a comment. Mar 9 2015, 6:43 PM

I'll be looking at the tool's log to determine the actual load and increase the limit to cover it with some elbow room.

scfc assigned this task to coren. Apr 6 2015, 11:24 AM
scfc triaged this task as Normal priority.
scfc moved this task from Triage to Backlog on the Toolforge board.
coren closed this task as Declined. Nov 17 2015, 2:39 PM

As far as I can tell, the tool has been working properly for several months. In addition, bigbrother is deprecated so no new features are to be added.

Closing; reopen if needed.

Well, it was just down for 2 days until I restarted it (warned by supernino on IRC).