
Raise timeout value for IRC Bots
Closed, Declined · Public · 3 Estimated Story Points

Description

The Sopel timeout is currently 120s.

This is probably too low, and is likely causing a heightened rate of false timeout detections and an increased risk of a crash during a network wobble.
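To show the trade-off, here is a minimal sketch of a generic ping-timeout check. It is a hypothetical illustration, not Sopel's actual implementation: `PingWatchdog`, `on_server_activity` and `timed_out` are invented names, and the model assumes the connection is declared dead once nothing has been received from the server for more than `timeout` seconds. During a 150 s network wobble, a 120 s timeout fires (a false detection followed by a reconnect), while a 210 s timeout rides it out. The flip side is that a genuine hang also goes unnoticed for longer.

```python
import time


class PingWatchdog:
    """Hypothetical ping-timeout check, for illustration only (not Sopel's code).

    The connection is treated as dead when nothing has been received from
    the server for more than `timeout` seconds.
    """

    def __init__(self, timeout: float, clock=time.monotonic):
        self.timeout = timeout
        self._clock = clock
        self._last_seen = clock()

    def on_server_activity(self) -> None:
        # Any line from the server (including a PONG) resets the timer.
        self._last_seen = self._clock()

    def timed_out(self) -> bool:
        return self._clock() - self._last_seen > self.timeout


# Simulate a 150 s network wobble using a fake clock.
now = [0.0]
fake_clock = lambda: now[0]

short = PingWatchdog(timeout=120, clock=fake_clock)  # current value
long_ = PingWatchdog(timeout=210, clock=fake_clock)  # first rollout step
now[0] = 150.0  # 150 s of silence from the server
print(short.timed_out(), long_.timed_out())  # True False
```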

I plan to raise the timeout value on both tools in a rolling fashion over the next few weeks to avoid impact.

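| Instance | Date (2020) | New timeout (s) |
| --- | --- | --- |
| Test | 22 Apr | 210 |
| Prod | 29 Apr | 210 |
| Test | 5 May | 240 |
| Prod | 6 May | 240 |
| Test | 7 May | 270 |
| Prod | 8 May | 270 |
| Test | 11 May | 300 |
| Prod | 12 May | 300 |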
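The rolling schedule in the table can be encoded as data, so each deploy step just asks "what value should this instance have today?" and writes it to the bot's config. This is a sketch under stated assumptions: it assumes the bots read the value from `timeout` in the `[core]` section of a Sopel-style INI config file, and the helpers `timeout_for` and `apply_timeout` are hypothetical names, not part of Sopel.

```python
import configparser
import datetime
import io

# Rolling schedule from the table: (instance, date, new timeout in seconds).
SCHEDULE = [
    ("Test", datetime.date(2020, 4, 22), 210),
    ("Prod", datetime.date(2020, 4, 29), 210),
    ("Test", datetime.date(2020, 5, 5), 240),
    ("Prod", datetime.date(2020, 5, 6), 240),
    ("Test", datetime.date(2020, 5, 7), 270),
    ("Prod", datetime.date(2020, 5, 8), 270),
    ("Test", datetime.date(2020, 5, 11), 300),
    ("Prod", datetime.date(2020, 5, 12), 300),
]
DEFAULT_TIMEOUT = 120  # value before the rollout starts


def timeout_for(instance: str, on: datetime.date) -> int:
    """Return the timeout (hypothetical helper) an instance should have on a date."""
    value = DEFAULT_TIMEOUT
    for name, start, seconds in SCHEDULE:
        if name == instance and start <= on:
            value = seconds  # later steps override earlier ones
    return value


def apply_timeout(cfg_text: str, seconds: int) -> str:
    """Set [core] timeout in an INI config (assumed Sopel layout) and return the new text."""
    # interpolation=None so literal '%' characters in other settings are left alone.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("core"):
        parser.add_section("core")
    parser.set("core", "timeout", str(seconds))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


print(timeout_for("Prod", datetime.date(2020, 5, 5)))  # 210
print(timeout_for("Test", datetime.date(2020, 5, 5)))  # 240
```

Note that `configparser` does not preserve comments when it rewrites a file, so a real deploy might prefer editing the single line in place. Rolling back is the same operation with `DEFAULT_TIMEOUT`.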

Details

Due Date
May 12 2020, 10:59 PM

Event Timeline

RhinosF1 created this task.
RhinosF1 moved this task from Radar to Deployment on the User-RhinosF1 board.
RhinosF1 moved this task from To Triage / Backlog to 2020 June - September on the ZppixBot board.
RhinosF1 set Due Date to Apr 29 2020, 10:59 PM.

Mentioned in SAL (#wikimedia-cloud) [2020-04-22T10:51:14Z] <RhinosF1> Test Bot to 210s timeout refs T250861 - rolling timeout to 300s

  • Test - 210
  • Prod - 210
  • Test - 240
  • Prod - 240
  • Test - 270
  • Prod - 270
  • Test - 300
  • Prod - 300
RhinosF1 set the point value for this task to 3 (Apr 22 2020, 11:54 AM).
RhinosF1 set Final Story Points to 3.
RhinosF1 changed Due Date from Apr 29 2020, 10:59 PM to May 5 2020, 10:59 PM.
RhinosF1 changed Due Date from May 5 2020, 10:59 PM to May 7 2020, 10:59 PM.

Mentioned in SAL (#wikimedia-cloud) [2020-04-29T09:22:04Z] <RhinosF1> Prod Bot to 210s timeout refs T250861 - rolling timeout to 300s

RhinosF1 changed Due Date from May 7 2020, 10:59 PM to May 12 2020, 10:59 PM.

Mentioned in SAL (#wikimedia-cloud) [2020-05-05T09:22:21Z] <RhinosF1> Test Bot to 240s timeout refs T250861 - rolling timeout to 300s

Based on the hang it just had, I'm not sure increasing the timeout is a good idea, as it means the bot can randomly hang instead of timing out.

It did recover without disconnecting, but it would probably have been clearer if the timeout hadn't prevented a reboot.

If there are no objections, I suggest we roll back the timeout change, as it merely hides the issue.

As it seems the timeout value isn't the root cause of the bot's issues, I have no objection.

RhinosF1 removed RhinosF1 as the assignee of this task.