stashbot_______ is now known as stashbot________
Closed, Resolved (Public)

Description

(Can't find a project for stashbot...)

[23:06:25] --> stashbot_ (~stashbot@wikimedia/bot/stashbot) has joined #wikimedia-labs
[23:06:47] <-> stashbot_ is now known as stashbot__
[23:07:17] <-> stashbot__ is now known as stashbot___
[23:07:47] <-> stashbot___ is now known as stashbot____
[23:08:17] <-> stashbot____ is now known as stashbot_____
[23:08:48] <-> stashbot_____ is now known as stashbot______
[23:09:18] <-> stashbot______ is now known as stashbot_______
[23:09:48] <-> stashbot_______ is now known as stashbot________

Event Timeline

20 mins later:

15:30 ⇐ stashbot quit (~stashbot@wikimedia/bot/stashbot) Remote host closed the connection
15:30 stashbot________ → stashbot

[[https://bots.wmflabs.org/~wm-bot/logs/%23wikimedia-operations/20170201.txt|#wikimedia-operations]]:

[07:13:30] <akosiaris>	 !log restart thumbor process on thumbor1001, thumbor1002, apply a different LimitNOFILE on thumbo1002
[07:13:33] <stashbot________>	 Failed to log message to wiki. Somebody should check the error logs.
[07:13:34] <stashbot>	 Logged the message at https://wikitech.wikimedia.org/wiki/Server_Admin_Log
[07:13:53] <akosiaris>	 ETOOMANYSTASHBOTS
[07:14:00] <akosiaris>	 will deal with that later though

Maybe @akosiaris dealt with it?

Nope, not really. I did notice the quit and the proper renaming and let it be.

Ugh. I thought I had fixed the infinite _ bug. It happens when auth to freenode isn't working and sometimes is related to k8s deciding to run two pods at the same time.

It is quite possibly related to kubernetes this time around as well. Around the time of the issue there were alerts in #wikimedia-operations:

07:03:38 icinga-wm: PROBLEM - All k8s worker nodes are healthy on checker.tools.wmflabs.org is CRITICAL: HTTP CRITICAL: HTTP/1.1 503 SERVICE UNAVAILABLE - string OK not found on http://checker.tools.wmflabs.org:80/k8s/nodes/ready - 185 bytes in 0.162 second response time

recovering some 27 mins later

07:30:38 icinga-wm: RECOVERY - All k8s worker nodes are healthy on checker.tools.wmflabs.org is OK: HTTP OK: HTTP/1.1 200 OK - 166 bytes in 0.131 second response time

and stashbot quitting a few seconds before icinga announced the recovery:

07:30:17 stashbot left the room (quit: Remote host closed the connection).
07:30:27 stashbot________ is now known as stashbot

Unfortunately, due to the thumbor issue at the time, I never got a chance to look at it.

Things to fix in stashbot's python code:

  • Check to see if the current nick is stashbot_ and do not add another _. If both stashbot and stashbot_ are taken just die with a quit message like "Cowardly refusing to fill the channel with copies of myself".
  • Add support for /msg nickserv regain stashbot but only on initial connect
  • Add a check to not do most things if not holding the primary nick for the account (e.g. don't double log things)
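
A rough sketch of how the three items above could fit together, written as plain functions so that it does not depend on any particular IRC library. This is not the code from the eventual patch; the names `next_nick`, `regain_command`, and `should_act` are hypothetical, and the bot's real event handlers would need to call them at the right points (nick-in-use reply, initial connect, and before handling `!log`).

```python
from typing import Optional

BASE_NICK = "stashbot"  # primary nick for the account
QUIT_MESSAGE = "Cowardly refusing to fill the channel with copies of myself"


def next_nick(current: str, base: str = BASE_NICK) -> Optional[str]:
    """Return the nick to try after `current` is reported as in use.

    Only one fallback (base + "_") is ever offered. None means both
    nicks are taken and the caller should quit with QUIT_MESSAGE
    instead of appending another underscore.
    """
    if current == base:
        return base + "_"
    return None


def regain_command(base: str = BASE_NICK) -> str:
    """Text to /msg NickServ with; intended for initial connect only."""
    return "regain {}".format(base)


def should_act(current: str, base: str = BASE_NICK) -> bool:
    """True only while holding the primary nick, to avoid double logging."""
    return current == base
```

With this shape, a second copy of the bot sitting on `stashbot_` would stay quiet when it sees `!log`, and a third copy would disconnect rather than rename itself.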

The infinite _ bug was fixed in jouncebot for T150916: Jouncebot: Add functionality to change Nick from Jouncebot_ to Jouncebot automatically.

Meta TODO: extract the basic irc bot skeleton from stashbot & jouncebot and turn it into a pypi package that they share and others can use.

Change 338048 had a related patch set uploaded (by BryanDavis):
Guard against multiple bots competing in channels

https://gerrit.wikimedia.org/r/338048

Change 338048 merged by jenkins-bot:
Guard against multiple bots competing in channels

https://gerrit.wikimedia.org/r/338048