wikibugs failing to connect when run on exec hosts
Closed, Resolved
Public

Description

2021-09-15 19:47:54,556 - irc3.wikibugs - DEBUG - Starting wikibugs...
2021-09-15 19:47:54,708 - irc3.wikibugs - DEBUG - Connected
2021-09-15 19:47:54,709 - irc3.wikibugs - DEBUG - CONNECT ping-pong ()
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/tasks.py", line 92, in __del__
    self._loop.call_exception_handler(context)
AttributeError: 'NoneType' object has no attribute 'call_exception_handler'
Exception ignored in: <object repr() failed>
Traceback (most recent call last):
  File "/usr/lib/python3.5/asyncio/tasks.py", line 92, in __del__
    self._loop.call_exception_handler(context)
AttributeError: 'NoneType' object has no attribute 'call_exception_handler'
2021-09-15 19:48:35,962 - irc3.wikibugs - CRITICAL - connection lost (140180063038656): None
2021-09-15 19:48:35,962 - irc3.wikibugs - CRITICAL - closing old transport (140180063038656)

Event Timeline

Legoktm triaged this task as Unbreak Now! priority. Sep 15 2021, 7:48 PM
Legoktm created this task.

For some reason if I run this on the bastion directly, it's fine.

Actually, if I run it directly on an exec node it's fine too, but under the grid it's having issues.

Mentioned in SAL (#wikimedia-cloud) [2021-09-15T20:03:02Z] <legoktm> redis2irc is running in a screen because of T291129

So I tried:

  • Running redis2irc manually on the exec node it was failing from - works
  • Bumping grid memory to 2G - didn't work
  • Running redis2irc from a different exec node, under grid - didn't work
  • Running redis2irc with logging enabled manually (to see if logging was causing problems) - works
  • Disabling TLS since we had issues with that in the past - didn't work

I'm totally at a loss as to what's wrong. I'm running it in a screen now on the bastion just so it's back up.

Legoktm claimed this task.
Legoktm added a subscriber: Bstorm.

Based on discussion with @Bstorm I switched our REDIS_HOST config from "tools-redis" to "tools-redis.svc.eqiad.wmflabs", and now it works.

I'm not really sure that actually fixed it or we got lucky, but I'll take it.
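One way to probe whether the hostname change mattered would be to compare what the short name and the fully qualified name resolve to, once from inside a grid job and once from an interactive shell on the same exec node. The sketch below is only an illustration: the thread does not confirm that name resolution was the cause, the port 6379 is the Redis default and an assumption here, and the helper names (`resolve`, `compare_hosts`, `main`) are hypothetical and not part of wikibugs.

```python
import socket


def resolve(host, port=6379, resolver=socket.getaddrinfo):
    """Return the sorted, de-duplicated addresses for host, or [] if lookup fails."""
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # Name did not resolve in this environment.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def compare_hosts(hosts, port=6379, resolver=socket.getaddrinfo):
    """Map each hostname to its resolved addresses so two environments can be diffed."""
    return {host: resolve(host, port, resolver) for host in hosts}


def main():
    # The two REDIS_HOST values mentioned in this task.
    hosts = ["tools-redis", "tools-redis.svc.eqiad.wmflabs"]
    for host, addrs in compare_hosts(hosts).items():
        print(host, "->", addrs if addrs else "did not resolve")
```

Calling `main()` under the grid and again from a shell, then diffing the two outputs, would show whether the short name resolves differently (or not at all) in the grid environment. If both outputs match, the hostname change was probably not the real fix.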

> I'm not really sure that actually fixed it or we got lucky, but I'll take it.

Same, unfortunately.