ZppixBot timing out repeatedly on ZppixBot version 7
Closed, Resolved (Public)

Description

No confirmed cause yet; sopel.log indicates a server timeout.

Creating task to track.

The update has now been rolled back, and the pycache and *.mmdb files have been wiped.

Event Timeline

Timeline (UTC):

  • update deployed 08:30
  • disconnected 08:46
  • disconnected 09:29
  • rollback deployed 10:31

2020-06-18 08:46:12,191 << :orwell.freenode.net PONG orwell.freenode.net :chat.freenode.net
2020-06-18 08:46:22,495 << :RhinosF1!uid339563@miraheze/RhinosF1 PRIVMSG ZppixBot :Reception123: you might have a fun surprise on Phab in ~30s
2020-06-18 08:46:23,880 >> PRIVMSG #ZppixBot-logs :[2020-06-18 08:46:14,951] sopel.irc.backends ERROR - Server timeout detected after 122s; closing.

and

2020-06-18 09:28:59,697 >> PING chat.freenode.net
2020-06-18 09:28:59,712 << :cherryh.freenode.net PONG cherryh.freenode.net :chat.freenode.net
2020-06-18 09:29:07,102 >> PRIVMSG #ZppixBot-logs :[2020-06-18 09:29:07,084] sopel.irc.backends ERROR - Server timeout detected after 127s; closing.

Neither log excerpt provides any useful information.
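To make it easier to line up disconnects with deployments, the timeout entries can be pulled out of the log automatically. This is only a rough sketch: the regex is inferred from the two excerpts above, and `find_timeouts` and `SAMPLE` are made-up names for illustration, not part of sopel.

```python
import re
from datetime import datetime

# Pattern inferred from the excerpts in this task; other sopel versions
# may format the message differently (assumption).
TIMEOUT_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+\] "
    r"sopel\.irc\.backends ERROR - Server timeout detected after (?P<secs>\d+)s"
)


def find_timeouts(lines):
    """Return (timestamp, seconds_without_server_activity) for each timeout entry."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
            events.append((ts, int(match.group("secs"))))
    return events


# Lines copied from the excerpts above.
SAMPLE = [
    "2020-06-18 08:46:23,880 >> PRIVMSG #ZppixBot-logs :[2020-06-18 08:46:14,951] "
    "sopel.irc.backends ERROR - Server timeout detected after 122s; closing.",
    "2020-06-18 09:28:59,697 >> PING chat.freenode.net",
    "2020-06-18 09:29:07,102 >> PRIVMSG #ZppixBot-logs :[2020-06-18 09:29:07,084] "
    "sopel.irc.backends ERROR - Server timeout detected after 127s; closing.",
]

for when, secs in find_timeouts(SAMPLE):
    print(when.isoformat(), secs)
```

On a real log, the same function could be fed `open(path)` for wherever sopel.log lives on the instance.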

Mentioned in SAL (#wikimedia-cloud) [2020-06-18T13:42:43Z] <wm-bot> <rhinosf1> redeployed v7 with a sketchy trace tool to attempt to do something about T255763

After some testing, I think this is caused by the .pyc files in our venv.

I'll deploy again and fix the venv tonight on both instances.

It will probably cause downtime of up to 15 mins per instance.
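For reference, clearing compiled bytecode from a venv could look roughly like the sketch below. It is not the exact procedure used on the instances: `wipe_bytecode` is a hypothetical helper, the path in the usage note is a placeholder, and the bot should be stopped before running anything like this.

```python
import shutil
from pathlib import Path


def wipe_bytecode(root):
    """Delete __pycache__ directories and stray .pyc files under root.

    Returns the number of filesystem entries removed. Source files are
    left untouched; Python recompiles bytecode on the next import.
    """
    root = Path(root)
    removed = 0
    # Build the list before deleting so the directory walk is not disturbed.
    for cache_dir in list(root.rglob("__pycache__")):
        if cache_dir.is_dir() and not cache_dir.is_symlink():
            shutil.rmtree(cache_dir)
            removed += 1
    # Legacy-style .pyc files that sit next to their sources.
    for pyc in list(root.rglob("*.pyc")):
        pyc.unlink()
        removed += 1
    return removed
```

Usage would be something like `wipe_bytecode("/path/to/venv")`, with the real venv path substituted in.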

Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:29:40Z] <RhinosF1> stop sopel to reset pip, pyc, and mmdb -- T255763

Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:38:41Z] <RhinosF1> started deployment back -- T255763

Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:46:35Z] <RhinosF1> stop sopel & cron to reset pip, pyc, and mmdb -- T255763

Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:49:33Z] <RhinosF1> stop running ddtrace for T255763

Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:56:25Z] <RhinosF1> started deployment & cron back -- T255763

I believe this was caused by a broken venv, possibly along with several other bugs (we should retry the Python 3.7 upgrade).

Boldly resolving this!