No confirmed reason; sopel.log indicates a timeout.
Creating task to track.
The update has now been rolled back and the pycache / *.mmdb files wiped.
Timeline (UTC):
2020-06-18 08:46:12,191 << :orwell.freenode.net PONG orwell.freenode.net :chat.freenode.net
2020-06-18 08:46:22,495 << :RhinosF1!uid339563@miraheze/RhinosF1 PRIVMSG ZppixBot :Reception123: you might have a fun surprise on Phab in ~30s
2020-06-18 08:46:23,880 >> PRIVMSG #ZppixBot-logs :[2020-06-18 08:46:14,951] sopel.irc.backends ERROR - Server timeout detected after 122s; closing.
and
2020-06-18 09:28:59,697 >> PING chat.freenode.net
2020-06-18 09:28:59,712 << :cherryh.freenode.net PONG cherryh.freenode.net :chat.freenode.net
2020-06-18 09:29:07,102 >> PRIVMSG #ZppixBot-logs :[2020-06-18 09:29:07,084] sopel.irc.backends ERROR - Server timeout detected after 127s; closing.
which provides no useful information.
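To see how often this happens, the timeout lines can be pulled out of the log with a small script. This is only a sketch: the regex is based on the two excerpts above, and `find_timeouts` and `SAMPLE_LINES` are names made up for illustration, not part of sopel.

```python
import re

# Matches the error text as it appears in the excerpts above.
TIMEOUT_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"sopel\.irc\.backends ERROR - Server timeout detected after (?P<secs>\d+)s"
)


def find_timeouts(lines):
    """Return (timestamp, seconds) for every server-timeout line found."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("ts"), int(match.group("secs"))))
    return hits


# The lines quoted in the timeline, used here as sample input.
SAMPLE_LINES = [
    "2020-06-18 08:46:12,191 << :orwell.freenode.net PONG orwell.freenode.net :chat.freenode.net",
    "2020-06-18 08:46:23,880 >> PRIVMSG #ZppixBot-logs :[2020-06-18 08:46:14,951] "
    "sopel.irc.backends ERROR - Server timeout detected after 122s; closing.",
    "2020-06-18 09:28:59,712 << :cherryh.freenode.net PONG cherryh.freenode.net :chat.freenode.net",
    "2020-06-18 09:29:07,102 >> PRIVMSG #ZppixBot-logs :[2020-06-18 09:29:07,084] "
    "sopel.irc.backends ERROR - Server timeout detected after 127s; closing.",
]

if __name__ == "__main__":
    for ts, secs in find_timeouts(SAMPLE_LINES):
        print(ts, secs)
```

For the real log, pass an open file handle instead of `SAMPLE_LINES`; the path to sopel.log depends on the instance.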
Mentioned in SAL (#wikimedia-cloud) [2020-06-18T13:42:43Z] <wm-bot> <rhinosf1> redeployed v7 with a sketchy trace tool to attempt to do something about T255763
After some testing, I think this is the pycs in our venv.
I'll deploy again and fix the venv tonight on both instances.
It will probably cause downtime of up to 15 mins per instance.
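For reference, the pyc / mmdb part of the cleanup could be scripted roughly like this. It is a sketch under assumptions: `wipe_stale_files` is a hypothetical helper, the root directory to pass in depends on where the venv and sopel's data live on each instance, and it does not cover reinstalling packages with pip. Stop sopel before running it for real.

```python
import shutil
from pathlib import Path


def wipe_stale_files(root, dry_run=True):
    """Remove __pycache__ directories plus loose *.pyc and *.mmdb files under root.

    Returns the paths that were removed (or, with dry_run=True, would be).
    """
    root = Path(root)
    targets = []

    # Collect matches first so deleting does not disturb the directory walk.
    for cache_dir in sorted(root.rglob("__pycache__")):
        if cache_dir.is_dir():
            targets.append(cache_dir)

    for pattern in ("*.pyc", "*.mmdb"):
        for path in sorted(root.rglob(pattern)):
            # Files inside __pycache__ go away together with their directory.
            if "__pycache__" in path.relative_to(root).parts:
                continue
            if path.is_file():
                targets.append(path)

    if not dry_run:
        for path in targets:
            if path.is_dir():
                shutil.rmtree(path)
            elif path.exists():
                path.unlink()

    return targets
```

Running it once with the default `dry_run=True` and checking the returned list before deleting anything is the safer order.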
Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:29:40Z] <RhinosF1> stop sopel to reset pip, pyc, and mmdb -- T255763
Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:38:41Z] <RhinosF1> started deployment back -- T255763
Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:46:35Z] <RhinosF1> stop sopel & cron to reset pip, pyc, and mmdb -- T255763
Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:49:33Z] <RhinosF1> stop running ddtrace for T255763
Mentioned in SAL (#wikimedia-cloud) [2020-06-18T21:56:25Z] <RhinosF1> started deployment & cron back -- T255763
I believe this was a broken venv, and maybe many bugs with it (we should retry the Python 3.7 upgrade).
Boldly resolving this!