Steps I've found to reliably reproduce, assuming the tool uses Kubernetes:
- Run webservice restart. It outputs the normal "Restarting webservice..." message
- Refresh the page in your browser. It hangs for a few seconds, then returns a 502
- Attempting webservice restart again does not work
- Running webservice stop, waiting a few seconds, then webservice start successfully brings the service back up
- Running webservice restart shortly thereafter works just fine, and does not bring the tool down like it did the first time
- Wait some amount of time (in my case ~24 hours), and the issue will surface again
Some IRC discussion that might help:
...
16:23:05 <bd808> and it didn't give you an error message at the cli when it failed?
16:23:12 <chasemp> it seems like the restart vs stop/start is a red herring, it's just different state post stop/start regardless of mechanism
16:23:45 <bd808> there is a "# FIXME: Treat pending state differently" in start() that might be somehow related
16:24:15 <musikanimal> I didn't see any errors, no
16:25:15 <bd808> oh... i see a way it could go badly
16:25:23 <bd808> stop() waits max 15s and then returns
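
To make bd808's last point concrete, here is a minimal Python simulation of how a stop() that gives up after 15 seconds could leave a restart with nothing running. This is only a sketch of the suspected failure mode, not the actual webservice code: FakeClock, FakeBackend, the state names, and the assumption that a start request is ignored while the old instance is still terminating are all hypothetical. The only detail taken from the discussion is the 15-second cap in stop().

```python
class FakeClock:
    """Hypothetical simulated clock, so the example runs instantly."""

    def __init__(self):
        self.now = 0

    def sleep(self, seconds):
        self.now += seconds


class FakeBackend:
    """Hypothetical stand-in for the Kubernetes backend (not the real API)."""

    def __init__(self, shutdown_seconds):
        self.shutdown_seconds = shutdown_seconds
        self.state = "running"
        self._stop_requested_at = None

    def request_stop(self, now):
        self.state = "terminating"
        self._stop_requested_at = now

    def get_state(self, now):
        # The old instance disappears only after shutdown_seconds have passed.
        if (self.state == "terminating"
                and now - self._stop_requested_at >= self.shutdown_seconds):
            self.state = "stopped"
        return self.state

    def request_start(self, now):
        # Assumption: a start issued while the old instance still exists
        # is silently ignored instead of raising an error.
        if self.get_state(now) != "stopped":
            return False
        self.state = "running"
        return True


def stop(backend, clock, max_wait=15):
    """Ask for a stop, then poll once per second for at most max_wait seconds."""
    backend.request_stop(clock.now)
    for _ in range(max_wait):
        if backend.get_state(clock.now) == "stopped":
            return True
        clock.sleep(1)
    # Returns even if the old instance is still terminating.
    return backend.get_state(clock.now) == "stopped"


def restart(backend, clock):
    """Stop then start, ignoring whether stop() actually finished."""
    stop(backend, clock)
    return backend.request_start(clock.now)


if __name__ == "__main__":
    for shutdown_seconds in (5, 30):
        clock = FakeClock()
        backend = FakeBackend(shutdown_seconds)
        started = restart(backend, clock)
        clock.sleep(60)  # let any pending shutdown finish
        print(shutdown_seconds, started, backend.get_state(clock.now))
```

In this model a shutdown that takes 5 seconds restarts cleanly, while one that takes 30 seconds ends with the service stopped and no error reported. That would be consistent with the 502 and with stop, wait, then start working, but it remains a guess until checked against the real start() and stop().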