Pick a random port when restarting because of port conflict + introduce a timeout before retrying with the new port
Closed, DeclinedPublic
Actions

Assigned To

None

Authored By

	ssastry
	May 4 2015, 6:33 PM

Description

Krinkle reported that when he accidentally restarted Parsoid a second time, that shell hit high cpu load because of the master repeatedly trying to restart the workers on the same port.

So, we need:
(a) a retry timeout
(b) pick a random port if the error was a port conflict

Event Timeline

ssastry created this task.May 4 2015, 6:33 PM

ssastry raised the priority of this task from to Medium.

ssastry updated the task description. (Show Details)

ssastry added a project: Parsoid.

ssastry added subscribers: ssastry, Krinkle.

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMay 4 2015, 6:33 PM

The caveat on this bug, however, is that icinga / monitoring scripts might rely on parsoid being live on :8000 .. so, this should be a localsettings.js option for restarting on a random port. In production, we have occasionally run into this during deploys when upstart fails to kill the old workers.

A number of things expect parsoid to be on port 8000 (like icinga), so picking a random port is probably not a good idea. A retry timeout would be good, though.

ssastry set Security to None.May 4 2015, 6:57 PM

ssastry added a subscriber: • GWicke.

Note that @Krinkle was running two parsoids at the same time on the same point. That's a fairly unusual situation and *different* from:

Using SO_REUSEADDR to ensure there's no timeout on a normal restart (all sockets in Node.js set SO_REUSEADDR already)
Starting Parsoid on a "free port" for testing purposes (we already have an API for this which we use in our testing harness)
The behavior of Parsoid when run under upstart or systemd or other daemon manager. (We should just die and let the manager restart us.)

Contrary to the title of this bug, I *don't* think that picking a random port is a good idea. I *do* think it might be worthwhile to strip out the "restart" functionality from Parsoid, and let the daemon manager handle that.

Parsoid/JS is not in use any more.

Pick a random port when restarting because of port conflict + introduce a timeout before retrying with the new portClosed, DeclinedPublicActions

Description

Event Timeline

Pick a random port when restarting because of port conflict + introduce a timeout before retrying with the new port
Closed, DeclinedPublic
Actions