
Improve etcdmirror shutdown behavior
Closed, Resolved (Public)

Description

During functional testing for T358636, I ran into some interesting behavior: on SIGTERM, etcdmirror just hangs.

Why? By default, Twisted's reactor.run installs its own signal handlers, which supersede ours [0]. Our handler therefore never runs, and the watch loop [1] in read_write never terminates.

The resulting behavior is presumably that systemd waits 90s and then issues SIGKILL, which is not ideal: in theory, the SIGKILL could land while we're partway through mirroring an event.
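To illustrate the mechanism, here is a minimal stdlib-only sketch (not the etcdmirror code). It assumes a watch loop that polls a stop flag between timeouts. All names (stop_requested, our_handler, framework_handler, watch_loop, demo) are hypothetical, and framework_handler merely stands in for whatever handler a framework registers later. The point is that the last signal.signal registration wins, so the flag is never set.

```python
import os
import signal
import time

stop_requested = False  # polled by the watch loop between timeouts


def our_handler(signum, frame):
    """Application handler: ask the watch loop to finish."""
    global stop_requested
    stop_requested = True


def framework_handler(signum, frame):
    """Stand-in for a handler registered later by a framework; ignores our flag."""


def watch_loop(max_polls=5, timeout=0.01):
    """Return True if the loop saw the stop request.

    max_polls exists only so this demo terminates; a real loop is unbounded.
    """
    for _ in range(max_polls):
        if stop_requested:
            return True
        time.sleep(timeout)  # stands in for a watch that times out
    return stop_requested


def demo(superseded):
    """Send SIGTERM to ourselves and report whether the watch loop noticed."""
    global stop_requested
    stop_requested = False
    previous = signal.getsignal(signal.SIGTERM)
    signal.signal(signal.SIGTERM, our_handler)
    if superseded:
        # A later registration replaces ours; only the last one is called.
        signal.signal(signal.SIGTERM, framework_handler)
    try:
        os.kill(os.getpid(), signal.SIGTERM)
        return watch_loop()
    finally:
        # Restore whatever was installed before the demo.
        signal.signal(
            signal.SIGTERM, previous if previous is not None else signal.SIG_DFL
        )
```

Calling demo(superseded=False) lets the loop finish, while demo(superseded=True) leaves it polling until the artificial bound is reached.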

For T358636, I'm going to need to restart etcdmirror at least once (possibly a couple of times), and it would be nice to fix this before then.

Is there a simple fix? It should be as simple as passing installSignalHandlers=False when calling reactor.run (verified by local testing), which ensures that etcdmirror exits within the next 60s (the watch timeout).

[0] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/etcd-mirror/+/b5823aa025b8555738f8fbe409e0e6c000e1985c/etcdmirror/main.py#142

[1] https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/etcd-mirror/+/b5823aa025b8555738f8fbe409e0e6c000e1985c/etcdmirror/main.py#197

Event Timeline

Scott_French changed the task status from Open to In Progress. (Apr 3 2024, 9:58 PM)
Scott_French triaged this task as Low priority.

Change #1016862 had a related patch set uploaded (by Scott French; author: Scott French):

[operations/software/etcd-mirror@master] Improve etcdmirror shutdown behavior

https://gerrit.wikimedia.org/r/1016862

Change #1016862 merged by jenkins-bot:

[operations/software/etcd-mirror@master] Improve etcdmirror shutdown behavior

https://gerrit.wikimedia.org/r/1016862