Page MenuHomePhabricator

toolforge-rs: Improve `toolforge-tunnel` CLI with error handling
Closed, ResolvedPublic

Description

See https://gitlab.wikimedia.org/repos/mwbot-rs/toolforge/-/blob/main/src/bin/cli.rs

There's currently no error handling, just a bunch of unwraps and expects.

Ideally we'd detect a dropped SSH tunnel and restart it. And maybe even gracefully close things on a Ctrl+C signal.

Details

ReferenceSource BranchDest BranchAuthorTitle
repos/mwbot-rs/toolforge!8cli-v2mainmilkydeferImprove `toolforge-tunnel` CLI with error handling
Customize query in GitLab

Event Timeline

@MilkyDefer if you want to give this one a shot, I don't think it should be too difficult either :)

If just proper error handling, that's not too difficult. We just need to consider what to show on screen if errors do occur.

There are some design questions around automatic SSH restart. Should we give a maximum retry count when (a) the child process failed to spawn, or (b) the SSH process exits with a non-zero exit code? Creating an infinite loop that won't break unless SSH process successfully returns is also an option. It's not clear what we want.

Ctrl-C graceful shutdown would be a pain to implement without reconsidering the tool's structure.
The vector used in all_slices() would store a bunch of JoinHandles. A spawned task is not halted if its handle is dropped. If we want a graceful and immediate shutdown when detecting a Ctrl-C, we need to access that vector from outside, and call JoinHandle::abort() one-by-one. As you may imagine, that violates Rust rules.

If just proper error handling, that's not too difficult. We just need to consider what to show on screen if errors do occur.

There are some design questions around automatic SSH restart. Should we give a maximum retry count when (a) the child process failed to spawn, or (b) the SSH process exits with a non-zero exit code? Creating an infinite loop that won't break unless SSH process successfully returns is also an option. It's not clear what we want.

Since this is a development tool that people will probably just enable and leave in the background, I think an infinite loop is fine, provided it prints whatever error SSH emits on why the connection failed. Some 5-10 second waiting period in between restarts would be good. The main use case I'm thinking of is when you open all 8 SSH tunnels, and one or two randomly drop, we should just restart them instead of leaving some working and some broken.

Ctrl-C graceful shutdown would be a pain to implement without reconsidering the tool's structure.
The vector used in all_slices() would store a bunch of JoinHandles. A spawned task is not halted if its handle is dropped. If we want a graceful and immediate shutdown when detecting a Ctrl-C, we need to access that vector from outside, and call JoinHandle::abort() one-by-one. As you may imagine, that violates Rust rules.

In an unrelated project I used a watch channel for this purpose, but I think that would be overkill. AFAICT Ctrl-C properly kills the SSH processes, so there's no real advantage doing it manually in Rust when it's going to be complex.

Fine, I'll give it a shot. Perhaps not by this week, I need to take a small break.