Wheel of Misfortune should provide bastion host details
Closed, Resolved (Public)

Description

I recently got an email:

"systemd killed by Wheel of Misfortune on tools bastion"

It would be good if the actual host could be identified. There are at least four machines I know of that could be used as a Toolforge bastion (two of which I use on a regular basis). Including the full hostname would make this notification more useful.

Event Timeline

If it's helpful, this specific case was on tools-sgebastion-11 / dev-buster.toolforge.org.

How was systemd even killed? The script is configured to not touch it at all.

Got another one today:

Your process `bash` has been killed by the Wheel of Misfortune script.

Just playing around with psutil, I see it lists '/usr/bin/bash', but SHELLS in that script includes '/bin/bash', not '/usr/bin/bash'.
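To illustrate the mismatch described above, here is a minimal sketch of comparing executable paths after resolving symlinks, so that '/bin/bash' and '/usr/bin/bash' are treated as the same shell on hosts where /bin is a symlink to /usr/bin. This is not the actual Wheel of Misfortune code: the `is_shell` helper is hypothetical, and the `SHELLS` list here is a stand-in in which only the '/bin/bash' entry is confirmed by this task.

```python
import os

# Stand-in for the script's SHELLS list. Only '/bin/bash' is known from this
# task; the real list may contain other entries.
SHELLS = ["/bin/bash"]


def is_shell(exe, shells=SHELLS):
    """Return True if exe resolves to the same file as any entry in shells.

    os.path.realpath follows symlinks, so '/usr/bin/bash' matches '/bin/bash'
    on a system where /bin points at /usr/bin. A plain string comparison
    (exe in shells) would miss that case.
    """
    resolved = os.path.realpath(exe)
    return any(os.path.realpath(shell) == resolved for shell in shells)
```

The value passed as `exe` would be whatever path psutil reports for the process, for example `psutil.Process(pid).exe()`. Whether the result matches on a given bastion depends on that host's filesystem layout.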

Is this happening only on the buster bastions? psutil changed a lot over time, and the stretch version is very old. I had to develop directly on the bastion to work with it.

That said, it should not be hard to include the hostname, which would help answer my question.

Bstorm renamed this task from Wheel of Misfortune should provide more details to Wheel of Misfortune should provide bastion host details. May 17 2021, 5:03 PM

Change 693485 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] toolforge bastion: do not run wheelofmisfortune on buster yet

https://gerrit.wikimedia.org/r/693485

Change 693487 had a related patch set uploaded (by Bstorm; author: Bstorm):

[operations/puppet@production] wheel_of_misfortune: include the hostname in the email

https://gerrit.wikimedia.org/r/693487

Change 693487 merged by Bstorm:

[operations/puppet@production] wheel_of_misfortune: include the hostname in the email

https://gerrit.wikimedia.org/r/693487

@RoySmith Have you got any examples with the host in them so far? If not, I can go ahead and deliberately create test cases.

No, I've been well-behaved lately. But, I'll be happy to nail a tmux session up on one of the bastions for as long as it takes to generate an example if you like :-)

By all means have at it! My attention is mostly elsewhere.

Just got one:

Your process bash has been killed on tools-sgebastion-11 by the Wheel of
Misfortune script.
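A notification like the one above could be assembled roughly as follows. This is a sketch only, not the merged patch: the `build_message` function and its parameters are hypothetical, and it assumes the host's own name from `socket.gethostname()` is what should appear in the email.

```python
import socket


def build_message(process_name, hostname=None):
    """Build the notification text, naming the host where the kill happened.

    hostname defaults to this machine's name as reported by the OS.
    """
    if hostname is None:
        hostname = socket.gethostname()
    return (
        f"Your process {process_name} has been killed on {hostname} "
        "by the Wheel of Misfortune script."
    )
```

Calling `build_message("bash", "tools-sgebastion-11")` yields the same sentence as the email quoted above, on a single line.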

Change 693485 merged by Bstorm:

[operations/puppet@production] toolforge bastion: do not run wheelofmisfortune on buster yet

https://gerrit.wikimedia.org/r/693485

Since that confirms the suspicion that the problem is that buster has a much later version of the Python library the script is using, I've merged my patch to be selective about it for now, and I can try to port the script to buster. I also manually yanked the service out of systemd on the buster bastions.

Honestly, on buster, it will probably work more like my laptop does, making it easier. On stretch I had to develop in a VM because the versions were so old.

Change 729593 had a related patch set uploaded (by Majavah; author: Majavah):

[operations/puppet@production] toolforge: wheel of misfortune: dry run on buster

https://gerrit.wikimedia.org/r/729593

Change 729593 merged by Bstorm:

[operations/puppet@production] toolforge: wheel of misfortune: dry run on buster

https://gerrit.wikimedia.org/r/729593

taavi claimed this task.