Figure Out Why Wasmedge Subprocesses Aren't Always Killed; Kill Them
Open, In Progress, High · Public · Bug Report

Description

Somehow, some wasmedge subprocesses keep running past their timeout. This should not be remotely possible. Why aren't they being killed/timed out?

Desired behavior/Acceptance criteria (returned value, expected error, performance expectations, etc.)

  • try to replicate this problem in a test environment
  • plug the leak

Completion checklist

Details

Event Timeline

cmassaro renamed this task from "Figure Out Why Wasmedge Subprocesses Aren't Always Killed" to "Figure Out Why Wasmedge Subprocesses Aren't Always Killed; Kill Them". Tue, May 14, 5:16 PM
cmassaro created this task.
Jdforrester-WMF changed the subtype of this task from "Task" to "Bug Report". Tue, May 14, 5:18 PM

I have replicated this locally. There are a few unfortunate things happening here. One is that our last-ditch effort (process.kill) still doesn't work in some cases, and every solution I've tried so far leaves us with zombie processes. It seems like the wasmedge environment does something that messes up process parent relationships, so I'll keep digging into that.

That said, it's also baffling that wasmedge's own resource limitations (which we recently added) aren't helping here. I'm also looking into that.
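
To make the kill-and-reap problem above concrete, here is a minimal sketch of one common approach on POSIX systems: start the subprocess in its own process group, signal the whole group on timeout, then wait on the direct child so it is reaped. This is not the evaluator's actual code. The helper name `run_with_timeout` is hypothetical, Python is used only for illustration, and it is an assumption that the stray wasmedge processes stay in the group they were started in.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(argv, timeout_seconds):
    """Run argv in its own process group; kill the whole group on timeout.

    Hypothetical helper for illustration. Returns (returncode, timed_out).
    POSIX only: relies on setsid() and killpg().
    """
    # start_new_session=True makes the child call setsid(), so it becomes the
    # leader of a new process group whose id equals its pid.
    proc = subprocess.Popen(argv, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        try:
            # Signal every process still in the group, not just the direct child.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            # The group vanished between the timeout and the kill.
            pass
        # wait() collects the exit status, so the direct child is reaped
        # instead of lingering as a zombie.
        return proc.wait(), True


if __name__ == "__main__":
    # Stand-in for a runaway subprocess: sleeps far longer than the timeout.
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_timeout(sleeper, 0.5))
```

Two caveats on this sketch: a descendant that moves itself into a different session or process group will not receive the group signal, and only the direct child is reaped here; orphaned descendants are reaped by whatever process they get reparented to (init or a subreaper). If wasmedge does rearrange parent relationships, either caveat could apply.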

Jdforrester-WMF changed the task status from Open to In Progress. Tue, May 21, 7:13 PM
Jdforrester-WMF assigned this task to cmassaro.
Jdforrester-WMF removed cmassaro as the assignee of this task.
Jdforrester-WMF assigned this task to cmassaro.

For what it's worth, I am not seeing this issue on Beta cluster

(I don't know if it was a problem before, to be clear, but it isn't a problem now)

Change #1037085 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2024-05-13-145650 to 2024-05-28-185827

https://gerrit.wikimedia.org/r/1037085

Change #1037085 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2024-05-13-145650 to 2024-05-28-185827

https://gerrit.wikimedia.org/r/1037085