Fail Early if Evaluator Can't Acquire a WASI Runner
Closed, Resolved (Public)

Description

Instead of waiting for the whole evaluator to time out, we should place a very short (<1s) timeout on WASI resource acquisition. That will reduce stress on the system when WASI resources cannot be acquired, and it will help us debug the recurring Python outages. Specifically, if the outage timeouts are due to resource acquisition, we should not see them recur; if they are not, we may still see them recur. Either way, we will get information about the root cause of the outages.
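
A minimal sketch of the fail-early idea, in Python for illustration only. It is not the evaluator's actual code: the pool is modeled as an asyncio.Queue, and the names acquire_runner, evaluate, RunnerUnavailableError and the 0.5s default are all hypothetical. The point it shows is that acquisition gets its own short timeout and its own error type, so an acquisition failure is distinguishable from a slow evaluation.

```python
import asyncio


class RunnerUnavailableError(RuntimeError):
    """Hypothetical error: no runner was acquired within the short timeout."""


async def acquire_runner(pool: asyncio.Queue, timeout_s: float = 0.5):
    """Take a runner from the pool, failing fast instead of waiting indefinitely."""
    try:
        # wait_for cancels the pending get() once the timeout elapses.
        return await asyncio.wait_for(pool.get(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # A distinct error type keeps acquisition failures separate
        # from whole-evaluation timeouts in logs and metrics.
        raise RunnerUnavailableError(
            f"no WASI runner available within {timeout_s:.2f}s"
        ) from None


async def evaluate(pool: asyncio.Queue, work):
    """Run `work` on an acquired runner and always return the runner to the pool."""
    runner = await acquire_runner(pool)
    try:
        return await work(runner)
    finally:
        pool.put_nowait(runner)
```

With this shape, a caller that sees RunnerUnavailableError knows the request never started executing, which is the signal the description is after.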


Completion checklist

Details

Related Changes in Gerrit:
Related Changes in GitLab:
Title: Add a heartbeat to the executor to fail early when executors are unavailable.
Reference: repos/abstract-wiki/wikifunctions/function-evaluator!433
Author: apine
Source Branch: apine-wasi-fail
Dest Branch: main

Event Timeline

Change #1204582 had a related patch set uploaded (by Jforrester; author: Jforrester):

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-11-05-063501 to 2025-11-12-122736

https://gerrit.wikimedia.org/r/1204582

Change #1204582 merged by jenkins-bot:

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-11-05-063501 to 2025-11-12-122736

https://gerrit.wikimedia.org/r/1204582

Change #1211872 had a related patch set uploaded (by Cory Massaro; author: Cory Massaro):

[operations/deployment-charts@master] wikifunctions: Upgrade evaluators from 2025-11-12-122736 to 2025-11-17-175029

https://gerrit.wikimedia.org/r/1211872