Celery task pool doesn't degrade nicely.
Closed, Resolved, Public

Description

In T175860, I discovered that the signal() timeout mechanism breaks the Celery service on a local VM by running it out of memory. Once we can set separate Celery revocation and alarm timeouts, we can test this by shortening the alarm timeout and then exceeding it. It is probably correct to kill and re-fork the uwsgi process, but doing so should cause zero net change in memory, with the kill happening before the fork. Short of that ideal, we don't want an OOM to damage the pool or the OS. The pool should avoid damage if possible, auto-degrade to a smaller number of workers, and log accurately.
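For readers unfamiliar with alarm-based timeouts, here is a minimal sketch of the general technique, assuming a Unix host and code running in the main thread. It is not the code from T175860; `AlarmTimeout`, `alarm_timeout`, and `run_with_timeout` are hypothetical names used only for illustration.

```python
import signal
import time
from contextlib import contextmanager


class AlarmTimeout(Exception):
    """Raised when the wrapped block exceeds its alarm timeout."""


@contextmanager
def alarm_timeout(seconds):
    """Hypothetical helper: interrupt the enclosed block after `seconds`.

    Relies on SIGALRM, so it only works on Unix and in the main thread.
    """
    def _on_alarm(signum, frame):
        raise AlarmTimeout("exceeded %d s" % seconds)

    # Install our handler and remember the previous one so we can restore it.
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)


def run_with_timeout(func, seconds):
    """Run func() and report whether it finished or hit the alarm."""
    try:
        with alarm_timeout(seconds):
            return ("ok", func())
    except AlarmTimeout:
        return ("timeout", None)
```

Shortening `seconds` below the real running time of `func`, as in `run_with_timeout(lambda: time.sleep(3), 1)`, is one way to exercise the timeout path on purpose, which is the kind of test the description proposes.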

Event Timeline

This might be invalid; I need to set more appropriate limits and see how the pool behaves.

awight renamed this task from "[Investigate] Does alarm timeout break Celery?" to "Celery task pool doesn't degrade nicely." (Sep 14 2017, 4:18 AM)
awight updated the task description.
Ladsgroup triaged this task as Medium priority. (Nov 26 2018, 4:56 PM)
Ladsgroup added a project: TestMe.
Ladsgroup subscribed.

This should be rechecked after the upgrade to Celery 4, which has a more robust way of handling tasks.
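As a starting point for that recheck, the sketch below lists Celery 4 settings that bear on time limits and worker memory. The setting names are Celery 4's lowercase configuration keys; every value is an illustrative assumption, not a recommendation and not this deployment's actual configuration.

```python
# celeryconfig.py-style module; all values are illustrative assumptions.

# Number of worker processes in the pool.
worker_concurrency = 4

# Soft limit (seconds): raises SoftTimeLimitExceeded inside the task,
# giving it a chance to clean up.
task_soft_time_limit = 15

# Hard limit (seconds): the worker process running the task is killed
# and replaced.
task_time_limit = 20

# Replace a worker process once its resident memory exceeds this many
# kilobytes; the check happens after the current task finishes.
worker_max_memory_per_child = 200_000

# Replace a worker process after it has executed this many tasks.
worker_max_tasks_per_child = 100
```

Keeping the soft limit below the hard limit lets a task fail cleanly before the worker is killed. Whether these settings are enough to stop an OOM from damaging the pool is exactly what the recheck would need to establish.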