
Silence or address E_WOULDBLOCK warning
Closed, Resolved · Public

Authored By: Halfak, Nov 30 2016, 4:08 PM

Description

ORES reports the following warning from Celery

[2016-11-30 10:05:37,639: WARNING/Worker-4] /home/halfak/venv/3.5/lib/python3.5/site-packages/celery/result.py:45: RuntimeWarning: Never call result.get() within a task!
See http://docs.celeryq.org/en/latest/userguide/tasks.html#task-synchronous-subtasks

In Celery 3.2 this will result in an exception being
raised instead of just being a warning.

  warnings.warn(RuntimeWarning(E_WOULDBLOCK))

This is due to a dependent task pattern we have been using for a while. We use timeouts to make sure that this doesn't result in a problematic block, yet the warning continues. And as the message implies, it seems like this will turn into a hard failure if we ever upgrade to Celery 3.2.
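For illustration, here is a minimal sketch of the kind of pattern that triggers this warning (hypothetical task names, not the actual ORES code): one task blocks on another task's result with .get(), bounded only by a timeout.

```python
from celery import Celery

app = Celery('sketch', broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def process_score_map(rev_ids):
    # Hypothetical: score a batch of revisions, keyed by revision ID.
    return {str(r): {'damaging': 0.5} for r in rev_ids}

@app.task
def lookup_score_in_map(map_task_id, rev_id):
    # Anti-pattern: blocking on another task's result from inside a task.
    # This is exactly what emits the E_WOULDBLOCK RuntimeWarning; the timeout
    # only bounds the block, it doesn't remove it.
    score_map = app.AsyncResult(map_task_id).get(timeout=15)
    return score_map[str(rev_id)]
```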

This task is done when we either figure out a better way to do the dependent task pattern or silence/prevent this warning.

Event Timeline

Halfak triaged this task as Medium priority. Nov 30 2016, 4:09 PM

E.g. imagine a score request comes in for "enwiki:damaging:123456:0.3.0" and it's already been started.

The task ID that gets matched is for a "_lookup_score_in_map" job that depends on a "_process_score_map" job.


E.g. a request comes in for "ptwiki:damaging:123456:0.3.2" and "ptwiki:goodfaith:123456:0.3.2". It turns out that a "_lookup_score_in_map" job has been started for "ptwiki:damaging:123456:0.3.2", but not for "ptwiki:goodfaith:123456:0.3.2". Then, a new "_process_score_map" job will be started with only the "goodfaith" model and a dependent "_lookup_score_in_map" job will be started for "ptwiki:goodfaith:123456:0.3.2".

These patterns help us not duplicate scoring, but they involve this direct and complex dependency pattern.

This *should* be useful, but on my last inspection, I didn't find anything that would work for us. http://docs.celeryproject.org/en/latest/userguide/canvas.html#the-primitives
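For reference, the primitives on that page compose tasks without any blocking .get(): e.g. a chord runs a group of tasks in parallel and feeds the list of results to a callback. A toy sketch of the shape (hypothetical task names, not ORES code); the missing piece for ORES would be attaching to a score that is already in flight:

```python
from celery import Celery, chord, group

app = Celery('sketch', broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def score_revision(wiki, model, rev_id):
    # Hypothetical single-revision scorer.
    return {'%s:%s:%s' % (wiki, model, rev_id): 0.5}

@app.task
def combine_scores(score_dicts):
    # Callback: merge the parallel results into one response.
    merged = {}
    for d in score_dicts:
        merged.update(d)
    return merged

# "Run these in parallel, then run the callback on the list of results."
workflow = chord(
    group(score_revision.s('enwiki', 'damaging', r) for r in (123456, 123457)),
    combine_scores.s())
result = workflow.apply_async()
```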

Here's someone using a Redis mutex to deduplicate tasks:
https://github.com/joealcorn/celery-dedupe
https://github.com/joealcorn/celery-dedupe/blob/master/celery_dedupe/tasks.py

The same approach is used here:
https://github.com/5monkeys/celery-deduplicate

A mutex is an oldie but a goodie... But it looks like you're already doing something similar, and much more complex, with score_result.state? I haven't been able to tell whether the score result map is being updated using exclusive, atomic operations, or whether it might be possible for two jobs to race one another by setting a lock at the same instant.

Either way, the joealcorn code does a sleight of hand that should solve our result.get woes. The DedupeTask is able to return either itself or the "dependency" job depending on whether the task was a duplicate or not.

Nice work @awight. Looks like a solid find. I like the strategy of getting away from celery internal mechanisms for managing this.

Reading celery_once, I think they actually didn't lock the mutex correctly. There's a race condition between a GET and SETEX. Plus, this library is encumbered by OO inheritance and snowflake configuration.

So, let's implement this ourselves. apply_async should be guarded by a SETNX using {wiki, model, revision} as the key. We can set the expiry in a second command, and surround both in a transaction so that the SETNX will be rolled back if we somehow die before setting an expiry on the lock.
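A minimal sketch of that guard with redis-py (hypothetical key scheme and TTL; the real ones would come from ORES configuration). Note that a single SET with nx/ex options performs the SETNX and the expiry atomically in one command, which avoids the unguarded window and also avoids refreshing a competitor's TTL:

```python
import redis

r = redis.StrictRedis(host='localhost', port=6379, db=0)

def try_acquire(wiki, model, rev_id, task_id, ttl=20):
    """Return True if this request won the right to enqueue the scoring job.

    The stored value is the Celery task ID, so losers can attach to the
    winner's result.
    """
    key = 'ores:task:%s:%s:%s' % (wiki, model, rev_id)  # hypothetical scheme
    # SET key value NX EX ttl -- set only if absent, with an expiry, atomically.
    return bool(r.set(key, task_id, nx=True, ex=ttl))
```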

The value of the mutex should include the Celery task ID, which allows symbionts to attach to our result when it becomes available.

We should rig worker threads to time out, and set the lock to expire slightly after that deadline.

If a worker thread encounters a lock, we should convert it into a parasitic wait-fetch-return symbiont attached to the real worker, and move it out of the main thread pool accounting.


Once we're using Redis clustering, we'll need to handle multi-master problems. A reference implementation of distributed locking exists:
https://redis.io/topics/distlock
https://github.com/SPSCommerce/redlock-py
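For reference, redlock-py's interface looks roughly like this (lightly adapted from its README; verify against the library before relying on it). The lock only counts as acquired if a majority of the independent Redis masters grant it within the validity window:

```python
from redlock import Redlock

dlm = Redlock([
    {'host': 'redis-a', 'port': 6379, 'db': 0},
    {'host': 'redis-b', 'port': 6379, 'db': 0},
    {'host': 'redis-c', 'port': 6379, 'db': 0},
])

# Hypothetical resource name; 1000 ms validity window.
my_lock = dlm.lock('ores:task:enwiki:damaging:123456', 1000)
if my_lock:
    try:
        pass  # do the scoring work while holding the distributed lock
    finally:
        dlm.unlock(my_lock)
```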

At the risk of saying something stupid (as I am not familiar with either Celery or ORES internals), it feels like there are two different issues being mixed up here: 1) some tasks need to be composed from smaller subtasks (e.g. to serve a request you need to look up each revision, then process each revision that was not found, then join the results into a list), 2) you need to deduplicate scoring tasks.
Celery is complaining about the manner in which 1) is done (instead of the main task actually executing the subtasks, Celery wants it to return a descriptor object saying "I can be run by running these tasks in parallel, then running this task with the result list", so it can manage resources more efficiently). To solve 2), you need some sort of locking, but you are already doing 2), so you must have solved that at some point. Whether you do the deduplication in a blocking manner should be an orthogonal issue.
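Concretely, the "return a descriptor instead of blocking" idea can be expressed with Celery 4's Task.replace(): the task hands the worker a signature ("run these in parallel, then combine") and exits immediately, so there is no result.get() inside any task. A rough sketch with hypothetical task names, not the actual ORES code:

```python
from celery import Celery, chord, group

app = Celery('sketch', broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')

@app.task
def score_revision(wiki, model, rev_id):
    return {str(rev_id): 0.5}  # hypothetical single-revision scorer

@app.task
def combine(score_dicts):
    merged = {}
    for d in score_dicts:
        merged.update(d)
    return merged

@app.task(bind=True)
def score_request(self, wiki, model, rev_ids):
    # Instead of .get()-ing subtask results, replace this task with a
    # descriptor of the remaining work; the worker is freed right away.
    raise self.replace(
        chord(group(score_revision.s(wiki, model, r) for r in rev_ids),
              combine.s()))
```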

@Tgr
Thanks for setting me back on the right track! Yeah, all we need to do for this specific bug is change the way we convert into a dependent task.

Ladsgroup removed a project: User-Ladsgroup.
Ladsgroup added a subscriber: Ladsgroup.

Getting back to this after we implemented PoolCounter support inside ORES. MediaWiki and Thumbor use PoolCounter to do basically the exact same thing: preventing duplicated intensive actions (in the case of MediaWiki, parser cache invalidation; in the case of Thumbor, thumbnailing). How do you feel about using PoolCounter and then simplifying the Celery logic in ORES?

I don't see how PoolCounter would help us sequence our tasks in Celery. What's your proposal?

I want to get back to this. I dove deep into the Celery docs, watched several conference presentations, and changed my mind several times before deciding that a lock manager (PoolCounter) is our best option. The lock manager has a throttling mechanism that wakes up everyone in the queue once the main lock is released. That way, we can just call PoolCounter, get a response once the job is fully done by the workers, and then simply look up the results.
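As far as I understand the PoolCounter line protocol (commands such as ACQ4ANY and RELEASE over a plain TCP socket; treat the exact command and response strings below as assumptions to verify against the daemon's documentation), the flow would look roughly like this sketch:

```python
import socket

def score_with_poolcounter(key, compute, lookup,
                           host='localhost', port=7531,
                           workers=1, maxqueue=100, timeout=10):
    """Only one client computes a given score; everyone else waits on
    PoolCounter and then simply looks the finished result up.

    `compute` and `lookup` are hypothetical callables supplied by the caller.
    """
    sock = socket.create_connection((host, port))
    f = sock.makefile('rw')
    try:
        # ACQ4ANY: any worker doing the work is fine; waiters are woken
        # together when the holder releases.
        f.write('ACQ4ANY %s %d %d %d\n' % (key, workers, maxqueue, timeout))
        f.flush()
        response = f.readline().strip()
        if response == 'LOCKED':
            # We hold the lock: do the work, then wake everyone waiting.
            try:
                return compute()
            finally:
                f.write('RELEASE %s\n' % key)
                f.flush()
                f.readline()
        elif response == 'DONE':
            # Someone else finished the work while we waited: just fetch it.
            return lookup()
        raise RuntimeError('Unexpected PoolCounter response: %r' % response)
    finally:
        sock.close()
```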

https://github.com/wikimedia/ores/pull/270 fixes this. I used Redis to deduplicate instead, as I needed a value for the key. I have tested it under several different conditions and everything works as expected.
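A rough sketch of the idea as I read it (hypothetical key scheme, TTLs, and task names; the actual code is in the PR, and this ignores the small window where the key could expire between the SET and the GET): the Redis value is the Celery task ID, so a request that loses the race attaches to the in-flight task instead of enqueueing a duplicate.

```python
import uuid

import redis
from celery import Celery

app = Celery('sketch', broker='redis://localhost:6379/0',
             backend='redis://localhost:6379/0')
r = redis.StrictRedis(host='localhost', port=6379, db=0)

@app.task
def process_score(wiki, model, rev_id):
    return {'score': 0.5}  # hypothetical stand-in for the real scoring task

def score(wiki, model, rev_id, ttl=30, wait=15):
    key = 'ores:task:%s:%s:%s' % (wiki, model, rev_id)
    task_id = str(uuid.uuid4())
    # Atomically register our task ID as the one computing this score,
    # unless somebody already registered theirs (SET NX with an expiry).
    if r.set(key, task_id, nx=True, ex=ttl):
        result = process_score.apply_async((wiki, model, rev_id),
                                           task_id=task_id)
    else:
        # A task for this score is already in flight: attach to its result.
        result = app.AsyncResult(r.get(key).decode())
    return result.get(timeout=wait)
```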


This is rad, thank you for pursuing it!

Change 467428 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[operations/puppet@production] ores: puppet config for redis task tracker

https://gerrit.wikimedia.org/r/467428

Change 467428 merged by Alexandros Kosiaris:
[operations/puppet@production] ores: puppet config for redis task tracker

https://gerrit.wikimedia.org/r/467428

This is the number of scores processed by Celery:

image.png (704×753 px, 60 KB)

The first spike is when one precaching instance was enabled.
The second spike is when two precaching instances were enabled. Throughput increased, but only by a factor of about 1.2 rather than doubling (250/min vs. 300/min).

This is the same comparison with one and then two precaching instances on the Celery deduplicator (when the load was different):

image.png (623×512 px, 63 KB)

Change 468487 had a related patch set uploaded (by Ladsgroup; owner: Ladsgroup):
[mediawiki/services/ores/deploy@master] Increase verbosity of logs and use redis for task tracking

https://gerrit.wikimedia.org/r/468487

Change 468487 merged by Ladsgroup:
[mediawiki/services/ores/deploy@master] Increase verbosity of logs and use redis for task tracking

https://gerrit.wikimedia.org/r/468487

Mentioned in SAL (#wikimedia-operations) [2018-10-22T20:06:57Z] <ladsgroup@deploy1001> Started deploy [ores/deploy@e89e880]: Use redis task tracker (T152012)

Mentioned in SAL (#wikimedia-operations) [2018-10-22T20:28:59Z] <ladsgroup@deploy1001> Finished deploy [ores/deploy@e89e880]: Use redis task tracker (T152012) (duration: 22m 02s)

Removing redundant lookup tasks basically cut our response time in half:

image.png (97 KB)

median: 732ms vs. 1.25s
75%: 819ms vs. 1.37s
95%: 986ms vs. 1.79s