This task is done when there is a toy implementation of dependent tasks using celery.
|Resolved||None||T139408 [Epic] ORES refactor: Scoring structure|
|Resolved||Halfak||T134606 Score multiple models with the same cached dependencies|
|Resolved||Halfak||T136875 [Spike] Implement & test dependent tasks in Celery|
See repo here: https://github.com/halfak/dependent_task_testing
I created some test output here that demonstrates the whole system works together: https://gist.github.com/halfak/277975af5de6153719e1985d06210a23
I tested it with 4 requesters running in sequence (like precached) and 3 requesters running on shuffle (like random score requests) and found no deadlocking problems.
@Halfak, how is this going to share/get the extracted features between requests? It seems that this is just another caching area for results. I think I'm also against the idea of combining functionality into a super-task, instead of decomposing the different parts into their own, independent units (what I was trying to do with the data flow diagram).
The "supertask" is where the sharing takes place. Note that multiple models are applied in "score_many_models".
> I'm also against the idea of combining functionality into a super-task
Well, it isn't really combined functionality; rather, the structure reflects how the computation can most easily flow (applying multiple models in sequence against the same cache).
Oh! I just realized that you might be imagining something different from me. I don't want to cache features between requests. Instead, I want to re-use the feature-extraction-cache to generate multiple scores for a single revision.
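To make the idea concrete, here is a minimal sketch of that pattern: extract features once per revision, then apply several models against the same in-process cache. All names here (extract_features, score_many_models, MODELS) are illustrative, not taken from the actual repo, and the Celery plumbing is omitted; in the real system this function body would run inside a single task.

```python
def extract_features(rev_id, cache):
    """Fill the cache with (pretend) extracted features for one revision."""
    if "features" not in cache:
        cache["features"] = {"rev_id": rev_id, "num_words": 42}  # dummy values
    return cache["features"]

# Stand-in models; each consumes the same extracted feature values.
MODELS = {
    "damaging": lambda f: {"damaging": f["num_words"] < 100},
    "goodfaith": lambda f: {"goodfaith": True},
}

def score_many_models(rev_id, model_names):
    """The 'supertask': one extraction, many scorings, one shared cache."""
    cache = {}  # lives only for the duration of this one task
    features = extract_features(rev_id, cache)
    return {name: MODELS[name](features) for name in model_names}

scores = score_many_models(12345, ["damaging", "goodfaith"])
```

The point of the sketch is that the cache never outlives the task: both models read the same extracted features, but nothing is shared between requests.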
> Instead, I want to re-use the feature-extraction-cache to generate multiple scores for a single revision.
I think a low-level object like this cache should be fully encapsulated within a task. I also think breaking up the tasks by units of work would help simplify the code base; the scoring processor seems to have a lot of responsibilities.
Because it would be very complicated to apply that to the dependency injection system we use to score revisions. We'd need to cache whole trees -- not just features -- so lots of data would need to be stored. Further, it is an entirely separate task from the one that inspired this one.
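A minimal sketch of why "whole trees" matters: in a dependency-injection solver of this kind, each feature value is computed from intermediate datasources, and memoizing only the final feature value would discard the intermediates that other features would need. All names below are illustrative, not the actual ORES API.

```python
# Tiny dependency tree: revision text -> word list -> word count.
DEPENDENCIES = {
    "words": ["text"],
    "num_words": ["words"],
}
PROCESSORS = {
    "text": lambda: "some revision text",
    "words": lambda text: text.split(),
    "num_words": lambda words: len(words),
}

def solve(dependent, cache):
    """Resolve a value, memoizing every node of its dependency tree."""
    if dependent in cache:
        return cache[dependent]
    args = [solve(dep, cache) for dep in DEPENDENCIES.get(dependent, [])]
    value = PROCESSORS[dependent](*args)
    cache[dependent] = value
    return value

cache = {}
solve("num_words", cache)
# after one solve, the cache holds the whole tree, not just the feature
```

Caching this across requests would mean persisting every node of the tree (including bulky intermediates like the revision text), which is the storage cost being objected to.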
> breaking up the tasks by units of work would help simplify the code base
You keep saying this. This is the way that ORES currently works. We have a *problem* with the task units splitting data that ought not to be split. What, exactly, is the problem with performing all computations relevant to a chunk of data in a single task?
Which responsibilities do you think are inappropriate for ScoreProcessor?