
Experiment with disabling dynamic task mapping
Closed, Resolved, Public

Description

We'd like to know whether the scheduler can process the dumps v1 DAGs faster without dynamic task mapping (currently, we iterate over the list of fetched wikis and create tasks at runtime). Let's try hardcoding the list of wikis in the code and dropping dynamic task mapping, to see whether performance improves.
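To make the structural difference concrete, here is a plain-Python model (not Airflow code) of the task instances each approach produces. The step names `dump_stubs` and `dump_content` and both function names are hypothetical. The model assumes Airflow's conventions that a mapped task instance is identified by `(task_id, map_index)`, that unmapped tasks carry `map_index = -1`, and that tasks inside a TaskGroup get the group ID plus a dot prefixed to their task ID.

```python
# Hypothetical model (not Airflow code): the shape of the task instances
# the scheduler tracks under each approach.
from typing import List, Tuple

# Hypothetical per-wiki steps; the real dumps v1 DAG tasks are not shown here.
STEPS = ["dump_stubs", "dump_content"]


def mapped_task_instances(fetched_wikis: List[str]) -> List[Tuple[str, int]]:
    """Dynamic task mapping: a fixed set of task_ids, each expanded at
    runtime into one instance per fetched wiki, keyed by map_index."""
    return [(step, i) for step in STEPS for i, _ in enumerate(fetched_wikis)]


def static_task_instances(hardcoded_wikis: List[str]) -> List[Tuple[str, int]]:
    """Hardcoded list: one TaskGroup per wiki, built when the DAG file is
    parsed. Child task_ids are prefixed with the group_id, and unmapped
    tasks use map_index -1."""
    return [(f"{wiki}.{step}", -1) for wiki in hardcoded_wikis for step in STEPS]


if __name__ == "__main__":
    wikis = ["enwiki", "frwiki"]
    print(mapped_task_instances(wikis))
    print(static_task_instances(wikis))
```

Both approaches yield the same number of task instances; what changes is when the structure becomes known. With mapping, the per-wiki instances only exist once the upstream task has fetched the wiki list at runtime; with a hardcoded list, every per-wiki task is materialized when the DAG file is parsed.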

Event Timeline

brouberol triaged this task as Medium priority.

Disabling task mapping seems to have had a significant effect on the scheduler loop time (the time it takes the scheduler to perform one "tick" of work).

Screenshot 2025-04-11 at 14.26.42.png (612×1 px, 181 KB)

Left: with dynamic task mapping; right: without.

The tradeoff is that the scheduler must do more work at DAG processing time (as shown in the next screenshot). I think this is perfectly acceptable: once development work on that DAG settles down, we could configure the scheduler to re-process the DAG every, say, 10 minutes instead of the default 30s.
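For a sense of scale, here is a quick calculation of how often the DAG file could be re-parsed per hour at each interval. It assumes Airflow 2.x, where the re-processing interval is the `[scheduler] min_file_process_interval` setting (default 30 seconds), which can be overridden with the `AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL` environment variable. The setting is a minimum and applies to every DAG file, not just this one, so the figures below are upper bounds.

```python
# Back-of-the-envelope: maximum number of re-parses per hour for a given
# DAG file re-processing interval (min_file_process_interval is a floor,
# so real parse counts can be lower).
SECONDS_PER_HOUR = 3600


def max_parses_per_hour(interval_s: int) -> float:
    """Upper bound on how many times a DAG file is parsed in one hour."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return SECONDS_PER_HOUR / interval_s


default_rate = max_parses_per_hour(30)    # default interval: 30s
proposed_rate = max_parses_per_hour(600)  # proposed interval: 10 minutes
reduction = default_rate / proposed_rate

# Assumption (Airflow 2.x): environment override for the [scheduler] setting.
ENV_OVERRIDE = {"AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL": "600"}

if __name__ == "__main__":
    print(default_rate, proposed_rate, reduction)
```

Going from 30s to 10 minutes cuts the maximum parse rate from 120 to 6 per hour, a 20x reduction, which is what makes the heavier parse-time cost of a hardcoded wiki list easier to absorb.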

Screenshot 2025-04-11 at 14.30.19.png (548×988 px, 168 KB)

An added bonus is that the DAG grid view is much more convenient: each wiki has its own collapsible TaskGroup, which makes it very easy to follow per-wiki progress over time.

Screenshot 2025-04-11 at 14.20.56.png (1×770 px, 129 KB)
Screenshot 2025-04-11 at 14.21.06.png (1×944 px, 277 KB)