
Investigate parallelizing the model makefile
Open, Low, Public

Description

It might be a win to parallelize the portions of the makefile bound by remote resources, such as feature extraction.

Event Timeline

Try running revscoring extract and checking top. It should parallelize. However, not everything is parallelized, so this task may still be valid.

Halfak moved this task from Unsorted to Research & analysis on the Machine-Learning-Team board.

I passed the --extractors argument to the extraction step and it's working nicely. We should make N_CPUS a tunable variable in the makefile.

There might still be a small piece left: any steps that are not CPU-bound could run in parallel, e.g. pulling text extracts from the wikis. When we have to rebuild all models, those steps could be churning in the background while we train models in serial as their datasets become ready.
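
A rough sketch of that idea, not what the makefile does today: I/O-bound fetches run in a background thread pool, and models are trained one at a time as each dataset finishes. The names fetch_dataset, train_model, build_all and max_fetchers are hypothetical stand-ins invented for illustration, and the sleep only simulates waiting on a remote resource.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_dataset(wiki):
    """Hypothetical stand-in for an I/O-bound step, e.g. pulling text from a wiki."""
    time.sleep(0.05)  # simulate waiting on a remote resource
    return wiki, ["{0}-observation-{1}".format(wiki, i) for i in range(3)]


def train_model(wiki, dataset):
    """Hypothetical stand-in for a CPU-bound training step."""
    return "{0} model trained on {1} observations".format(wiki, len(dataset))


def build_all(wikis, max_fetchers=4):
    """Fetch datasets concurrently; train serially as each dataset becomes ready."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_fetchers) as pool:
        # All fetches are submitted up front and run in the background.
        futures = [pool.submit(fetch_dataset, wiki) for wiki in wikis]
        # as_completed yields futures in completion order, so training
        # starts on the first dataset that is ready, one model at a time.
        for future in as_completed(futures):
            wiki, dataset = future.result()
            results[wiki] = train_model(wiki, dataset)
    return results


if __name__ == "__main__":
    for wiki, summary in sorted(build_all(["enwiki", "dewiki", "frwiki"]).items()):
        print(summary)
```

Threads are enough for the fetch side because those steps mostly wait on the network. Training stays in the main thread, so it does not compete with itself for CPU. In a makefile, a similar effect might come from running make with -j and letting the dependency graph order the work, but that would need testing against the real targets.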