
Investigate parallelizing the model makefile
Open, Low, Public

Description

It might be a win to parallelize the portions of the makefile bound by remote resources, such as feature extraction.

Event Timeline

awight created this task. Jun 26 2017, 9:44 PM
Restricted Application added a subscriber: Aklapper. Jun 26 2017, 9:44 PM
Halfak added a subscriber: Halfak. Jun 29 2017, 2:54 PM

Try running revscoring extract and watching top. It should parallelize. However, not every step is parallelized, so this task may still be valid.

Restricted Application added a project: artificial-intelligence. Jun 29 2017, 2:55 PM
Halfak triaged this task as Low priority. Jun 29 2017, 2:55 PM
Halfak moved this task from Untriaged to Research & analysis on the Scoring-platform-team board.

I passed the --extractors argument to the extraction step and it's working nicely. We should make N_CPUS a tunable variable in the makefile.
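
One way the tunable could reach the extraction step is through the environment of a wrapper script that the makefile calls. The sketch below is only an illustration: the helper names resolve_n_cpus and extractor_args are hypothetical, reading N_CPUS from the environment is an assumption, and it assumes --extractors takes a worker count, as the comment above implies. The rest of the revscoring extract command line is deliberately left out.

```python
import os


def resolve_n_cpus(environ=None):
    """Return the worker count: N_CPUS if set, otherwise the machine's CPU count."""
    environ = os.environ if environ is None else environ
    raw = environ.get("N_CPUS", "").strip()
    if not raw:
        # os.cpu_count() can return None, so fall back to a single worker.
        return os.cpu_count() or 1
    n_cpus = int(raw)  # raises ValueError on non-numeric input
    if n_cpus < 1:
        raise ValueError(f"N_CPUS must be at least 1, got {n_cpus}")
    return n_cpus


def extractor_args(n_cpus):
    """Build only the flag/value pair (assumption: the flag takes a count)."""
    return ["--extractors", str(n_cpus)]
```

With this, something like N_CPUS=8 in the environment would yield ["--extractors", "8"], and leaving it unset would fall back to the local CPU count.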

There still might be a small piece left: any steps that are not CPU-bound could run in parallel, e.g. pulling text extracts from the wikis. When we have to rebuild all models, those could be churning in the background while we train models in serial as their datasets become ready.
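
The idea of fetching in the background while training in serial can be sketched outside of make. This is a rough illustration only, not how the makefile works today: fetch_dataset and train_model are hypothetical stand-ins, the wiki names are placeholders, and the fetch step is simulated with a short sleep. I/O-bound fetches run in a thread pool, and each model is trained on the main thread as soon as its dataset arrives.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_dataset(wiki):
    """Hypothetical stand-in for an I/O-bound step, e.g. pulling text from a wiki."""
    time.sleep(0.01)  # simulates waiting on a remote resource
    return [f"{wiki}-row-{i}" for i in range(3)]


def train_model(wiki, dataset):
    """Hypothetical stand-in for a CPU-bound training step."""
    return f"{wiki}-model({len(dataset)} rows)"


def rebuild_all(wikis, max_fetchers=4):
    """Fetch all datasets concurrently; train one model at a time as each is ready."""
    models = {}
    with ThreadPoolExecutor(max_workers=max_fetchers) as pool:
        # Start every fetch up front so they overlap with each other and with training.
        futures = {pool.submit(fetch_dataset, wiki): wiki for wiki in wikis}
        # as_completed yields each future when its fetch finishes, in completion order.
        for future in as_completed(futures):
            wiki = futures[future]
            # Training runs here on the main thread, so models are built serially.
            models[wiki] = train_model(wiki, future.result())
    return models
```

Calling rebuild_all(["enwiki", "dewiki", "frwiki"]) returns one model per wiki. Threads are used here because the fetch step waits on the network rather than the CPU, so it does not compete with training for cores.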

Sumit added a subscriber: Sumit. Jul 24 2017, 4:38 PM
awight removed a subscriber: awight. Mar 21 2019, 4:01 PM