It's beautiful in its own way, but this 3 kLOC Makefile is fragile and repetitive. It's hard to see what varies between models and what is simply repeated. Large numbers of models with small variations are a good candidate for another level of automation.
Redesign how these builds happen. Some alternatives:
- Stay with Make. It is certainly nice to have in the stack because it handles dependencies well. Hand-written Make alone would chafe, however, so we would at least want to look into pattern rules and other Make tricks to reduce the boilerplate required for each model.
- Code-generate the makefile, starting with declarative data about each model, rendered into make using templates.
- Directly execute the build tools under a thin custom workflow, reading from declarative configuration and logging actions.
The second option sounds most appealing to me.
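To make the second option concrete, here is a minimal sketch of rendering one `cv_train` rule from declarative data. The function name `render_rule`, the template, and the use of the `$<` and `$@` automatic variables are my own assumptions; only the flags and file-name patterns are copied from the existing rules.

```python
import json

# Hypothetical template for one cv_train rule. The flags mirror the existing
# makefile; "$<" and "$@" are Make automatic variables used here for brevity.
RULE_TEMPLATE = (
    "models/{wiki}.{label}.gradient_boosting.model: \\\n"
    "\t\tdatasets/{wiki}.labeled_revisions.w_cache.{sample}.json\n"
    "\tcat $< | \\\n"
    "\trevscoring cv_train \\\n"
    "\t\trevscoring.scorer_models.GradientBoosting \\\n"
    "\t\teditquality.feature_lists.{wiki}.{label} \\\n"
    "\t\t{target} \\\n"
    "\t\t--version=$({label}_major_minor).0 \\\n"
    "{params}"
    "\t\t$(test_statistics) \\\n"
    "\t\t--balance-sample-weight \\\n"
    "\t\t--center --scale > $@\n"
)


def render_rule(wiki, label, target, sample, params):
    """Render a single model rule as makefile text."""
    # json.dumps quotes strings ("log2") and leaves numbers bare (7, 0.01),
    # which matches how the existing rules spell their -p arguments.
    param_lines = "".join(
        "\t\t-p '{}={}' \\\n".format(name, json.dumps(value))
        for name, value in params.items()
    )
    return RULE_TEMPLATE.format(
        wiki=wiki, label=label, target=target, sample=sample, params=param_lines
    )


if __name__ == "__main__":
    print(render_rule(
        "fawiki", "reverted", "reverted_for_damage", "20k_2015",
        {"max_depth": 7, "learning_rate": 0.01,
         "max_features": "log2", "n_estimators": 700},
    ))
```

Make still handles the dependency graph; the generator only removes the copy-and-paste. A real version would need one template per rule type (fetch, autolabel, extract, tune, train).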
Maybe we can also simplify some of the variations between models. For example, this may be a case of stale boilerplate that needs updating, and many models might build well under default assumptions. Take an inventory of the variations.
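One way the inventory could work, as a rough sketch: collect each model's parameters, treat the most common value of each parameter as the default, and list only the deviations. The `fawiki.damaging` values come from the rules in this task; the `wiki_a`, `wiki_b`, and `wiki_c` entries are made up for illustration and do not reflect any real model.

```python
from collections import Counter, defaultdict

# fawiki values are taken from the makefile; wiki_a/b/c are hypothetical.
MODEL_PARAMS = {
    "fawiki.damaging": {"max_depth": 7, "learning_rate": 0.01, "n_estimators": 700},
    "wiki_a.damaging": {"max_depth": 5, "learning_rate": 0.01, "n_estimators": 700},
    "wiki_b.damaging": {"max_depth": 5, "learning_rate": 0.01, "n_estimators": 700},
    "wiki_c.damaging": {"max_depth": 5, "learning_rate": 0.1, "n_estimators": 300},
}


def inventory(model_params):
    """Count how many models use each value of each parameter."""
    counts = defaultdict(Counter)
    for params in model_params.values():
        for name, value in params.items():
            counts[name][value] += 1
    return dict(counts)


def split_defaults(model_params):
    """Return (defaults, overrides): the most common value per parameter,
    and for each model only the parameters that differ from it."""
    counts = inventory(model_params)
    defaults = {name: c.most_common(1)[0][0] for name, c in counts.items()}
    overrides = {
        model: {k: v for k, v in params.items() if defaults[k] != v}
        for model, params in model_params.items()
    }
    return defaults, overrides


if __name__ == "__main__":
    defaults, overrides = split_defaults(MODEL_PARAMS)
    print("defaults:", defaults)
    for model, diff in overrides.items():
        print(model, diff or "(all defaults)")
```

Models with an empty override set are the ones that could drop their boilerplate entirely.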
Here are the make rules for the three models on fawiki:
    ############################# Persian Wikipedia ################################

    datasets/fawiki.human_labeled_revisions.20k_2015.json:
    	./utility fetch_labels \
    		https://labels.wmflabs.org/campaigns/fawiki/6/ > \
    	datasets/fawiki.human_labeled_revisions.20k_2015.json

    datasets/fawiki.labeled_revisions.20k_2015.json: \
    		datasets/fawiki.human_labeled_revisions.20k_2015.json
    	cat datasets/fawiki.human_labeled_revisions.20k_2015.json | \
    	./utility autolabel --host=https://fa.wikipedia.org \
    		--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
    		--trusted-edits=1000 \
    		--verbose > \
    	datasets/fawiki.labeled_revisions.20k_2015.json

    datasets/fawiki.labeled_revisions.w_cache.20k_2015.json: \
    		datasets/fawiki.labeled_revisions.20k_2015.json
    	cat datasets/fawiki.labeled_revisions.20k_2015.json | \
    	revscoring extract \
    		editquality.feature_lists.fawiki.reverted \
    		editquality.feature_lists.fawiki.damaging \
    		editquality.feature_lists.fawiki.goodfaith \
    		--host https://fa.wikipedia.org \
    		--verbose > \
    	datasets/fawiki.labeled_revisions.w_cache.20k_2015.json

    datasets/fawiki.sampled_revisions.2.20k_2015.json:
    	wget -qO- http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true > \
    	datasets/fawiki.sampled_revisions.2.20k_2015.json

    datasets/fawiki.autolabeled_revisions.2.20k_2015.json: \
    		datasets/fawiki.sampled_revisions.2.20k_2015.json
    	cat datasets/fawiki.sampled_revisions.2.20k_2015.json | \
    	./utility autolabel --host=https://fa.wikipedia.org \
    		--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
    		--trusted-edits=1000 \
    		--verbose > \
    	datasets/fawiki.autolabeled_revisions.2.20k_2015.json

    tuning_reports/fawiki.reverted.md: \
    		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
    	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
    	revscoring tune \
    		config/classifiers.params.yaml \
    		editquality.feature_lists.fawiki.reverted \
    		reverted_for_damage \
    		--cv-timeout=60 \
    		--debug > \
    	tuning_reports/fawiki.reverted.md

    models/fawiki.reverted.gradient_boosting.model: \
    		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
    	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
    	revscoring cv_train \
    		revscoring.scorer_models.GradientBoosting \
    		editquality.feature_lists.fawiki.reverted \
    		reverted_for_damage \
    		--version=$(reverted_major_minor).0 \
    		-p 'max_depth=7' \
    		-p 'learning_rate=0.01' \
    		-p 'max_features="log2"' \
    		-p 'n_estimators=700' \
    		$(test_statistics) \
    		--balance-sample-weight \
    		--center --scale > \
    	models/fawiki.reverted.gradient_boosting.model

    tuning_reports/fawiki.damaging.md: \
    		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
    	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
    	revscoring tune \
    		config/classifiers.params.yaml \
    		editquality.feature_lists.fawiki.damaging \
    		damaging \
    		--cv-timeout=60 \
    		--debug > \
    	tuning_reports/fawiki.damaging.md

    models/fawiki.damaging.gradient_boosting.model: \
    		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
    	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
    	revscoring cv_train \
    		revscoring.scorer_models.GradientBoosting \
    		editquality.feature_lists.fawiki.damaging \
    		damaging \
    		--version=$(damaging_major_minor).0 \
    		-p 'max_depth=7' \
    		-p 'learning_rate=0.01' \
    		-p 'max_features="log2"' \
    		-p 'n_estimators=700' \
    		$(test_statistics) \
    		--balance-sample-weight \
    		--center --scale > \
    	models/fawiki.damaging.gradient_boosting.model

    tuning_reports/fawiki.goodfaith.md: \
    		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
    	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
    	revscoring tune \
    		config/classifiers.params.yaml \
    		editquality.feature_lists.fawiki.goodfaith \
    		goodfaith \
    		--cv-timeout=60 \
    		--debug > \
    	tuning_reports/fawiki.goodfaith.md

    models/fawiki.goodfaith.gradient_boosting.model: \
    		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
    	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
    	revscoring cv_train \
    		revscoring.scorer_models.GradientBoosting \
    		editquality.feature_lists.fawiki.goodfaith \
    		goodfaith \
    		--version=$(goodfaith_major_minor).0 \
    		-p 'max_depth=7' \
    		-p 'learning_rate=0.01' \
    		-p 'max_features="log2"' \
    		-p 'n_estimators=700' \
    		$(test_statistics) \
    		--balance-sample-weight \
    		--center --scale > \
    	models/fawiki.goodfaith.gradient_boosting.model

    fawiki_models: \
    	models/fawiki.reverted.gradient_boosting.model \
    	models/fawiki.damaging.gradient_boosting.model \
    	models/fawiki.goodfaith.gradient_boosting.model

    fawiki_tuning_reports: \
    	tuning_reports/fawiki.reverted.md \
    	tuning_reports/fawiki.damaging.md \
    	tuning_reports/fawiki.goodfaith.md
Compare against one potential declarative form:
    - defaults:
        # We cascade defaults, deferring to each wiki's configuration.
        scorer_model: GradientBoosting
        cv_train_params:
          learning_rate: 0.01
          max_depth: 5
          max_features: log2
          n_estimators: 700
        trusted_edit_count: 1000
        # FIXME: There's a lot I don't understand about how we're using "needs_review".
        include_unreviewed: false

    - database: fawiki
      models:
        # Override one hyperparameter.
        - defaults:
            cv_train_params:
              max_depth: 7
        - reverted
        - damaging
        - label: goodfaith
          # This is not really the case, but I wanted to show what further overrides would look like.
          cv_train_params:
            max_depth: 6
      wikilabels_campaign:
        sample: sample2.20k_2015
        url: https://labels.wmflabs.org/campaigns/fawiki/6/
      sampling_query:
        # TODO: comment about this query, what is it and what does it do. Annoying that the output doesn't permalink to the input.
        - name: sample2.20k_2015
          url: http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true
      trusted_groups:
        - sysop
        - oversight
        - bot
        - flow-bot
        - rollbacker
        - checkuser
        - abusefilter
        - bureaucrat
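The cascade in that declarative form could be resolved roughly as follows. This is a sketch under my own assumptions: the configuration is written as plain Python dicts rather than parsed from YAML, the names `deep_merge` and `resolve_models` are hypothetical, and I am assuming that a `defaults` entry inside `models` applies to the entries listed after it.

```python
import copy

GLOBAL_DEFAULTS = {
    "scorer_model": "GradientBoosting",
    "cv_train_params": {
        "learning_rate": 0.01,
        "max_depth": 5,
        "max_features": "log2",
        "n_estimators": 700,
    },
    "trusted_edit_count": 1000,
    "include_unreviewed": False,
}

FAWIKI = {
    "database": "fawiki",
    "models": [
        {"defaults": {"cv_train_params": {"max_depth": 7}}},
        "reverted",
        "damaging",
        {"label": "goodfaith", "cv_train_params": {"max_depth": 6}},
    ],
}


def deep_merge(base, override):
    """Return a new dict where override wins and nested dicts merge by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_models(global_defaults, wiki):
    """Expand each model entry into a full configuration, keyed by label."""
    wiki_defaults = global_defaults
    resolved = {}
    for entry in wiki["models"]:
        if isinstance(entry, dict) and "defaults" in entry:
            # Wiki-level defaults layer on top of the global ones.
            wiki_defaults = deep_merge(wiki_defaults, entry["defaults"])
            continue
        if isinstance(entry, str):
            # A bare label means "use the defaults unchanged".
            entry = {"label": entry}
        resolved[entry["label"]] = deep_merge(wiki_defaults, entry)
    return resolved


if __name__ == "__main__":
    for label, config in resolve_models(GLOBAL_DEFAULTS, FAWIKI).items():
        print(label, config["cv_train_params"])
```

With this ordering, `reverted` and `damaging` pick up `max_depth: 7` from the wiki defaults, while `goodfaith` overrides it again with `6` and inherits everything else.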