It's beautiful in its own way, but this 3 KLOC [[ https://github.com/wiki-ai/editquality/blob/master/Makefile | Makefile ]] is fragile and repetitive. It's hard to see what varies between models and what is simply repeated. A large number of models with small variations is a good candidate for another level of automation.
Redesign how these builds happen. Some alternatives:
# Keep Make in the stack, since it handles dependencies well. Pure Make as we use it today is too restrictive, though, so we would at least want to look into pattern rules and other Make features to reduce the boilerplate required for each model.
# Code-generate the makefile, starting with declarative data about each model, rendered into make using templates.
# Directly execute the build tools under a thin custom workflow, reading from declarative configuration and logging actions.
The second option sounds most appealing to me.
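To make the second option more concrete, here is a minimal sketch of rendering one `cv_train` rule from declarative data with Python's `string.Template`. The template text is copied from the fawiki rules quoted in this task; the function name `render_model_rule` and its arguments are hypothetical, and a real generator would also need templates for the dataset and tuning-report rules.

```python
import json
from string import Template

# Hypothetical template for one cv_train rule, modeled on the fawiki rules
# quoted in this task. "$$" renders a literal "$" for Make variables.
MODEL_RULE = Template(
    "models/$wiki.$label.gradient_boosting.model: \\\n"
    "\t\tdatasets/$wiki.labeled_revisions.w_cache.$sample.json\n"
    "\tcat datasets/$wiki.labeled_revisions.w_cache.$sample.json | \\\n"
    "\trevscoring cv_train \\\n"
    "\t\trevscoring.scorer_models.$scorer_model \\\n"
    "\t\teditquality.feature_lists.$wiki.$label \\\n"
    "\t\t$target \\\n"
    "\t\t--version=$$(${label}_major_minor).0 \\\n"
    "$params"
    "\t\t$$(test_statistics) \\\n"
    "\t\t--balance-sample-weight \\\n"
    "\t\t--center --scale > \\\n"
    "\tmodels/$wiki.$label.gradient_boosting.model\n"
)


def render_model_rule(wiki, label, target, sample, scorer_model, cv_train_params):
    """Render the Make rule that trains one model (illustrative only)."""
    # json.dumps quotes strings ("log2") and leaves numbers bare (7, 0.01),
    # matching the -p arguments in the existing Makefile.
    params = "".join(
        "\t\t-p '{}={}' \\\n".format(name, json.dumps(value))
        for name, value in cv_train_params.items()
    )
    return MODEL_RULE.substitute(
        wiki=wiki,
        label=label,
        target=target,
        sample=sample,
        scorer_model=scorer_model,
        params=params,
    )


if __name__ == "__main__":
    print(render_model_rule(
        "fawiki", "reverted", "reverted_for_damage", "20k_2015",
        "GradientBoosting",
        {"max_depth": 7, "learning_rate": 0.01,
         "max_features": "log2", "n_estimators": 700},
    ))
```

Note that the label (`reverted`) and the training target (`reverted_for_damage`) are separate inputs, because the existing rules do not always use the same name for both.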
Maybe we can also simplify some of the variations between models. It may be that much of the variation is stale boilerplate, and that many models would build well under default assumptions. Take an inventory of the variations.
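One rough way to start that inventory is to group the model targets by the `-p` hyperparameters in their recipes. The sketch below is an assumption about how to do this, not an existing tool: `inventory_params` is a hypothetical helper, and it assumes the real Makefile indents recipe lines with tabs and names model targets `models/...`.

```python
import re
from collections import defaultdict

TARGET_RE = re.compile(r"^(models/\S+):")
PARAM_RE = re.compile(r"-p '([^=']+)=([^']*)'")


def inventory_params(makefile_text):
    """Group model targets by the set of -p hyperparameters they pass."""
    groups = defaultdict(list)
    target, params = None, []
    for line in makefile_text.splitlines():
        if line and not line[0].isspace() and not line.startswith("#"):
            # An unindented line starts a new rule (or a variable
            # assignment), so record the previous model rule, if any.
            if target is not None:
                groups[tuple(sorted(params))].append(target)
            match = TARGET_RE.match(line)
            target, params = (match.group(1) if match else None), []
        elif target is not None:
            # Indented lines belong to the current rule's recipe.
            params.extend(PARAM_RE.findall(line))
    if target is not None:
        groups[tuple(sorted(params))].append(target)
    return dict(groups)


if __name__ == "__main__":
    with open("Makefile") as f:
        for params, targets in inventory_params(f.read()).items():
            print(len(targets), dict(params))
```

If most targets fall into one or two groups, those groups are natural candidates for the shared defaults, and only the outliers need per-model overrides.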
Here are the make rules for the three models on fawiki:
```
############################# Persian Wikipedia ################################

datasets/fawiki.human_labeled_revisions.20k_2015.json:
	./utility fetch_labels \
		https://labels.wmflabs.org/campaigns/fawiki/6/ > \
	datasets/fawiki.human_labeled_revisions.20k_2015.json

datasets/fawiki.labeled_revisions.20k_2015.json: \
		datasets/fawiki.human_labeled_revisions.20k_2015.json
	cat datasets/fawiki.human_labeled_revisions.20k_2015.json | \
	./utility autolabel --host=https://fa.wikipedia.org \
		--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
		--trusted-edits=1000 \
		--verbose > \
	datasets/fawiki.labeled_revisions.20k_2015.json

datasets/fawiki.labeled_revisions.w_cache.20k_2015.json: \
		datasets/fawiki.labeled_revisions.20k_2015.json
	cat datasets/fawiki.labeled_revisions.20k_2015.json | \
	revscoring extract \
		editquality.feature_lists.fawiki.reverted \
		editquality.feature_lists.fawiki.damaging \
		editquality.feature_lists.fawiki.goodfaith \
		--host https://fa.wikipedia.org \
		--verbose > \
	datasets/fawiki.labeled_revisions.w_cache.20k_2015.json

datasets/fawiki.sampled_revisions.2.20k_2015.json:
	wget -qO- http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true > \
	datasets/fawiki.sampled_revisions.2.20k_2015.json

datasets/fawiki.autolabeled_revisions.2.20k_2015.json: \
		datasets/fawiki.sampled_revisions.2.20k_2015.json
	cat datasets/fawiki.sampled_revisions.2.20k_2015.json | \
	./utility autolabel --host=https://fa.wikipedia.org \
		--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
		--trusted-edits=1000 \
		--verbose > \
	datasets/fawiki.autolabeled_revisions.2.20k_2015.json

tuning_reports/fawiki.reverted.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.reverted \
		reverted_for_damage \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.reverted.md

models/fawiki.reverted.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.reverted \
		reverted_for_damage \
		--version=$(reverted_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.reverted.gradient_boosting.model

tuning_reports/fawiki.damaging.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.damaging \
		damaging \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.damaging.md

models/fawiki.damaging.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.damaging \
		damaging \
		--version=$(damaging_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.damaging.gradient_boosting.model

tuning_reports/fawiki.goodfaith.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.goodfaith \
		goodfaith \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.goodfaith.md

models/fawiki.goodfaith.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.goodfaith \
		goodfaith \
		--version=$(goodfaith_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.goodfaith.gradient_boosting.model

fawiki_models: \
		models/fawiki.reverted.gradient_boosting.model \
		models/fawiki.damaging.gradient_boosting.model \
		models/fawiki.goodfaith.gradient_boosting.model

fawiki_tuning_reports: \
		tuning_reports/fawiki.reverted.md \
		tuning_reports/fawiki.damaging.md \
		tuning_reports/fawiki.goodfaith.md
```
Compare against one potential declarative form:
```
- defaults:
    # We cascade defaults, deferring to each wiki's configuration.
    scorer_model: GradientBoosting
    cv_train_params:
      learning_rate: 0.01
      max_depth: 5
      max_features: log2
      n_estimators: 700
    trusted_edit_count: 1000
    # FIXME: There's a lot I don't understand about how we're using "needs_review".
    include_unreviewed: false
-
  database: fawiki
  models:
    # Override one hyperparameter.
    - defaults:
        cv_train_params:
          max_depth: 7
    - reverted
    - damaging
    - label: goodfaith
      # This is not really the case, but I wanted to show what further overrides would look like.
      cv_train_params:
        max_depth: 6
  wikilabels_campaign:
    sample: sample2.20k_2015
    url: https://labels.wmflabs.org/campaigns/fawiki/6/
  sampling_query:
    # TODO: comment about this query, what is it and what does it do. Annoying that the output doesn't permalink to the input.
    - name: sample2.20k_2015
      url: http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true
  trusted_groups:
    - sysop
    - oversight
    - bot
    - flow-bot
    - rollbacker
    - checkuser
    - abusefilter
    - bureaucrat
```
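The cascade of defaults in the form above could be resolved with a small recursive merge. This is only a sketch of one possible reading of the configuration: `merge` and `resolve_models` are hypothetical names, the data is inlined as Python dicts rather than loaded from YAML, and it assumes a `defaults` entry inside a wiki's `models` list applies to every model of that wiki.

```python
import copy


def merge(base, override):
    """Return a copy of base with override applied; nested dicts merge key by key."""
    result = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def resolve_models(global_defaults, wiki):
    """Expand a wiki's `models` list into one fully resolved config per label."""
    # First pass: fold any per-wiki "defaults" entries into the global defaults.
    wiki_defaults = global_defaults
    for entry in wiki["models"]:
        if isinstance(entry, dict) and "defaults" in entry:
            wiki_defaults = merge(wiki_defaults, entry["defaults"])

    # Second pass: apply each model's own overrides on top.
    resolved = {}
    for entry in wiki["models"]:
        if isinstance(entry, str):
            entry = {"label": entry}  # a bare label means "no overrides"
        if "defaults" in entry:
            continue
        overrides = {k: v for k, v in entry.items() if k != "label"}
        resolved[entry["label"]] = merge(wiki_defaults, overrides)
    return resolved


GLOBAL_DEFAULTS = {
    "scorer_model": "GradientBoosting",
    "cv_train_params": {
        "learning_rate": 0.01,
        "max_depth": 5,
        "max_features": "log2",
        "n_estimators": 700,
    },
}

FAWIKI = {
    "database": "fawiki",
    "models": [
        {"defaults": {"cv_train_params": {"max_depth": 7}}},
        "reverted",
        "damaging",
        {"label": "goodfaith", "cv_train_params": {"max_depth": 6}},
    ],
}

if __name__ == "__main__":
    for label, config in resolve_models(GLOBAL_DEFAULTS, FAWIKI).items():
        print(label, config["cv_train_params"])
```

Merging nested dicts key by key means an override such as `max_depth: 7` replaces only that hyperparameter and leaves the rest of `cv_train_params` inherited.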