Change Details

It's beautiful in its own way, but this 3 kloc [[ https://github.com/wiki-ai/editquality/blob/master/Makefile | makefile ]] is fragile and repetitive. It's hard to see what is variation and what is repeated.

Redesign how these builds happen. Some alternatives,

# Make is certainly nice because it handles dependencies for you, but we would need to take advantage of pattern rules and other Make tricks to reduce the amount of code required for each model.
# Code-generate the makefile, starting with declarative data about each model, rendered into make using templates.
# Directly execute the build tools within a custom workflow, reading from declarative configuration.

The last two options sound most appealing to me, but I'll look into the make pattern rules as a first step towards generalization.

Maybe we can also simplify some of the variations between models, i.e. maybe this is a case of updating stale boilerplate, and most models build well under default assumptions?

Here are the make rules for the three models on fawiki,

```
############################# Persian Wikipedia ################################
datasets/fawiki.human_labeled_revisions.20k_2015.json:
	./utility fetch_labels \
		https://labels.wmflabs.org/campaigns/fawiki/6/ > \
	datasets/fawiki.human_labeled_revisions.20k_2015.json

datasets/fawiki.labeled_revisions.20k_2015.json: \
		datasets/fawiki.human_labeled_revisions.20k_2015.json
	cat datasets/fawiki.human_labeled_revisions.20k_2015.json | \
	./utility autolabel --host=https://fa.wikipedia.org \
		--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
		--trusted-edits=1000 \
		--verbose > \
	datasets/fawiki.labeled_revisions.20k_2015.json

datasets/fawiki.labeled_revisions.w_cache.20k_2015.json: \
		datasets/fawiki.labeled_revisions.20k_2015.json
	cat datasets/fawiki.labeled_revisions.20k_2015.json | \
	revscoring extract \
		editquality.feature_lists.fawiki.reverted \
		editquality.feature_lists.fawiki.damaging \
		editquality.feature_lists.fawiki.goodfaith \
		--host https://fa.wikipedia.org \
		--verbose > \
	datasets/fawiki.labeled_revisions.w_cache.20k_2015.json

datasets/fawiki.sampled_revisions.2.20k_2015.json:
	wget -qO- http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true > \
	datasets/fawiki.sampled_revisions.2.20k_2015.json

datasets/fawiki.autolabeled_revisions.2.20k_2015.json: \
		datasets/fawiki.sampled_revisions.2.20k_2015.json
	cat datasets/fawiki.sampled_revisions.2.20k_2015.json | \
	./utility autolabel --host=https://fa.wikipedia.org \
		--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
		--trusted-edits=1000 \
		--verbose > \
	datasets/fawiki.autolabeled_revisions.2.20k_2015.json

tuning_reports/fawiki.reverted.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.reverted \
		reverted_for_damage \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.reverted.md

models/fawiki.reverted.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.reverted \
		reverted_for_damage \
		--version=$(reverted_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.reverted.gradient_boosting.model

tuning_reports/fawiki.damaging.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.damaging \
		damaging \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.damaging.md

models/fawiki.damaging.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.reverted.gradient_boosting.model

tuning_reports/fawiki.damaging.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.damaging \
		damaging \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.damaging.md

models/fawiki.damaging.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.damaging \
		damaging \
		--version=$(damaging_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.damaging.gradient_boosting.model

tuning_reports/fawiki.goodfaith.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.goodfaith \
		goodfaith \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.goodfaith.md

models/fawiki.goodfaith.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.goodfaith \
		goodfaith \
		--version=$(goodfaith_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.goodfaith.gradient_boosting.model

fawiki_models: \
	models/fawiki.reverted.gradient_boosting.model \
	models/fawiki.damaging.gradient_boosting.model \
	models/fawiki.goodfaith.gradient_boosting.model

fawiki_tuning_reports: \
	tuning_reports/fawiki.reverted.md \
	tuning_reports/fawiki.damaging.md \
	tuning_reports/fawiki.goodfaith.md
```

Compare against one potential declarative form:

```
-
    # We cascade these default values with each database's configuration.
    database: default
    scorer_model: GradientBoosting
    cv_train_params:
        learning_rate: 0.01
        max_depth: 7
        max_features: log2
        n_estimators: 700
    # FIXME: There's a lot I don't understand about how we're using "needs_review".
    include_unreviewed: false

-
    database: fawiki
    models:
        - reverted
        - damaging
        - goodfaith
    # TODO: revscoring cv_train params vary slightly. Do we want to preserve that?
    wikilabels_campaign: https://labels.wmflabs.org/campaigns/fawiki/6/
    sampling_query:
        # TODO: comment about this query, what is it and what does it do. Annoying that the output doesn't permalink to the input.
        name: 20k_2015
        url: http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true
    trusted_groups:
        - sysop
        - oversight
        - bot
        - flow-bot
        - rollbacker
        - checkuser
        - abusefilter
        - bureaucrat
```
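The cascade mentioned in the first comment of that declarative form could work roughly as follows. This is only a sketch: the `cascade` and `resolve` helpers are hypothetical names, and the config is written as a Python literal mirroring the YAML so the example needs no YAML library.

```python
from copy import deepcopy

# Hypothetical in-memory form of the declarative config above, as a YAML
# loader might return it. Keys and values are copied from the example.
CONFIG = [
    {
        "database": "default",
        "scorer_model": "GradientBoosting",
        "cv_train_params": {
            "learning_rate": 0.01,
            "max_depth": 7,
            "max_features": "log2",
            "n_estimators": 700,
        },
        "include_unreviewed": False,
    },
    {
        "database": "fawiki",
        "models": ["reverted", "damaging", "goodfaith"],
        "wikilabels_campaign": "https://labels.wmflabs.org/campaigns/fawiki/6/",
    },
]


def cascade(defaults, overrides):
    """Return defaults overlaid with overrides, merging nested dicts key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = cascade(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(config, database):
    """Look up one database's entry and fill in anything it leaves unset."""
    by_name = {entry["database"]: entry for entry in config}
    return cascade(by_name["default"], by_name[database])


fawiki = resolve(CONFIG, "fawiki")
print(fawiki["scorer_model"], fawiki["cv_train_params"]["max_depth"])
```

Merging nested dicts key by key, rather than replacing them wholesale, is what would let a wiki override a single hyperparameter without restating the rest.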

It's beautiful in its own way, but this 3 kloc [[ https://github.com/wiki-ai/editquality/blob/master/Makefile | makefile ]] is fragile and repetitive. It's hard to see what varies and what is repeated. A large number of models with small variations is a good candidate for another level of automation.

Redesign how these builds happen. Some alternatives:

# Keep Make in the stack, since it handles dependencies well. Pure Make would chafe, however, so we would at least want to look into pattern rules and other Make tricks to reduce the amount of boilerplate required for each model.
# Code-generate the makefile, starting with declarative data about each model, rendered into make using templates.
# Directly execute the build tools under a thin custom workflow, reading from declarative configuration and logging actions.

The second option sounds most appealing to me.

Maybe we can also simplify some of the variations between models, i.e. maybe this is a case of updating stale boilerplate, and many models build well under default assumptions? Take an inventory of the variations.

Here are the make rules for the three models on fawiki,

```
############################# Persian Wikipedia ################################
datasets/fawiki.human_labeled_revisions.20k_2015.json:
	./utility fetch_labels \
		https://labels.wmflabs.org/campaigns/fawiki/6/ > \
	datasets/fawiki.human_labeled_revisions.20k_2015.json

datasets/fawiki.labeled_revisions.20k_2015.json: \
		datasets/fawiki.human_labeled_revisions.20k_2015.json
	cat datasets/fawiki.human_labeled_revisions.20k_2015.json | \
	./utility autolabel --host=https://fa.wikipedia.org \
		--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
		--trusted-edits=1000 \
		--verbose > \
	datasets/fawiki.labeled_revisions.20k_2015.json

datasets/fawiki.labeled_revisions.w_cache.20k_2015.json: \
		datasets/fawiki.labeled_revisions.20k_2015.json
	cat datasets/fawiki.labeled_revisions.20k_2015.json | \
	revscoring extract \
		editquality.feature_lists.fawiki.reverted \
		editquality.feature_lists.fawiki.damaging \
		editquality.feature_lists.fawiki.goodfaith \
		--host https://fa.wikipedia.org \
		--verbose > \
	datasets/fawiki.labeled_revisions.w_cache.20k_2015.json

datasets/fawiki.sampled_revisions.2.20k_2015.json:
	wget -qO- http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true > \
	datasets/fawiki.sampled_revisions.2.20k_2015.json

datasets/fawiki.autolabeled_revisions.2.20k_2015.json: \
		datasets/fawiki.sampled_revisions.2.20k_2015.json
	cat datasets/fawiki.sampled_revisions.2.20k_2015.json | \
	./utility autolabel --host=https://fa.wikipedia.org \
		--trusted-groups=sysop,oversight,bot,rollbacker,checkuser,abusefilter,bureaucrat,flow-bot \
		--trusted-edits=1000 \
		--verbose > \
	datasets/fawiki.autolabeled_revisions.2.20k_2015.json

tuning_reports/fawiki.reverted.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.reverted \
		reverted_for_damage \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.reverted.md

models/fawiki.reverted.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.reverted \
		reverted_for_damage \
		--version=$(reverted_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.reverted.gradient_boosting.model

tuning_reports/fawiki.damaging.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.damaging \
		damaging \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.damaging.md

models/fawiki.damaging.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.reverted.gradient_boosting.model

tuning_reports/fawiki.damaging.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.damaging \
		damaging \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.damaging.md

models/fawiki.damaging.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.damaging \
		damaging \
		--version=$(damaging_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.damaging.gradient_boosting.model

tuning_reports/fawiki.goodfaith.md: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring tune \
		config/classifiers.params.yaml \
		editquality.feature_lists.fawiki.goodfaith \
		goodfaith \
		--cv-timeout=60 \
		--debug > \
	tuning_reports/fawiki.goodfaith.md

models/fawiki.goodfaith.gradient_boosting.model: \
		datasets/fawiki.labeled_revisions.w_cache.20k_2015.json
	cat datasets/fawiki.labeled_revisions.w_cache.20k_2015.json | \
	revscoring cv_train \
		revscoring.scorer_models.GradientBoosting \
		editquality.feature_lists.fawiki.goodfaith \
		goodfaith \
		--version=$(goodfaith_major_minor).0 \
		-p 'max_depth=7' \
		-p 'learning_rate=0.01' \
		-p 'max_features="log2"' \
		-p 'n_estimators=700' \
		$(test_statistics) \
		--balance-sample-weight \
		--center --scale > \
	models/fawiki.goodfaith.gradient_boosting.model

fawiki_models: \
	models/fawiki.reverted.gradient_boosting.model \
	models/fawiki.damaging.gradient_boosting.model \
	models/fawiki.goodfaith.gradient_boosting.model

fawiki_tuning_reports: \
	tuning_reports/fawiki.reverted.md \
	tuning_reports/fawiki.damaging.md \
	tuning_reports/fawiki.goodfaith.md
```

Compare against one potential declarative form:

```
-
    # We cascade defaults, deferring to each wiki's configuration.
    database: default
    scorer_model: GradientBoosting
    cv_train_params:
        learning_rate: 0.01
        max_depth: 5
        max_features: log2
        n_estimators: 700
    trusted_edit_count: 1000
    # FIXME: There's a lot I don't understand about how we're using "needs_review".
    include_unreviewed: false

-
    database: fawiki
    models:
        - label: reverted
          # Override one hyperparameter. We could also have a fawiki.default config node :-/
          cv_train_params:
              max_depth: 7
        - label: damaging
          cv_train_params:
              max_depth: 7
        - label: goodfaith
          cv_train_params:
              max_depth: 7
    wikilabels_campaign: https://labels.wmflabs.org/campaigns/fawiki/6/
    sampling_query:
        # TODO: comment about this query, what is it and what does it do. Annoying that the output doesn't permalink to the input.
        - name: sample2.20k_2015
          url: http://quarry.wmflabs.org/run/59580/output/0/json-lines?download=true
    trusted_groups:
        - sysop
        - oversight
        - bot
        - flow-bot
        - rollbacker
        - checkuser
        - abusefilter
        - bureaucrat
```
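To make the code-generation option concrete, here is a minimal sketch of rendering one `revscoring cv_train` rule from the per-model overrides in the declarative form. The function names, the `dataset` argument, and the `target` argument are assumptions made for illustration; only the rule layout and flag values are taken from the fawiki rules quoted above.

```python
import json
import re

# Defaults copied from the "default" node above; everything else here is a
# hypothetical sketch, not existing editquality code.
DEFAULT_PARAMS = {
    "learning_rate": 0.01,
    "max_depth": 5,
    "max_features": "log2",
    "n_estimators": 700,
}


def snake_case(name):
    """GradientBoosting -> gradient_boosting, as in the model file names."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def render_cv_train_rule(wiki, label, overrides, dataset,
                         scorer_model="GradientBoosting", target=None):
    """Render one `revscoring cv_train` make rule as text."""
    params = {**DEFAULT_PARAMS, **overrides}
    model = f"models/{wiki}.{label}.{snake_case(scorer_model)}.model"
    args = [
        f"revscoring.scorer_models.{scorer_model}",
        f"editquality.feature_lists.{wiki}.{label}",
        target or label,  # e.g. "reverted" trains on reverted_for_damage
        f"--version=$({label}_major_minor).0",
        # json.dumps leaves numbers bare and double-quotes strings ("log2").
        *[f"-p '{key}={json.dumps(value)}'" for key, value in sorted(params.items())],
        "$(test_statistics)",
        "--balance-sample-weight",
        "--center --scale >",
    ]
    lines = [f"{model}: \\", f"\t\t{dataset}"]
    lines += [f"\tcat {dataset} | \\", "\trevscoring cv_train \\"]
    lines += [f"\t\t{arg} \\" for arg in args]
    lines.append(f"\t{model}")
    return "\n".join(lines) + "\n"


rule = render_cv_train_rule(
    "fawiki", "damaging", {"max_depth": 7},
    dataset="datasets/fawiki.labeled_revisions.w_cache.20k_2015.json",
)
print(rule)
```

The hyperparameters come out in sorted order rather than the hand-written order, which should not matter to `revscoring` but does make regenerated makefiles diff cleanly.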
