
Train / Test wikidata damaging model
Closed, Resolved (Public)


Damaging:

(p3)ladsgroup@ores-compute-01:~/editquality$ make models/wikidatawiki.damaging.gradient_boosting.model
cat datasets/wikidatawiki.features_damaging.20k_2016.tsv | cut -f2- | \
revscoring train_test \
        revscoring.scorer_models.GradientBoosting \
        editquality.feature_lists.wikidatawiki.damaging \
        --version=0.1.1 \
        -p 'max_depth=7' \
        -p 'learning_rate=0.01' \
        -p 'max_features="log2"' \
        -p 'n_estimators=700' \
        -s 'table' -s 'accuracy' -s 'precision' -s 'recall' -s 'pr' -s 'roc' -s 'recall_at_fpr(max_fpr=0.10)' -s 'filter_rate_at_recall(min_recall=0.90)' -s 'filter_rate_at_recall(min_recall=0.75)' \
        --balance-sample-weight \
        --center --scale \
        --label-type=bool > \
models/wikidatawiki.damaging.gradient_boosting.model
2016-04-30 13:34:02,834 INFO:revscoring.utilities.train_test -- Training model...
2016-04-30 13:34:24,593 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: max_depth=7, scale=true, max_features="log2", center=true, min_samples_leaf=1, min_weight_fraction_leaf=0.0, balanced_sample=false, learning_rate=0.01, warm_start=false, verbose=0, n_estimators=700, presort="auto", init=null, max_leaf_nodes=null, loss="deviance", subsample=1.0, balanced_sample_weight=true, min_samples_split=2, random_state=null
 - version: 0.1.1
 - trained: 2016-04-30T13:34:24.589664

Table:
                 ~False    ~True
        -----  --------  -------
        False      4237      132
        True         29      508

Accuracy: 0.967
Precision: 0.794
Recall: 0.946
PR-AUC: 0.885
ROC-AUC: 0.989
Recall @ 0.1 false-positive rate: threshold=0.967, recall=0.689, fpr=0.1
Filter rate @ 0.9 recall: threshold=0.807, filter_rate=0.886, recall=0.901
Filter rate @ 0.75 recall: threshold=0.962, filter_rate=0.908, recall=0.75
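As a sanity check, the headline numbers can be recomputed from the table. Rows are the true labels and the ~False/~True columns are the predicted labels, with True (damaging) as the positive class. A minimal sketch in plain Python; `binary_metrics` is a hypothetical helper, not part of revscoring:

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision and recall from a 2x2 confusion table (hypothetical helper)."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Counts copied from the damaging table: rows = true label, columns = predicted label.
# tn = False/~False, fp = False/~True, fn = True/~False, tp = True/~True
damaging = binary_metrics(tn=4237, fp=132, fn=29, tp=508)

for name, value in damaging.items():
    print(f"{name}: {value:.3f}")
```

It should reproduce the reported accuracy, precision and recall (0.967 / 0.794 / 0.946) up to rounding.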

This kicks ass!
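The threshold lines are easier to read with the definitions spelled out. The sketch below is my reading of them, not revscoring's implementation: an edit is flagged when its score is at or above the threshold, the filter rate is the share of edits left unflagged, `filter_rate_at_recall` picks the highest threshold that keeps recall at or above the target, and `recall_at_fpr` picks the lowest threshold whose false-positive rate stays within the cap. The function names mirror the `-s` options, but the code and the toy scores/labels are assumptions for illustration.

```python
def threshold_rates(scores, labels, threshold):
    """Recall, false-positive rate and filter rate when score >= threshold is flagged."""
    flagged = [score >= threshold for score in scores]
    positives = sum(labels)
    negatives = len(labels) - positives
    tp = sum(1 for f, y in zip(flagged, labels) if f and y)
    fp = sum(1 for f, y in zip(flagged, labels) if f and not y)
    recall = tp / positives
    fpr = fp / negatives
    filter_rate = 1 - sum(flagged) / len(labels)  # share of edits NOT flagged
    return recall, fpr, filter_rate


def filter_rate_at_recall(scores, labels, min_recall):
    """Highest threshold keeping recall >= min_recall -> (threshold, filter_rate, recall)."""
    best = None
    for t in sorted(set(scores)):  # ascending: recall can only fall as t rises
        recall, _, filter_rate = threshold_rates(scores, labels, t)
        if recall >= min_recall:
            best = (t, filter_rate, recall)
    return best


def recall_at_fpr(scores, labels, max_fpr):
    """Lowest threshold keeping fpr <= max_fpr -> (threshold, recall, fpr)."""
    best = None
    for t in sorted(set(scores), reverse=True):  # descending: recall and fpr can only rise
        recall, fpr, _ = threshold_rates(scores, labels, t)
        if fpr <= max_fpr:
            best = (t, recall, fpr)
    return best


# Toy data (made up): model scores and true "damaging" labels for ten edits.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [True, True, False, True, False, False, True, False, False, False]

print(filter_rate_at_recall(scores, labels, min_recall=0.75))
print(recall_at_fpr(scores, labels, max_fpr=0.2))
```

On the toy data both functions settle on a threshold of 0.7. Read this way, the damaging line `filter_rate=0.886, recall=0.901` means roughly 89% of edits could skip review while about 90% of damaging edits are still caught.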

Good faith:
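The good-faith run below changes only the feature list and three of the `-p` hyperparameters. A sketch of how those `key=value` strings could be turned into Python values, assuming (as the double-quoted `"log2"` suggests) that each value is JSON-decoded; `parse_params` is a hypothetical helper, not revscoring's code:

```python
import json


def parse_params(param_args):
    """Turn repeated -p 'key=value' strings into a dict (hypothetical helper).

    Assumes each value is JSON-decoded, which is why the string log2 has to
    be written as "log2" inside the shell's single quotes.
    """
    params = {}
    for arg in param_args:
        key, value = arg.split("=", 1)  # split on the first '=' only
        params[key] = json.loads(value)
    return params


damaging = parse_params(['max_depth=7', 'learning_rate=0.01',
                         'max_features="log2"', 'n_estimators=700'])
goodfaith = parse_params(['max_depth=5', 'learning_rate=0.1',
                          'max_features="log2"', 'n_estimators=300'])

# Print only the hyperparameters that differ between the two runs.
for key in sorted(damaging):
    if damaging[key] != goodfaith[key]:
        print(f"{key}: damaging={damaging[key]!r} goodfaith={goodfaith[key]!r}")
```

Run as-is it lists `learning_rate`, `max_depth` and `n_estimators` as the only differences; `max_features` stays `"log2"` in both.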

(p3)ladsgroup@ores-compute-01:~/editquality$ make models/wikidatawiki.goodfaith.gradient_boosting.model
cat datasets/wikidatawiki.features_goodfaith.20k_2016.tsv | cut -f2- | \
revscoring train_test \
        revscoring.scorer_models.GradientBoosting \
        editquality.feature_lists.wikidatawiki.goodfaith \
        --version=0.1.1 \
        -p 'max_depth=5' \
        -p 'learning_rate=0.1' \
        -p 'max_features="log2"' \
        -p 'n_estimators=300' \
        -s 'table' -s 'accuracy' -s 'precision' -s 'recall' -s 'pr' -s 'roc' -s 'recall_at_fpr(max_fpr=0.10)' -s 'filter_rate_at_recall(min_recall=0.90)' -s 'filter_rate_at_recall(min_recall=0.75)' \
        --balance-sample-weight \
        --center --scale \
        --label-type=bool > \
models/wikidatawiki.goodfaith.gradient_boosting.model
2016-04-30 14:10:40,788 INFO:revscoring.utilities.train_test -- Training model...
2016-04-30 14:10:46,530 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: balanced_sample=false, init=null, max_leaf_nodes=null, warm_start=false, min_weight_fraction_leaf=0.0, scale=true, center=true, max_features="log2", n_estimators=300, random_state=null, loss="deviance", learning_rate=0.1, min_samples_leaf=1, balanced_sample_weight=true, verbose=0, max_depth=5, presort="auto", subsample=1.0, min_samples_split=2
 - version: 0.1.1
 - trained: 2016-04-30T14:10:46.527508

Table:
                 ~False    ~True
        -----  --------  -------
        False       419       34
        True        228     4225

Accuracy: 0.947
Precision: 0.992
Recall: 0.949
PR-AUC: 0.998
ROC-AUC: 0.978
Recall @ 0.1 false-positive rate: threshold=0.007, recall=1.0, fpr=0.09
Filter rate @ 0.9 recall: threshold=0.874, filter_rate=0.181, recall=0.9
Filter rate @ 0.75 recall: threshold=0.997, filter_rate=0.319, recall=0.75
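Because 4,453 of the 4,906 test edits are labelled True, the reported precision and recall mostly describe how well the model confirms good-faith edits. Re-reading the same table with False (bad faith) as the positive class gives a view arguably closer to what patrollers look for. A minimal sketch; `binary_metrics` is a hypothetical helper, not part of revscoring:

```python
def binary_metrics(tn, fp, fn, tp):
    """Accuracy, precision and recall for whichever class is treated as positive."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Good-faith table: rows are true labels, columns are predicted labels.
false_pred_false, false_pred_true = 419, 34
true_pred_false, true_pred_true = 228, 4225

# As in the output: True (good faith) is the positive class.
as_reported = binary_metrics(tn=false_pred_false, fp=false_pred_true,
                             fn=true_pred_false, tp=true_pred_true)

# Flipped: False (bad faith) is the positive class.
bad_faith = binary_metrics(tn=true_pred_true, fp=true_pred_false,
                           fn=false_pred_true, tp=false_pred_false)

for name, metrics in [("True as positive", as_reported), ("False as positive", bad_faith)]:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The True-class numbers should match the output above (0.947 / 0.992 / 0.949); the False-class view comes out at roughly 0.65 precision and 0.92 recall.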