Build edit quality models for plwiki
Closed, Resolved (Public)

Damaging:

(p3)ladsgroup@ores-compute-01:~/editquality$ make models/plwiki.damaging.gradient_boosting.model
cat datasets/plwiki.features_damaging.20k_2016.tsv | cut -f2- | \
revscoring train_test \
	revscoring.scorer_models.GradientBoosting \
	editquality.feature_lists.plwiki.damaging \
	--version=0.1.1 \
	-p 'max_depth=5' \
	-p 'learning_rate=0.01' \
	-p 'max_features="log2"' \
	-p 'n_estimators=700' \
	-s 'table' -s 'accuracy' -s 'precision' -s 'recall' -s 'pr' -s 'roc' -s 'recall_at_fpr(max_fpr=0.10)' -s 'filter_rate_at_recall(min_recall=0.90)' -s 'filter_rate_at_recall(min_recall=0.75)' \
	--balance-sample-weight \
	--center --scale \
	--label-type=bool > \
models/plwiki.damaging.gradient_boosting.model
2016-07-03 21:29:52,824 INFO:revscoring.utilities.train_test -- Training model...
2016-07-03 21:30:13,456 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: max_leaf_nodes=null, min_samples_split=2, scale=true, balanced_sample=false, subsample=1.0, presort="auto", min_samples_leaf=1, verbose=0, min_weight_fraction_leaf=0.0, learning_rate=0.01, max_depth=5, balanced_sample_weight=true, init=null, max_features="log2", center=true, n_estimators=700, random_state=null, warm_start=false, loss="deviance"
 - version: 0.1.1
 - trained: 2016-07-03T21:30:13.453169

Table:
	         ~False    ~True
	-----  --------  -------
	False      4064      222
	True         20       52

Accuracy: 0.944
Precision: 0.19
Recall: 0.722
PR-AUC: 0.291
ROC-AUC: 0.931
Recall @ 0.1 false-positive rate: threshold=0.972, recall=0.014, fpr=0.0
Filter rate @ 0.9 recall: threshold=0.11, filter_rate=0.783, recall=0.903
Filter rate @ 0.75 recall: threshold=0.382, filter_rate=0.924, recall=0.75
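
As a sanity check, the accuracy, precision, and recall reported above can be recomputed from the confusion table. This is a minimal sketch, not part of revscoring: the helper name `confusion_metrics` is hypothetical, and it assumes the table rows are the actual labels and the `~` columns are the predicted labels.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Derive summary metrics from the four cells of a binary confusion table.

    Assumption: rows of the printed table are actual labels and the
    "~" columns are predictions, so the cells read (row by row) as
    tn, fp, fn, tp.
    """
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,   # share of all edits classified correctly
        "precision": tp / (tp + fp),     # share of flagged edits that are truly positive
        "recall": tp / (tp + fn),        # share of truly positive edits that were flagged
    }


# Cells copied from the plwiki damaging table above.
damaging = confusion_metrics(tn=4064, fp=222, fn=20, tp=52)
for name, value in damaging.items():
    print(f"{name}: {value:.3f}")
```

Under that reading of the table, the recomputed values agree with the reported Accuracy, Precision, and Recall lines after rounding.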

Good faith:

(p3)ladsgroup@ores-compute-01:~/editquality$ make models/plwiki.goodfaith.gradient_boosting.model
cat datasets/plwiki.features_goodfaith.20k_2016.tsv | cut -f2- | \
revscoring train_test \
	revscoring.scorer_models.GradientBoosting \
	editquality.feature_lists.plwiki.goodfaith \
	--version=0.1.1 \
	-p 'max_depth=3' \
	-p 'learning_rate=0.01' \
	-p 'max_features="log2"' \
	-p 'n_estimators=700' \
	-s 'table' -s 'accuracy' -s 'precision' -s 'recall' -s 'pr' -s 'roc' -s 'recall_at_fpr(max_fpr=0.10)' -s 'filter_rate_at_recall(min_recall=0.90)' -s 'filter_rate_at_recall(min_recall=0.75)' \
	--balance-sample-weight \
	--center --scale \
	--label-type=bool > \
models/plwiki.goodfaith.gradient_boosting.model
2016-07-04 02:46:35,358 INFO:revscoring.utilities.train_test -- Training model...
2016-07-04 02:46:47,488 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: min_weight_fraction_leaf=0.0, max_depth=3, warm_start=false, n_estimators=700, init=null, random_state=null, max_features="log2", max_leaf_nodes=null, min_samples_split=2, min_samples_leaf=1, learning_rate=0.01, balanced_sample=false, balanced_sample_weight=true, scale=true, subsample=1.0, presort="auto", loss="deviance", center=true, verbose=0
 - version: 0.1.1
 - trained: 2016-07-04T02:46:47.484697

Table:
	         ~False    ~True
	-----  --------  -------
	False        26        3
	True        213     4116

Accuracy: 0.95
Precision: 0.999
Recall: 0.951
PR-AUC: 1.0
ROC-AUC: 0.979
Recall @ 0.1 false-positive rate: threshold=0.034, recall=1.0, fpr=0.007
Filter rate @ 0.9 recall: threshold=0.741, filter_rate=0.105, recall=0.9
Filter rate @ 0.75 recall: threshold=0.886, filter_rate=0.255, recall=0.75
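
One way to put the two PR-AUC values in context is to compare them with the share of positive observations in each test set, since a classifier that guesses at random has an expected precision equal to that share. The sketch below is an illustration only: `positive_prevalence` is a hypothetical helper, and it assumes the same row/column reading of the tables (rows are actual labels, `~` columns are predictions).

```python
def positive_prevalence(tn, fp, fn, tp):
    """Fraction of observations whose actual label is True.

    Assumption: cells are taken row by row from the printed tables,
    with rows as actual labels and "~" columns as predictions.
    """
    return (tp + fn) / (tn + fp + fn + tp)


# Cells copied from the two tables in this task.
damaging_prev = positive_prevalence(tn=4064, fp=222, fn=20, tp=52)
goodfaith_prev = positive_prevalence(tn=26, fp=3, fn=213, tp=4116)

print(f"damaging positive share:  {damaging_prev:.3f}")
print(f"goodfaith positive share: {goodfaith_prev:.3f}")
```

Read this way, the damaging PR-AUC of 0.291 is well above its small positive share, while the goodfaith PR-AUC of 1.0 sits close to a baseline that is already very high because almost every edit in the sample is labeled good faith.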