
Train/test damaging and goodfaith models for frwiki
Closed, Resolved · Public

Event Timeline

Halfak created this task. · Mar 17 2016, 8:54 PM
Restricted Application added a subscriber: Aklapper. · Mar 17 2016, 8:54 PM
Halfak triaged this task as Lowest priority. · Aug 4 2016, 2:22 PM
Halfak renamed this task from "Deploy edit quality models for frwiki" to "Train/test damaging and goodfaith models for frwiki". · May 11 2017, 2:30 PM
Restricted Application added a project: User-Ladsgroup. · May 12 2017, 8:55 PM

Damaging:

(p3)ladsgroup@ores-compute-01:~/editquality$ make models/frwiki.damaging.gradient_boosting.model
cat datasets/frwiki.labeled_revisions.w_cache.20k_2016.json | \
revscoring cv_train \
	revscoring.scorer_models.GradientBoosting \
	editquality.feature_lists.frwiki.damaging \
	damaging \
	--version=0.3.0 \
	-p 'max_depth=7' \
	-p 'learning_rate=0.01' \
	-p 'max_features="log2"' \
	-p 'n_estimators=300' \
	-s 'table' -s 'accuracy' -s 'precision' -s 'recall' -s 'pr' -s 'roc' -s 'recall_at_fpr(max_fpr=0.10)' -s 'filter_rate_at_recall(min_recall=0.9)' -s 'filter_rate_at_recall(min_recall=0.75)' -s 'recall_at_precision(min_precision=0.995)' -s 'recall_at_precision(min_precision=0.99)' -s 'recall_at_precision(min_precision=0.98)' -s 'recall_at_precision(min_precision=0.90)' -s 'recall_at_precision(min_precision=0.75)' -s 'recall_at_precision(min_precision=0.60)' -s 'recall_at_precision(min_precision=0.45)' -s 'recall_at_precision(min_precision=0.15)' \
	--balance-sample-weight \
	--center --scale > \
models/frwiki.damaging.gradient_boosting.model
2017-05-13 00:24:31,187 INFO:revscoring.utilities.cv_train -- Cross-validating model statistics for 10 folds...
2017-05-13 00:24:31,986 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 1...
2017-05-13 00:24:32,090 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 2...
2017-05-13 00:24:32,293 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 3...
2017-05-13 00:24:32,460 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 4...
2017-05-13 00:24:32,647 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 5...
2017-05-13 00:24:32,845 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 6...
2017-05-13 00:24:33,090 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 7...
2017-05-13 00:24:33,228 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 8...
2017-05-13 00:28:25,285 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 9...
2017-05-13 00:28:26,134 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 10...
2017-05-13 00:31:43,422 INFO:revscoring.utilities.cv_train -- Training model on all data...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: random_state=null, max_leaf_nodes=null, init=null, max_features="log2", subsample=1.0, warm_start=false, min_samples_leaf=1, min_samples_split=2, presort="auto", learning_rate=0.01, min_weight_fraction_leaf=0.0, scale=true, loss="deviance", verbose=0, max_depth=7, balanced_sample_weight=true, center=true, n_estimators=300, balanced_sample=false
 - version: 0.3.0
 - trained: 2017-05-13T00:32:11.430568

Table:
	         ~False    ~True
	-----  --------  -------
	False     17316     1959
	True        181      379

Accuracy: 0.892
Precision:
	-----  -----
	False  0.99
	True   0.162
	-----  -----

Recall:
	-----  -----
	False  0.898
	True   0.677
	-----  -----

PR-AUC:
	-----  -----
	False  0.994
	True   0.273
	-----  -----

ROC-AUC:
	-----  -----
	False  0.883
	True   0.883
	-----  -----

Recall @ 0.1 false-positive rate:
	label      threshold    recall    fpr
	-------  -----------  --------  -----
	False          0.823     0.627  0.091
	True           0.515     0.682  0.098

Filter rate @ 0.9 recall:
	label      threshold    filter_rate    recall
	-------  -----------  -------------  --------
	False          0.494          0.116     0.9
	True           0.177          0.612     0.909

Filter rate @ 0.75 recall:
	label      threshold    filter_rate    recall
	-------  -----------  -------------  --------
	False          0.773          0.267     0.75
	True           0.392          0.843     0.757

Recall @ 0.995 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.784     0.672        0.995
	True           0.911     0.043        1

Recall @ 0.99 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.487     0.9           0.99
	True           0.911     0.043         1

Recall @ 0.98 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.2       0.975         0.98
	True           0.911     0.043         1

Recall @ 0.9 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.083     1            0.972
	True           0.911     0.043        1

Recall @ 0.75 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.083     1            0.972
	True           0.902     0.058        0.875

Recall @ 0.6 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.083     1            0.972
	True           0.88      0.116        0.641

Recall @ 0.45 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.083     1            0.972
	True           0.87      0.152        0.531

Recall @ 0.15 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.083     1            0.972
	True           0.442     0.728        0.151
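
As a sanity check on the damaging output above, the headline accuracy, precision, and recall figures can be re-derived from the confusion table. This is a minimal sketch, not part of revscoring: it assumes rows are actual labels and the `~` columns are predicted labels, which is the reading that reproduces the reported numbers. The helper names are illustrative only.

```python
# Confusion table copied from the damaging output above.
# Assumption: outer key = actual label, inner key = predicted label (~False / ~True).
table = {
    False: {False: 17316, True: 1959},
    True: {False: 181, True: 379},
}


def accuracy(table):
    """Share of all observations whose predicted label matches the actual one."""
    total = sum(sum(row.values()) for row in table.values())
    correct = sum(table[label][label] for label in table)
    return correct / total


def precision(table, label):
    """Of everything predicted as `label`, the share that really is `label`."""
    predicted = sum(table[actual][label] for actual in table)
    return table[label][label] / predicted


def recall(table, label):
    """Of everything that really is `label`, the share predicted as `label`."""
    return table[label][label] / sum(table[label].values())


print("accuracy", round(accuracy(table), 3))
for label in (False, True):
    print(label, "precision", round(precision(table, label), 3),
          "recall", round(recall(table, label), 3))
```

With only 560 damaging edits among 19,835 observations, the low precision for `True` (0.162) next to high accuracy (0.892) is what one would expect from such an imbalanced sample.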

Goodfaith:

(p3)ladsgroup@ores-compute-01:~/editquality$ make models/frwiki.goodfaith.gradient_boosting.model 
cat datasets/frwiki.labeled_revisions.w_cache.20k_2016.json | \
revscoring cv_train \
	revscoring.scorer_models.GradientBoosting \
	editquality.feature_lists.frwiki.goodfaith \
	goodfaith \
	--version=0.3.0 \
	-p 'max_depth=5' \
	-p 'learning_rate=0.01' \
	-p 'max_features="log2"' \
	-p 'n_estimators=500' \
	-s 'table' -s 'accuracy' -s 'precision' -s 'recall' -s 'pr' -s 'roc' -s 'recall_at_fpr(max_fpr=0.10)' -s 'filter_rate_at_recall(min_recall=0.9)' -s 'filter_rate_at_recall(min_recall=0.75)' -s 'recall_at_precision(min_precision=0.995)' -s 'recall_at_precision(min_precision=0.99)' -s 'recall_at_precision(min_precision=0.98)' -s 'recall_at_precision(min_precision=0.90)' -s 'recall_at_precision(min_precision=0.75)' -s 'recall_at_precision(min_precision=0.60)' -s 'recall_at_precision(min_precision=0.45)' -s 'recall_at_precision(min_precision=0.15)' \
	--balance-sample-weight \
	--center --scale > \
models/frwiki.goodfaith.gradient_boosting.model
2017-05-13 01:03:06,734 INFO:revscoring.utilities.cv_train -- Cross-validating model statistics for 10 folds...
2017-05-13 01:03:07,461 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 1...
2017-05-13 01:03:07,551 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 2...
2017-05-13 01:03:07,766 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 3...
2017-05-13 01:03:07,967 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 4...
2017-05-13 01:03:08,247 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 5...
2017-05-13 01:03:08,387 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 6...
2017-05-13 01:03:08,846 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 8...
2017-05-13 01:03:08,854 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 7...
2017-05-13 01:06:42,768 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 9...
2017-05-13 01:06:43,196 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 10...
2017-05-13 01:09:37,452 INFO:revscoring.utilities.cv_train -- Training model on all data...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: max_depth=5, warm_start=false, n_estimators=500, balanced_sample=false, min_weight_fraction_leaf=0.0, subsample=1.0, min_samples_split=2, center=true, init=null, scale=true, presort="auto", learning_rate=0.01, loss="deviance", max_leaf_nodes=null, min_samples_leaf=1, max_features="log2", random_state=null, verbose=0, balanced_sample_weight=true
 - version: 0.3.0
 - trained: 2017-05-13T01:10:05.639283

Table:
	         ~False    ~True
	-----  --------  -------
	False       276      120
	True       2006    17433

Accuracy: 0.893
Precision:
	-----  -----
	False  0.121
	True   0.993
	-----  -----

Recall:
	-----  -----
	False  0.697
	True   0.897
	-----  -----

PR-AUC:
	-----  -----
	False  0.232
	True   0.994
	-----  -----

ROC-AUC:
	-----  -----
	False  0.885
	True   0.884
	-----  -----

Recall @ 0.1 false-positive rate:
	label      threshold    recall    fpr
	-------  -----------  --------  -----
	False          0.554     0.693  0.094
	True           0.817     0.62   0.088

Filter rate @ 0.9 recall:
	label      threshold    filter_rate    recall
	-------  -----------  -------------  --------
	False          0.183          0.609     0.912
	True           0.479          0.112     0.9

Filter rate @ 0.75 recall:
	label      threshold    filter_rate    recall
	-------  -----------  -------------  --------
	False          0.418          0.842     0.758
	True           0.748          0.261     0.75

Recall @ 0.995 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.941     0.03         1
	True           0.664     0.783        0.995

Recall @ 0.99 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.941      0.03         1
	True           0.25       0.95         0.99

Recall @ 0.98 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.941     0.03         1
	True           0.077     0.999        0.981

Recall @ 0.9 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.941      0.03         1
	True           0.062      1            0.98

Recall @ 0.75 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.937     0.052        0.938
	True           0.062     1            0.98

Recall @ 0.6 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.915     0.093        0.714
	True           0.062     1            0.98

Recall @ 0.45 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.902     0.138        0.538
	True           0.062     1            0.98

Recall @ 0.15 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.691     0.595        0.155
	True           0.062     1            0.98
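
The "Recall @ precision" and "Filter rate @ recall" tables above report statistics at a chosen score threshold. The sketch below shows how such figures could be computed from scored observations. It is an illustration under stated assumptions, not revscoring's implementation: the function name and the sample scores are hypothetical, and filter rate is assumed to mean the share of observations scoring below the threshold (those that would not need review).

```python
def stats_at_threshold(scored, threshold):
    """Compute recall, precision and filter rate at a score threshold.

    scored: list of (probability_of_target_label, is_target_label) pairs.
    An observation is flagged when its probability is >= threshold.
    """
    flagged = [(p, y) for p, y in scored if p >= threshold]
    positives = sum(1 for _, y in scored if y)
    true_pos = sum(1 for _, y in flagged if y)
    return {
        "recall": true_pos / positives,
        # Precision is undefined when nothing is flagged.
        "precision": true_pos / len(flagged) if flagged else None,
        # Assumed definition: share of observations that are NOT flagged.
        "filter_rate": 1 - len(flagged) / len(scored),
    }


# Hypothetical scores for illustration only; not taken from the frwiki data.
scored = [
    (0.95, True), (0.80, True), (0.60, False), (0.40, True),
    (0.30, False), (0.20, False), (0.10, False), (0.05, False),
]

print(stats_at_threshold(scored, 0.5))
```

Raising the threshold trades recall for precision, which is the pattern visible in the tables above: for the goodfaith `False` label, precision of 1 is reached only at threshold 0.941 with recall 0.03.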
Halfak closed this task as Resolved. · Jun 5 2017, 5:07 PM
Restricted Application added a project: artificial-intelligence. · Jun 5 2017, 5:07 PM