Train on all data, Report test statistics on cross-validation
Closed, Resolved · Public

Description

Currently, we withhold a test set when training each of the models. This is usually about 20% of the data.

Instead of randomly sampling and withholding a single test set, why not perform cross-validation on the entire dataset (let's say 5 or 10 folds) and then train the final model on the entire dataset?

This way, we'd (1) get a more stable set of test statistics, since they'd represent a mean value across many train/test folds, and (2) be able to train our models on all available data, which should improve overall fitness in practice.
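To make the proposal concrete, here is a minimal sketch of "cross-validate for statistics, then train on everything". It is not the revscoring implementation: `ThresholdModel`, `k_fold_indices`, and `cv_then_train` are hypothetical names, the classifier is a toy one-feature stand-in, and the data is synthetic. Only the standard library is used so the sketch runs anywhere.

```python
import random
from statistics import mean


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class ThresholdModel:
    """Toy classifier (illustration only): predicts True when x is above
    the midpoint between the two class means seen during fit()."""

    def fit(self, xs, ys):
        pos = [x for x, y in zip(xs, ys) if y]
        neg = [x for x, y in zip(xs, ys) if not y]
        self.threshold = (mean(pos) + mean(neg)) / 2
        return self

    def predict(self, xs):
        return [x > self.threshold for x in xs]


def accuracy(model, xs, ys):
    preds = model.predict(xs)
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def cv_then_train(xs, ys, folds=10):
    """Report the mean held-out accuracy over `folds` folds, then
    return a model fitted on all of the data."""
    scores = []
    for held_out in k_fold_indices(len(xs), folds):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = ThresholdModel().fit(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        scores.append(
            accuracy(model, [xs[i] for i in held_out], [ys[i] for i in held_out])
        )
    # The statistics come from the folds; the shipped model sees every row.
    final_model = ThresholdModel().fit(xs, ys)
    return mean(scores), final_model


# Synthetic, roughly separable data (an assumption for the demo).
rng = random.Random(42)
ys = [i % 2 == 0 for i in range(200)]
xs = [rng.gauss(2.0 if y else 0.0, 1.0) for y in ys]

cv_accuracy, model = cv_then_train(xs, ys, folds=10)
print("mean CV accuracy:", round(cv_accuracy, 3))
```

One consequence worth noting: the reported statistics describe models trained on (folds - 1)/folds of the data, so they estimate the performance of the final model rather than measure it directly.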

Halfak created this task. Aug 14 2016, 9:42 PM
Restricted Application added a subscriber: Aklapper. Aug 14 2016, 9:42 PM
Halfak renamed this task from Train on all data, Report test statistics on CV to Train on all data, Report test statistics on cross-validation. Aug 18 2016, 2:13 PM
Halfak triaged this task as Normal priority. Aug 18 2016, 2:18 PM
$ head -n 5648 datasets/enwiki.observations.damaging_w_cache.20k_2015.json | ./utility cv_train revscoring.scorer_models.GradientBoosting editquality.feature_lists.enwiki.damaging damaging --version=test --balance-sample-weight --center --scale --debug > models/enwiki.damaging.gradient_boosting.model
2016-09-14 17:58:46,613 INFO:revscoring.utilities.cv_train -- Cross-validating model statistics for 10 folds...
2016-09-14 17:58:46,614 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 1...
2016-09-14 17:58:55,375 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 2...
2016-09-14 17:59:03,485 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 3...
2016-09-14 17:59:10,341 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 4...
2016-09-14 17:59:16,276 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 5...
2016-09-14 17:59:21,927 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 6...
2016-09-14 17:59:29,469 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 7...
2016-09-14 17:59:37,338 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 8...
2016-09-14 17:59:45,247 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 9...
2016-09-14 17:59:53,269 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 10...
2016-09-14 17:59:58,666 DEBUG:revscoring.scorer_models.test_statistics.roc -- interp_auc=0.8824357710722381, individual_auc=[0.85990825688073391, 0.83730955968525034, 0.850802356286817, 0.85022120736406459, 0.88432835820895528, 0.90773512202083628, 0.86359404096834258, 0.92275229357798172, 0.90523191094619659, 0.8809306569343065]
2016-09-14 17:59:58,666 DEBUG:revscoring.scorer_models.test_statistics.roc -- interp_auc=0.8834149953243474, individual_auc=[0.85990825688073391, 0.83730955968525034, 0.850802356286817, 0.85022120736406459, 0.88432835820895528, 0.90773512202083628, 0.86359404096834258, 0.92275229357798161, 0.90523191094619659, 0.88093065693430672]
2016-09-14 17:59:58,671 DEBUG:revscoring.scorer_models.test_statistics.precision_recall -- interp_auc=0.9920617472138642, individual_auc=[0.99269546605462566, 0.98609218451342695, 0.99393002960738885, 0.99079179188769506, 0.9920661258445499, 0.99467613325338378, 0.9865275026025937, 0.99686321212442008, 0.99488684184299703, 0.99490834573614761]
2016-09-14 17:59:58,672 DEBUG:revscoring.scorer_models.test_statistics.precision_recall -- interp_auc=0.36770735614283157, individual_auc=[0.31075429820933481, 0.2849956251686917, 0.29011919368982519, 0.37191857409654483, 0.46093496672531353, 0.36398420932938902, 0.37450276084564116, 0.40324916638948166, 0.3311185943683646, 0.39725372462352282]
2016-09-14 17:59:58,674 INFO:revscoring.utilities.cv_train -- Training model on all data...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: init=null, balanced_sample=false, n_estimators=100, loss="deviance", balanced_sample_weight=true, random_state=null, scale=true, max_features=null, verbose=0, min_weight_fraction_leaf=0.0, subsample=1.0, max_depth=3, warm_start=false, max_leaf_nodes=null, center=true, presort="auto", min_samples_split=2, min_samples_leaf=1, learning_rate=0.1
 - version: test
 - trained: 2016-09-14T18:00:04.258858

Table:
	         ~False    ~True
	-----  --------  -------
	False      4663      755
	True         65      165

Accuracy: 0.855
Precision:
	-----  -----
	False  0.986
	True   0.181
	-----  -----

Recall:
	-----  -----
	False  0.861
	True   0.716
	-----  -----

ROC-AUC:
	-----  -----
	False  0.882
	True   0.883
	-----  -----

PR-AUC:
	-----  -----
	False  0.992
	True   0.368
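
As a sanity check on the output above, the point statistics can be recomputed from the pooled counts in the table. This is a sketch that assumes rows are actual labels and the `~` columns are predictions; the variable and function names are made up for illustration.

```python
# Counts copied from the "Table" output above.
# Assumption: keys are (actual label, predicted label).
table = {
    (False, False): 4663,
    (False, True): 755,
    (True, False): 65,
    (True, True): 165,
}

total = sum(table.values())
accuracy = (table[(False, False)] + table[(True, True)]) / total


def precision(label):
    """Correct predictions of `label` over all predictions of `label`."""
    predicted = sum(n for (_, pred), n in table.items() if pred == label)
    return table[(label, label)] / predicted


def recall(label):
    """Correct predictions of `label` over all actual `label` rows."""
    actual = sum(n for (act, _), n in table.items() if act == label)
    return table[(label, label)] / actual


print("observations:", total)
print("accuracy:", round(accuracy, 3))
for label in (False, True):
    print(label, "precision:", round(precision(label), 3),
          "recall:", round(recall(label), 3))
```

The total is 5648, matching the `head -n 5648` in the command, and accuracy and the False-class figures agree with the report (0.855, 0.986, 0.861). The True-class figures come out at 0.179 precision and 0.717 recall, versus the reported 0.181 and 0.716. That small gap might come from averaging per-fold statistics rather than pooling counts, but that is a guess; the log does not say.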
Halfak closed this task as Resolved. Sep 22 2016, 5:51 PM