Train on all data, Report test statistics on cross-validation
Closed, Resolved (Public)

Description

Currently, we withhold a test set when training each of the models. This is usually about 20% of the data.

Instead of randomly sampling and withholding a single test set, why not cross-validate over the entire dataset (say, 5 or 10 folds) and then train the final model on the entire dataset?

This way, we'd (1) get a more stable set of test statistics, since they'd represent a mean across many train/test folds, and (2) be able to train our models on all available data, which should improve overall fitness in practice.
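To make the proposal concrete, here is a minimal, library-free sketch of "cross-validate for statistics, then train on everything". It is not the revscoring implementation: `k_fold_indices`, `cv_then_train`, `train_threshold`, and `accuracy` are hypothetical names, and the toy threshold "model" and synthetic data exist only to make the example runnable.

```python
import random
from statistics import mean


def k_fold_indices(n, folds, seed=0):
    """Shuffle range(n) and deal the indices into `folds` roughly equal parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::folds] for i in range(folds)]


def cv_then_train(observations, train, score, folds=10, seed=0):
    """Cross-validate for statistics, then fit the final model on all data.

    `train(rows)` returns a model; `score(model, rows)` returns a number.
    Both are hypothetical callables supplied by the caller.
    """
    parts = k_fold_indices(len(observations), folds, seed)
    fold_scores = []
    for held_out in parts:
        held = set(held_out)
        train_rows = [o for i, o in enumerate(observations) if i not in held]
        test_rows = [observations[i] for i in held_out]
        # Each fold's model never sees the rows it is scored on.
        fold_scores.append(score(train(train_rows), test_rows))
    final_model = train(observations)  # no data withheld for the final model
    return final_model, mean(fold_scores), fold_scores


# Toy stand-ins: the "model" is a threshold halfway between the class means.
def train_threshold(rows):
    pos = [x for x, label in rows if label]
    neg = [x for x, label in rows if not label]
    return (mean(pos) + mean(neg)) / 2


def accuracy(threshold, rows):
    return mean(1.0 if (x > threshold) == label else 0.0 for x, label in rows)


# Synthetic, balanced data: one noisy feature per observation.
rng = random.Random(42)
data = [(rng.gauss(1.0 if i % 2 else 0.0, 0.5), bool(i % 2)) for i in range(200)]

model, mean_acc, per_fold = cv_then_train(data, train_threshold, accuracy, folds=5)
print(f"mean accuracy over {len(per_fold)} folds: {mean_acc:.3f}")
```

The reported statistic is the mean over held-out folds, so it describes models trained on slightly less data than the final one. If anything, it is a mildly conservative estimate for the model trained on all observations.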

Event Timeline

Halfak renamed this task from "Train on all data, Report test statistics on CV" to "Train on all data, Report test statistics on cross-validation". Aug 18 2016, 2:13 PM
Halfak triaged this task as Medium priority. Aug 18 2016, 2:18 PM
$ head -n 5648 datasets/enwiki.observations.damaging_w_cache.20k_2015.json | ./utility cv_train revscoring.scorer_models.GradientBoosting editquality.feature_lists.enwiki.damaging damaging --version=test --balance-sample-weight --center --scale --debug > models/enwiki.damaging.gradient_boosting.model
2016-09-14 17:58:46,613 INFO:revscoring.utilities.cv_train -- Cross-validating model statistics for 10 folds...
2016-09-14 17:58:46,614 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 1...
2016-09-14 17:58:55,375 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 2...
2016-09-14 17:59:03,485 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 3...
2016-09-14 17:59:10,341 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 4...
2016-09-14 17:59:16,276 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 5...
2016-09-14 17:59:21,927 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 6...
2016-09-14 17:59:29,469 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 7...
2016-09-14 17:59:37,338 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 8...
2016-09-14 17:59:45,247 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 9...
2016-09-14 17:59:53,269 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 10...
2016-09-14 17:59:58,666 DEBUG:revscoring.scorer_models.test_statistics.roc -- interp_auc=0.8824357710722381, individual_auc=[0.85990825688073391, 0.83730955968525034, 0.850802356286817, 0.85022120736406459, 0.88432835820895528, 0.90773512202083628, 0.86359404096834258, 0.92275229357798172, 0.90523191094619659, 0.8809306569343065]
2016-09-14 17:59:58,666 DEBUG:revscoring.scorer_models.test_statistics.roc -- interp_auc=0.8834149953243474, individual_auc=[0.85990825688073391, 0.83730955968525034, 0.850802356286817, 0.85022120736406459, 0.88432835820895528, 0.90773512202083628, 0.86359404096834258, 0.92275229357798161, 0.90523191094619659, 0.88093065693430672]
2016-09-14 17:59:58,671 DEBUG:revscoring.scorer_models.test_statistics.precision_recall -- interp_auc=0.9920617472138642, individual_auc=[0.99269546605462566, 0.98609218451342695, 0.99393002960738885, 0.99079179188769506, 0.9920661258445499, 0.99467613325338378, 0.9865275026025937, 0.99686321212442008, 0.99488684184299703, 0.99490834573614761]
2016-09-14 17:59:58,672 DEBUG:revscoring.scorer_models.test_statistics.precision_recall -- interp_auc=0.36770735614283157, individual_auc=[0.31075429820933481, 0.2849956251686917, 0.29011919368982519, 0.37191857409654483, 0.46093496672531353, 0.36398420932938902, 0.37450276084564116, 0.40324916638948166, 0.3311185943683646, 0.39725372462352282]
2016-09-14 17:59:58,674 INFO:revscoring.utilities.cv_train -- Training model on all data...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: init=null, balanced_sample=false, n_estimators=100, loss="deviance", balanced_sample_weight=true, random_state=null, scale=true, max_features=null, verbose=0, min_weight_fraction_leaf=0.0, subsample=1.0, max_depth=3, warm_start=false, max_leaf_nodes=null, center=true, presort="auto", min_samples_split=2, min_samples_leaf=1, learning_rate=0.1
 - version: test
 - trained: 2016-09-14T18:00:04.258858

Table:
	         ~False    ~True
	-----  --------  -------
	False      4663      755
	True         65      165

Accuracy: 0.855
Precision:
	-----  -----
	False  0.986
	True   0.181
	-----  -----

Recall:
	-----  -----
	False  0.861
	True   0.716
	-----  -----

ROC-AUC:
	-----  -----
	False  0.882
	True   0.883
	-----  -----

PR-AUC:
	-----  -----
	False  0.992
	True   0.368
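
As a reading aid for the output above, the following sketch recomputes accuracy, precision, and recall from the pooled confusion table. It assumes rows are actual labels and `~` columns are predicted labels, which is consistent with the reported values for the False class. The pooled True-class values come out slightly different from the reported 0.181 and 0.716; one possible explanation, not confirmed by the output itself, is that the reported figures are aggregated per fold rather than computed from the pooled table.

```python
# Confusion table copied from the output above.
# Assumption: outer key = actual label, inner key = predicted label.
table = {
    False: {False: 4663, True: 755},
    True: {False: 65, True: 165},
}

total = sum(sum(row.values()) for row in table.values())
accuracy = sum(table[c][c] for c in table) / total


def precision(cls):
    """Of everything predicted as `cls`, the fraction that really is `cls`."""
    predicted = sum(table[actual][cls] for actual in table)
    return table[cls][cls] / predicted


def recall(cls):
    """Of everything that really is `cls`, the fraction predicted as `cls`."""
    return table[cls][cls] / sum(table[cls].values())


print(f"n={total} accuracy={accuracy:.3f}")
for cls in (False, True):
    print(f"{cls}: precision={precision(cls):.3f} recall={recall(cls):.3f}")
```

The total of 5648 matches the number of observations piped in by `head -n 5648`, so each observation appears in exactly one held-out fold.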