Investigate small loss in fitness with the new data in fawiki
Closed, Resolved (Public)

Event Timeline

Halfak renamed this task from "Investigate small loss in accuracy with the new data in fawiki" to "Investigate small loss in fitness with the new data in fawiki". Jul 22 2017, 5:41 PM

FYI, this is the result for the damaging model trained on the old data with the new revscoring version:

ScikitLearnClassifier
 - type: GradientBoosting
 - params: verbose=0, min_weight_fraction_leaf=0.0, max_leaf_nodes=null, init=null, loss="deviance", min_samples_leaf=1, learning_rate=0.1, n_estimators=300, criterion="friedman_mse", subsample=1.0, min_samples_split=2, max_features="log2", balanced_sample=false, balanced_sample_weight=true, presort="auto", warm_start=false, max_depth=3, min_impurity_split=1e-07, random_state=null, scale=true, center=true
 - version: 0.3.0
 - trained: 2017-07-24T18:57:21.920621

Table:
	         ~False    ~True
	-----  --------  -------
	False     18408      927
	True         60      182

Accuracy: 0.95
Precision:
	-----  -----
	False  0.997
	True   0.164
	-----  -----

Recall:
	-----  -----
	False  0.952
	True   0.753
	-----  -----

PR-AUC:
	-----  -----
	False  0.995
	True   0.262
	-----  -----

ROC-AUC:
	-----  -----
	False  0.964
	True   0.974
	-----  -----

Recall @ 0.1 false-positive rate:
	label      threshold    recall    fpr
	-------  -----------  --------  -----
	False          0.738     0.935  0.084
	True           0.079     0.961  0.086

Filter rate @ 0.9 recall:
	label      threshold    filter_rate    recall
	-------  -----------  -------------  --------
	False          0.961          0.11      0.9
	True           0.266          0.924     0.916

Filter rate @ 0.75 recall:
	label      threshold    filter_rate    recall
	-------  -----------  -------------  --------
	False          0.995          0.259     0.751
	True           0.505          0.945     0.76

Recall @ 0.995 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.314     0.967        0.995
	True           0.987     0.054        1

Recall @ 0.99 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.058     0.994         0.99
	True           0.987     0.054         1

Recall @ 0.98 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.014     1            0.988
	True           0.987     0.054        1

Recall @ 0.9 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.014     1            0.988
	True           0.987     0.054        1

Recall @ 0.75 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.014     1            0.988
	True           0.984     0.071        0.958

Recall @ 0.6 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.014     1            0.988
	True           0.975     0.128        0.744

Recall @ 0.45 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.014     1            0.988
	True           0.95      0.246        0.489

Recall @ 0.15 precision:
	label      threshold    recall    precision
	-------  -----------  --------  -----------
	False          0.014     1            0.988
	True           0.382     0.848        0.16
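
As a sanity check on the report above, the headline accuracy, precision, and recall figures can be approximately recomputed from the confusion table alone. The sketch below is not revscoring code; it assumes that rows of the table are actual labels and columns are predicted labels, and that "filter rate" means the fraction of revisions not predicted as the given label. The names `table` and `summarize` are hypothetical.

```python
# Confusion table copied from the report.
# Keys are (actual, predicted); this orientation is an assumption.
table = {
    (False, False): 18408,
    (False, True): 927,
    (True, False): 60,
    (True, True): 182,
}


def summarize(table):
    """Recompute accuracy plus per-label precision, recall, and filter rate."""
    total = sum(table.values())
    correct = table[(False, False)] + table[(True, True)]
    stats = {"accuracy": correct / total}
    for label in (False, True):
        hits = table[(label, label)]
        # Everything the model predicted as `label`.
        predicted = sum(n for (_, pred), n in table.items() if pred == label)
        # Everything that actually is `label`.
        actual = sum(n for (act, _), n in table.items() if act == label)
        stats[label] = {
            "precision": hits / predicted,
            "recall": hits / actual,
            # Assumed definition: share of revisions NOT predicted as `label`.
            "filter_rate": 1 - predicted / total,
        }
    return stats


stats = summarize(table)
print("accuracy: %.3f" % stats["accuracy"])
for label in (False, True):
    row = stats[label]
    print("%-5s precision=%.3f recall=%.3f filter_rate=%.3f"
          % (label, row["precision"], row["recall"], row["filter_rate"]))
```

Under these assumptions the recomputed values agree with the reported ones to within rounding: precision for True is about 0.164 and recall for True is about 0.752 (reported as 0.753). The filter rate for True, about 0.943, is close to the 0.945 reported at threshold 0.505, which is consistent with the table having been produced near the default 0.5 cut-off.
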
Halfak moved this task from Parked to Completed on the Machine-Learning-Team (Active Tasks) board.
Halfak moved this task from Completed to Review on the Machine-Learning-Team (Active Tasks) board.
Halfak added a subscriber: Halfak.

Looks like there was a substantial improvement to PR-AUC.