Status | Subtype | Assigned | Task
---|---|---|---
Resolved | | Halfak | T130869 Train a `reverted` model for jawiki
Open | | None | T133405 [research] Why is the japanese 'reverted' model so bad?
Event Timeline
Strangely, LogisticRegression came out best (with GradientBoosting within 1%). Since we don't have LR in revscoring yet, I made the GB model:
models/jawiki.reverted.gradient_boosting.model
2016-04-10 06:32:47,931 INFO:revscoring.utilities.train_test -- Training model...
2016-04-10 06:32:51,275 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: min_weight_fraction_leaf=0.0, presort="auto", n_estimators=700, center=true, warm_start=false, scale=true, subsample=1.0, min_samples_split=2, loss="deviance", balanced_sample_weight=true, verbose=0, min_samples_leaf=1, learning_rate=0.1, balanced_sample=false, max_depth=1, init=null, random_state=null, max_leaf_nodes=null, max_features="log2"
 - version: 0.0.1
 - trained: 2016-04-10T06:32:51.272178

Table:
        ~False    ~True
-----  --------  -------
False      2894     1013
True         17       50

Accuracy: 0.741
Precision: 0.047
Recall: 0.746
PR-AUC: 0.14
ROC-AUC: 0.782
Recall @ 0.1 false-positive rate: threshold=0.985, recall=0.015, fpr=0.0
Filter rate @ 0.9 recall: threshold=0.194, filter_rate=0.28, recall=0.91
Filter rate @ 0.75 recall: threshold=0.49, filter_rate=0.722, recall=0.761
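For reference, the params above map roughly onto scikit-learn's `GradientBoostingClassifier`. A minimal sketch of fitting a comparable model outside of revscoring might look like the following; the synthetic `X`/`y` data is a placeholder for the jawiki revision features and reverted labels, and the `center`/`scale`/`balanced_sample_weight` options are approximated here with a scaler and per-sample weights rather than being `GradientBoostingClassifier` arguments:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic stand-in for the extracted revision features and reverted labels;
# in practice these would come from revscoring's feature extraction.
X, y = make_classification(n_samples=20000, n_features=30,
                           weights=[0.98, 0.02], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Rough analogue of revscoring's center=true / scale=true preprocessing.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Hyperparameters mirroring the trained model's params above.
model = GradientBoostingClassifier(n_estimators=700, learning_rate=0.1,
                                    max_depth=1, max_features="log2",
                                    subsample=1.0)

# Rough analogue of balanced_sample_weight=true: upweight the rare class.
sample_weight = compute_sample_weight("balanced", y_train)
model.fit(X_train, y_train, sample_weight=sample_weight)
```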
The ROC-AUC is not bad, but it's not very good either.
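Given how imbalanced the test set is (67 reverted edits out of 3,974 in the table above), a large gap between ROC-AUC and PR-AUC is expected. As a rough illustration, continuing the sketch above (`model`, `X_test`, and `y_test` are the stand-ins from that snippet, not revscoring's own evaluation code), both scores can be computed from held-out probabilities:

```python
from sklearn.metrics import average_precision_score, roc_auc_score

# Probability of the positive ("reverted") class on the held-out set.
scores = model.predict_proba(X_test)[:, 1]

# On a heavily imbalanced set, ROC-AUC can look decent even when PR-AUC is
# low, which is consistent with the 0.782 vs 0.14 gap reported above.
print("ROC-AUC:", roc_auc_score(y_test, scores))
print("PR-AUC: ", average_precision_score(y_test, scores))
```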