Train a `reverted` model for jawiki
Closed, Resolved, Public

Event Timeline

Halfak removed Halfak as the assignee of this task. Apr 5 2016, 6:05 PM
Halfak moved this task from Backlog to Parked on the Machine-Learning-Team (Active Tasks) board.

Strangely, LogisticRegression came out best, with GradientBoosting about 1% behind. Since we don't have LR in revscoring yet, I built the GB model:

models/jawiki.reverted.gradient_boosting.model
2016-04-10 06:32:47,931 INFO:revscoring.utilities.train_test -- Training model...
2016-04-10 06:32:51,275 INFO:revscoring.utilities.train_test -- Testing model...
ScikitLearnClassifier
 - type: GradientBoosting
 - params: min_weight_fraction_leaf=0.0, presort="auto", n_estimators=700, center=true, warm_start=false, scale=true, subsample=1.0, min_samples_split=2, loss="deviance", balanced_sample_weight=true, verbose=0, min_samples_leaf=1, learning_rate=0.1, balanced_sample=false, max_depth=1, init=null, random_state=null, max_leaf_nodes=null, max_features="log2"
 - version: 0.0.1
 - trained: 2016-04-10T06:32:51.272178

Table:
                 ~False    ~True
        -----  --------  -------
        False      2894     1013
        True         17       50

Accuracy: 0.741
Precision: 0.047
Recall: 0.746
PR-AUC: 0.14
ROC-AUC: 0.782
Recall @ 0.1 false-positive rate: threshold=0.985, recall=0.015, fpr=0.0
Filter rate @ 0.9 recall: threshold=0.194, filter_rate=0.28, recall=0.91
Filter rate @ 0.75 recall: threshold=0.49, filter_rate=0.722, recall=0.761

The ROC-AUC (0.782) is not bad, but not very good either.

I think the reason is that we don't have a dictionary for ja.
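
For reference, the accuracy, precision, and recall in the output above can be recomputed from the confusion table. This is a minimal sketch in plain Python, not revscoring code. It assumes the table rows are actual labels and the `~` columns are predicted labels; the `summarize` helper is a hypothetical name used only for illustration.

```python
# Counts copied from the confusion table in the test output.
# Assumption: rows are actual labels, "~" columns are predicted labels.
tn, fp = 2894, 1013  # actual False: predicted ~False, ~True
fn, tp = 17, 50      # actual True:  predicted ~False, ~True


def summarize(tn, fp, fn, tp):
    """Return basic binary-classification metrics from confusion counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),            # share of predicted reverts that were real
        "recall": tp / (tp + fn),               # share of real reverts that were caught
        "false_positive_rate": fp / (fp + tn),  # share of good edits flagged
        "positive_rate": (tp + fn) / total,     # share of reverted edits in the test set
    }


metrics = summarize(tn, fp, fn, tp)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under that reading, the recomputed accuracy, precision, and recall round to the reported 0.741, 0.047, and 0.746. Only 67 of the 3974 test edits (about 1.7%) are reverted, so the 1013 false positives outnumber the 50 true positives by a wide margin, which is why precision stays so low even though recall is reasonable.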

Halfak added a subscriber: Johan.

Moving this to the backlog. Still looking for a Japanese speaker to help us review our dataset for training/testing. @Johan, maybe you could help us with this during the workshop at Wikimania? See T134628

Halfak triaged this task as Low priority. Jul 5 2016, 2:30 PM
Ladsgroup removed a project: User-Ladsgroup.
Ladsgroup subscribed.
Halfak claimed this task.

Looks like this eventually got deployed. I can see it up on ORES.