Train a basic item quality based on edit quality for Wikidata
Closed, Resolved · Public

Ladsgroup created this task. May 9 2017, 6:56 PM
Restricted Application added a project: User-Ladsgroup. May 9 2017, 6:56 PM

@Ladsgroup: are you going to train with a single classifier? Or are you going to train with multiple classifiers and compare the results to find which one has the best accuracy?

@Glorian_WD, we use revscoring tune to do estimator and hyperparameter optimization. So, we'll likely test out a set of benchmark models (naive Bayes, logistic regression, etc.) as well as a large set of parameters for Random Forest and Gradient Boosting.
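
As a rough sketch of what this kind of estimator and hyperparameter search looks like: the snippet below uses scikit-learn's GridSearchCV directly, not the revscoring tune utility itself. The synthetic data, the candidate names and the parameter grids are all illustrative assumptions, not the configuration used for the Wikidata model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in data (assumption): the real features and labels
# would come from the labeled Wikidata items.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# Benchmark models with small or empty grids, plus wider grids for
# Random Forest and Gradient Boosting. All values here are hypothetical.
candidates = {
    "GaussianNB": (GaussianNB(), {}),
    "LogisticRegression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 1.0, 10.0]},
    ),
    "RF": (
        RandomForestClassifier(random_state=0),
        {
            "n_estimators": [20, 40],
            "min_samples_leaf": [1, 13],
            "max_features": ["log2"],
        },
    ),
    "GB": (
        GradientBoostingClassifier(random_state=0),
        {
            "n_estimators": [50],
            "max_depth": [1, 3],
            "learning_rate": [0.1, 0.5],
        },
    ),
}

# Cross-validate every parameter combination of every candidate and keep
# the best mean accuracy and the parameters that produced it.
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="accuracy", cv=5)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)

for name, (score, params) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: accuracy={score:.3f} params={params}")
```

The candidate with the highest cross-validated score would then be retrained on all of the data, which is the step the training log further down in this task reports.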

@Halfak: Oh I see. Is Multi-layer Perceptron also among the benchmark models?

2017-05-20 17:01:35,365 INFO:revscoring.utilities.cv_train -- Cross-validating model statistics for 10 folds...
2017-05-20 17:01:35,428 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 1...
2017-05-20 17:01:35,451 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 2...
2017-05-20 17:01:35,462 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 3...
2017-05-20 17:01:35,483 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 5...
2017-05-20 17:01:35,508 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 4...
2017-05-20 17:01:35,517 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 6...
2017-05-20 17:01:35,506 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 7...
2017-05-20 17:01:35,528 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 8...
2017-05-20 17:01:42,878 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 9...
2017-05-20 17:01:42,925 INFO:revscoring.scorer_models.sklearn_classifier -- Performing cross-validation 10...
2017-05-20 17:01:48,206 INFO:revscoring.utilities.cv_train -- Training model on all data...
ScikitLearnClassifier
 - type: RF
 - params: verbose=0, scale=true, criterion="gini", balanced_sample_weight=false, min_samples_split=2, max_features="log2", n_jobs=1, min_weight_fraction_leaf=0.0, warm_start=false, center=true, balanced_sample=true, oob_score=false, class_weight=null, random_state=null, bootstrap=true, max_leaf_nodes=null, n_estimators=20, max_depth=null, min_samples_leaf=13
 - version: .0
 - trained: 2017-05-20T17:01:48.950545

Table:
	      ~A    ~B    ~C    ~D    ~E
	--  ----  ----  ----  ----  ----
	A    279    33    10     0     0
	B     64   291    77     5     1
	C     63   208  1414    86     2
	D      0     1    65   894    37
	E      0     0     5   103  1361

Accuracy: 0.848
ROC-AUC:
	---  -----
	'A'  0.987
	'B'  0.937
	'C'  0.969
	'D'  0.977
	'E'  0.993
	---  -----

F1:
	-  -----
	E  0.948
	B  0.595
	D  0.858
	A  0.764
	C  0.845
	-  -----
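
For anyone wanting to sanity-check these numbers, here is a small sketch that recomputes accuracy and per-class F1 from the table above. It assumes rows are actual labels and columns (the ~ ones) are predicted labels. The pooled table reproduces the reported accuracy of 0.848. The per-class F1 values come out close to the reported ones but not always identical (about 0.599 for B versus the reported 0.595), possibly because the reported statistics are aggregated differently across the folds; that explanation is a guess.

```python
labels = ["A", "B", "C", "D", "E"]

# Confusion table copied from the output above.
# Assumption: rows are actual classes, columns are predicted classes.
table = [
    [279, 33, 10, 0, 0],
    [64, 291, 77, 5, 1],
    [63, 208, 1414, 86, 2],
    [0, 1, 65, 894, 37],
    [0, 0, 5, 103, 1361],
]

total = sum(sum(row) for row in table)
correct = sum(table[i][i] for i in range(len(labels)))
accuracy = correct / total


def f1(i):
    """F1 for class i: 2*TP / (actual count + predicted count)."""
    true_positives = table[i][i]
    actual = sum(table[i])                    # row sum: TP + FN
    predicted = sum(row[i] for row in table)  # column sum: TP + FP
    return 2 * true_positives / (actual + predicted)


f1_scores = {label: f1(i) for i, label in enumerate(labels)}

print(f"Accuracy: {accuracy:.3f}")
for label in labels:
    print(f"F1 {label}: {f1_scores[label]:.3f}")
```

Because F1 is symmetric in precision and recall, the result does not change if the rows and columns are actually the other way around.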
Ladsgroup moved this task from Active to Review on the Scoring-platform-team (Current) board.
Halfak closed this task as Resolved. Jun 5 2017, 5:07 PM