Status | Assigned | Task
---|---|---
Resolved | Ladsgroup | T151611 Enable ORES Review Tool on Czech Wikipedia
Resolved | Ladsgroup | T156492 Train and test damaging/goodfaith models for Czech Wikipedia
Event Timeline
# Model tuning report
- Revscoring version: 1.3.5
- Features: editquality.feature_lists.cswiki.damaging
- Date: 2017-01-28T17:42:35.310686
- Observations: 4925
- Labels: [false, true]
- Scoring: roc_auc
- Folds: 5

# Top scoring configurations
| model | mean(scores) | std(scores) | params |
|:---------------------------|---------------:|--------------:|:-------------------------------------------------------------------------------|
| RandomForestClassifier | 0.834 | 0.016 | max_features="log2", criterion="entropy", n_estimators=640, min_samples_leaf=3 |
| RandomForestClassifier | 0.834 | 0.017 | max_features="log2", criterion="entropy", n_estimators=160, min_samples_leaf=5 |
| RandomForestClassifier | 0.833 | 0.016 | max_features="log2", criterion="entropy", n_estimators=640, min_samples_leaf=5 |
| GradientBoostingClassifier | 0.831 | 0.021 | max_features="log2", learning_rate=0.01, n_estimators=500, max_depth=7 |
| RandomForestClassifier | 0.831 | 0.017 | max_features="log2", criterion="entropy", n_estimators=320, min_samples_leaf=3 |
| GradientBoostingClassifier | 0.831 | 0.021 | max_features="log2", learning_rate=0.01, n_estimators=700, max_depth=7 |
| RandomForestClassifier | 0.83 | 0.02 | max_features="log2", criterion="entropy", n_estimators=640, min_samples_leaf=1 |
| RandomForestClassifier | 0.83 | 0.012 | max_features="log2", criterion="entropy", n_estimators=160, min_samples_leaf=3 |
| RandomForestClassifier | 0.83 | 0.015 | max_features="log2", criterion="entropy", n_estimators=640, min_samples_leaf=7 |
| GradientBoostingClassifier | 0.83 | 0.019 | max_features="log2", learning_rate=0.01, n_estimators=500, max_depth=5 |
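The report ranks configurations by mean `roc_auc` over 5 folds. As a reminder of what that score measures, here is a minimal pure-Python sketch of ROC-AUC as the probability that a randomly chosen positive example outscores a randomly chosen negative one, with ties counting half. The function name `roc_auc` and the toy inputs are illustrative assumptions and are not taken from the cswiki sample; this is not the code revscoring runs.

```python
def roc_auc(labels, scores):
    """ROC-AUC as P(random positive scores higher than random negative)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only (hypothetical, not from the report).
toy_labels = [False, False, True, True]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```

The pairwise loop is quadratic, which is fine for a sanity check but not for thousands of observations; a rank-based implementation would be the usual choice there.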
Model for damaging:
- type: GradientBoosting
- params: balanced_sample_weight=true, init=null, min_samples_split=2, center=true, subsample=1.0, min_weight_fraction_leaf=0.0, balanced_sample=false, n_estimators=500, random_state=null, presort="auto", warm_start=false, scale=true, verbose=0, max_depth=7, learning_rate=0.01, min_samples_leaf=1, max_features="log2", loss="deviance", max_leaf_nodes=null
- version: 0.3.0
- trained: 2017-01-28T18:08:59.959144

Table:

| | ~False | ~True |
|:------|-------:|------:|
| False | 4032 | 444 |
| True | 208 | 241 |

Accuracy: 0.868

Precision:
- False: 0.951
- True: 0.348

Recall:
- False: 0.901
- True: 0.534

PR-AUC:
- False: 0.976
- True: 0.423

ROC-AUC:
- False: 0.835
- True: 0.835

Recall @ 0.1 false-positive rate:

| label | threshold | recall | fpr |
|:------|----------:|-------:|------:|
| False | 0.856 | 0.534 | 0.088 |
| True | 0.508 | 0.547 | 0.091 |

Recall @ 0.98 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.837 | 0.56 | 0.982 |
| True | 0.885 | 0.059 | 1 |

Recall @ 0.9 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.118 | 0.997 | 0.914 |
| True | 0.881 | 0.067 | 0.991 |

Recall @ 0.45 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.095 | 1 | 0.91 |
| True | 0.64 | 0.409 | 0.474 |

Recall @ 0.15 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.095 | 1 | 0.91 |
| True | 0.144 | 0.903 | 0.161 |
Goodfaith:
- type: GradientBoosting
- params: presort="auto", min_samples_leaf=1, max_leaf_nodes=null, min_samples_split=2, center=true, max_depth=5, balanced_sample_weight=true, min_weight_fraction_leaf=0.0, verbose=0, balanced_sample=false, random_state=null, max_features="log2", subsample=1.0, n_estimators=500, scale=true, warm_start=false, loss="deviance", learning_rate=0.01, init=null
- version: 0.3.0
- trained: 2017-01-28T18:10:19.008889

Table:

| | ~False | ~True |
|:------|-------:|------:|
| False | 159 | 59 |
| True | 522 | 4185 |

Accuracy: 0.882

Precision:
- False: 0.23
- True: 0.986

Recall:
- False: 0.722
- True: 0.889

PR-AUC:
- False: 0.459
- True: 0.991

ROC-AUC:
- False: 0.888
- True: 0.888

Recall @ 0.1 false-positive rate:

| label | threshold | recall | fpr |
|:------|----------:|-------:|------:|
| False | 0.568 | 0.708 | 0.087 |
| True | 0.847 | 0.576 | 0.075 |

Recall @ 0.98 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.914 | 0.111 | 1 |
| True | 0.337 | 0.951 | 0.982 |

Recall @ 0.9 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.914 | 0.111 | 1 |
| True | 0.079 | 1 | 0.957 |

Recall @ 0.45 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.773 | 0.465 | 0.518 |
| True | 0.079 | 1 | 0.957 |

Recall @ 0.15 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.267 | 0.836 | 0.17 |
| True | 0.079 | 1 | 0.957 |
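As a sanity check on how the summary figures relate to the confusion tables, the sketch below recomputes accuracy, precision, and recall from the damaging model's table (4032 / 444 / 208 / 241). It assumes rows are actual labels and `~` columns are predictions, which is consistent with the reported recall for False (4032 / 4476 ≈ 0.901). The helper name `per_label_metrics` is hypothetical. The table-derived values for the True label (about 0.352 precision and 0.537 recall) land close to, but not exactly on, the reported 0.348 and 0.534; the report does not say why, so treat this as a rough check rather than a reproduction.

```python
def per_label_metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall for one label of a 2x2 confusion table."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Damaging model table from the report above.
# Assumption: rows are actual labels, "~" columns are predicted labels.
true_label = per_label_metrics(tp=241, fp=444, fn=208, tn=4032)
false_label = per_label_metrics(tp=4032, fp=208, fn=444, tn=241)

for name, metrics in (("True", true_label), ("False", false_label)):
    rounded = {key: round(value, 3) for key, value in metrics.items()}
    print(name, rounded)
```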
ScikitLearnClassifier
- type: GradientBoosting
- params: warm_start=false, min_samples_split=2, loss="deviance", init=null, min_weight_fraction_leaf=0.0, min_samples_leaf=1, max_depth=7, presort="auto", max_leaf_nodes=null, verbose=0, balanced_sample_weight=true, random_state=null, balanced_sample=false, max_features="log2", n_estimators=500, scale=true, center=true, learning_rate=0.01, subsample=1.0
- version: 0.3.0
- trained: 2017-01-28T20:18:51.260027

Table:

| | ~False | ~True |
|:------|-------:|------:|
| False | 17812 | 1193 |
| True | 96 | 741 |

Accuracy: 0.935

Precision:
- False: 0.995
- True: 0.383

Recall:
- False: 0.937
- True: 0.884

PR-AUC:
- False: 0.995
- True: 0.802

ROC-AUC:
- False: 0.969
- True: 0.965

Recall @ 0.1 false-positive rate:

| label | threshold | recall | fpr |
|:------|----------:|-------:|------:|
| False | 0.572 | 0.916 | 0.093 |
| True | 0.369 | 0.918 | 0.094 |

Recall @ 0.98 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.122 | 0.997 | 0.981 |
| True | 0.919 | 0.345 | 0.996 |

Recall @ 0.9 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.051 | 1 | 0.963 |
| True | 0.88 | 0.562 | 0.912 |

Recall @ 0.45 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.051 | 1 | 0.963 |
| True | 0.61 | 0.863 | 0.474 |

Recall @ 0.15 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.051 | 1 | 0.963 |
| True | 0.16 | 0.955 | 0.175 |
ScikitLearnClassifier
- type: GradientBoosting
- params: center=true, scale=true, min_weight_fraction_leaf=0.0, n_estimators=500, presort="auto", max_leaf_nodes=null, warm_start=false, verbose=0, max_features="log2", balanced_sample=false, init=null, loss="deviance", balanced_sample_weight=true, learning_rate=0.01, min_samples_leaf=1, min_samples_split=2, subsample=1.0, random_state=null, max_depth=5
- version: 0.3.0
- trained: 2017-01-28T22:26:31.693530

Table:

| | ~False | ~True |
|:------|-------:|------:|
| False | 368 | 31 |
| True | 1079 | 18364 |

Accuracy: 0.944

Precision:
- False: 0.254
- True: 0.998

Recall:
- False: 0.922
- True: 0.945

PR-AUC:
- False: 0.692
- True: 0.995

ROC-AUC:
- False: 0.969
- True: 0.971

Recall @ 0.1 false-positive rate:

| label | threshold | recall | fpr |
|:------|----------:|-------:|------:|
| False | 0.31 | 0.951 | 0.081 |
| True | 0.368 | 0.954 | 0.088 |

Recall @ 0.98 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.952 | 0.21 | 1 |
| True | 0.046 | 1 | 0.983 |

Recall @ 0.9 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.94 | 0.334 | 0.936 |
| True | 0.046 | 1 | 0.983 |

Recall @ 0.45 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.839 | 0.779 | 0.476 |
| True | 0.046 | 1 | 0.983 |

Recall @ 0.15 precision:

| label | threshold | recall | precision |
|:------|----------:|-------:|----------:|
| False | 0.306 | 0.945 | 0.197 |
| True | 0.046 | 1 | 0.983 |
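The "Recall @ N precision" rows pair a score threshold with the recall and precision obtained when everything at or above that threshold is flagged. A minimal sketch of one way to derive such a row is below: scan candidate thresholds and keep the one with the highest recall whose precision still meets the target. The function name `recall_at_precision` and the toy scores are hypothetical, and this is not necessarily how revscoring computes these rows.

```python
def recall_at_precision(scored, min_precision):
    """Return (threshold, recall, precision) with the best recall that still
    meets min_precision, or None if no threshold qualifies.

    scored: list of (probability_of_label, actually_has_label) pairs.
    """
    total_pos = sum(1 for _, is_pos in scored if is_pos)
    if total_pos == 0:
        return None

    best = None
    for threshold in sorted({p for p, _ in scored}):
        # Everything scoring at or above the threshold gets flagged.
        flagged = [is_pos for p, is_pos in scored if p >= threshold]
        tp = sum(flagged)
        precision = tp / len(flagged)
        recall = tp / total_pos
        if precision >= min_precision and (best is None or recall > best[1]):
            best = (threshold, recall, precision)
    return best


# Toy scores for illustration only (hypothetical, not from the cswiki data).
toy = [(0.95, True), (0.9, True), (0.8, False),
       (0.7, True), (0.4, False), (0.2, False)]
print(recall_at_precision(toy, 0.9))   # strict target: fewer edits flagged
print(recall_at_precision(toy, 0.7))   # looser target: higher recall
```

Read this way, the damaging model's row for True at 0.9 precision (threshold 0.88, recall 0.562) says that flagging only edits scored 0.88 or higher catches a little over half of the damaging edits while keeping false alarms rare.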
@Halfak If this is done (according to the column) I think we should close it as resolved, shouldn't we?
This is now deployed in WMFLabs. Next step is production and then enabling the ORES Review Tool. Stay tuned.
https://ores.wmflabs.org/v2/scores/cswiki/?models=damaging|goodfaith&model_info=trained|type
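For anyone who wants to check the deployment from a script, here is a minimal sketch that rebuilds the URL above and, optionally, fetches it. The helper names `scores_url` and `fetch_model_info` are hypothetical, the response is assumed to be JSON, and the WMFLabs endpoint may no longer be reachable, so the fetch is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ORES_LABS = "https://ores.wmflabs.org/v2/scores"


def scores_url(wiki, models, model_info):
    """Build the model-info URL, keeping '|' unescaped as in the link above."""
    query = urlencode(
        {"models": "|".join(models), "model_info": "|".join(model_info)},
        safe="|",
    )
    return f"{ORES_LABS}/{wiki}/?{query}"


def fetch_model_info(url, timeout=10):
    """Fetch and decode the response (assumed to be JSON); needs network access."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


url = scores_url("cswiki", ["damaging", "goodfaith"], ["trained", "type"])
print(url)
```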