Paste P8033

enwiki.draft_quality RandomForest model (with words_to_watch)

Authored by hoo on Jan 24 2019, 10:58 PM.
hoo@stat1007:~$ cat draftquality/model_info/enwiki.draft_quality.md
Model Information:
- type: RandomForest
- version: 0.2.0
- params: {'warm_start': False, 'min_samples_split': 2, 'label_weights': None, 'verbose': 0, 'max_depth': None, 'multilabel': False, 'min_samples_leaf': 3, 'n_jobs': 1, 'min_impurity_split': None, 'max_features': 'log2', 'max_leaf_nodes': None, 'bootstrap': True, 'center': False, 'class_weight': None, 'min_impurity_decrease': 0.0, 'population_rates': None, 'random_state': None, 'oob_score': False, 'criterion': 'entropy', 'labels': ['OK', 'spam', 'vandalism', 'attack'], 'scale': False, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 640}
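The score_schema title further down describes this as a scikit-learn-based classifier, so most of these keys are ordinary `sklearn.ensemble.RandomForestClassifier` arguments. The following is a minimal sketch that separates them from the keys that revscoring's wrapper handles itself. The `SKLEARN_KEYS` set is an assumption: it is the scikit-learn 0.20-era constructor signature and was not read from revscoring's source.

```python
# Hyperparameters copied verbatim from the params line above.
PARAMS = {
    'warm_start': False, 'min_samples_split': 2, 'label_weights': None,
    'verbose': 0, 'max_depth': None, 'multilabel': False,
    'min_samples_leaf': 3, 'n_jobs': 1, 'min_impurity_split': None,
    'max_features': 'log2', 'max_leaf_nodes': None, 'bootstrap': True,
    'center': False, 'class_weight': None, 'min_impurity_decrease': 0.0,
    'population_rates': None, 'random_state': None, 'oob_score': False,
    'criterion': 'entropy', 'labels': ['OK', 'spam', 'vandalism', 'attack'],
    'scale': False, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 640,
}

# ASSUMPTION: constructor arguments of RandomForestClassifier around
# scikit-learn 0.20 (min_impurity_split was removed in later releases).
SKLEARN_KEYS = {
    'n_estimators', 'criterion', 'max_depth', 'min_samples_split',
    'min_samples_leaf', 'min_weight_fraction_leaf', 'max_features',
    'max_leaf_nodes', 'min_impurity_decrease', 'min_impurity_split',
    'bootstrap', 'oob_score', 'n_jobs', 'random_state', 'verbose',
    'warm_start', 'class_weight',
}


def split_params(params):
    """Split params into (estimator kwargs, wrapper-only kwargs)."""
    estimator = {k: v for k, v in params.items() if k in SKLEARN_KEYS}
    wrapper = {k: v for k, v in params.items() if k not in SKLEARN_KEYS}
    return estimator, wrapper


estimator_kwargs, wrapper_kwargs = split_params(PARAMS)
print(sorted(wrapper_kwargs))
# ['center', 'label_weights', 'labels', 'multilabel', 'population_rates', 'scale']
```

If scikit-learn is installed, `RandomForestClassifier(**estimator_kwargs)` should build an untrained forest with the same settings. Recent scikit-learn releases no longer accept `min_impurity_split`, so drop that key first when using one of them.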
Environment:
- revscoring_version: '2.3.0'
- platform: 'Linux-4.9.0-8-amd64-x86_64-with-debian-9.5'
- machine: 'x86_64'
- version: '#1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08)'
- system: 'Linux'
- processor: ''
- python_build: ('default', 'Sep 27 2018 17:25:39')
- python_compiler: 'GCC 6.3.0 20170516'
- python_branch: ''
- python_implementation: 'CPython'
- python_revision: ''
- python_version: '3.5.3'
- release: '4.9.0-8-amd64'
Statistics:
counts (n=201261):
label             n          ~OK    ~spam    ~vandalism    ~attack
-----------  ------  ---  ------  -------  ------------  ---------
'OK'         175000  -->  171657     2607           726         10
'spam'        17699  -->    3239    13730           724          6
'vandalism'    6503  -->    1656     1605          3131        111
'attack'       2059  -->     298      457          1148        156
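The counts table is a confusion matrix. Each row is a true label and each `~` column is a predicted label. Dividing each diagonal cell by its row total reproduces the per-label recall table further down. Here is a small plain-Python sketch, with the numbers copied from the table:

```python
# Confusion matrix from the counts table: one row per true label, columns
# in the order ~OK, ~spam, ~vandalism, ~attack (predicted labels).
LABELS = ['OK', 'spam', 'vandalism', 'attack']
COUNTS = {
    'OK':        [171657, 2607, 726, 10],
    'spam':      [3239, 13730, 724, 6],
    'vandalism': [1656, 1605, 3131, 111],
    'attack':    [298, 457, 1148, 156],
}


def recall_by_label(counts, labels):
    """Recall = correctly predicted / all items with that true label."""
    return {label: counts[label][i] / sum(counts[label])
            for i, label in enumerate(labels)}


total = sum(sum(row) for row in COUNTS.values())
recall = recall_by_label(COUNTS, LABELS)
print(total)  # 201261
for label in LABELS:
    print(label, round(recall[label], 3))
# OK 0.981
# spam 0.776
# vandalism 0.481
# attack 0.076
```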
rates:
              'OK'    'spam'    'vandalism'    'attack'
----------  ------  --------  -------------  ----------
sample       0.87      0.088          0.032       0.01
population   0.971     0.02           0.007      0.002
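The sample row is each label's share of the 201261 observations in the counts table. The population row gives the rates the labels are expected to have in real traffic. The sketch below computes the sample rates and the ratio population/sample for each label. Reading that ratio as a reweighting factor only illustrates how the two rows relate. It is an assumption, not a description of exactly how revscoring applies `population_rates`.

```python
# Items per true label (the n column of the counts table) and the
# population row of the rates table, as printed (already rounded).
LABELS = ['OK', 'spam', 'vandalism', 'attack']
N_BY_LABEL = {'OK': 175000, 'spam': 17699, 'vandalism': 6503, 'attack': 2059}
POPULATION = {'OK': 0.971, 'spam': 0.02, 'vandalism': 0.007, 'attack': 0.002}

n = sum(N_BY_LABEL.values())
sample = {label: N_BY_LABEL[label] / n for label in LABELS}
# Ratio > 1: the label is rarer in the sample than in the population
# (under-sampled); ratio < 1: it was over-sampled.
ratio = {label: POPULATION[label] / sample[label] for label in LABELS}

for label in LABELS:
    print(label, round(sample[label], 3), round(ratio[label], 2))
# OK 0.87 1.12
# spam 0.088 0.23
# vandalism 0.032 0.22
# attack 0.01 0.2
```

The training sample therefore over-represents the damaging labels by roughly a factor of four to five relative to the stated population rates.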
match_rate (micro=0.931, macro=0.254):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.958    0.04     0.001        0.017
filter_rate (micro=0.069, macro=0.746):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.042    0.96     0.999        0.983
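filter_rate is the complement of match_rate. This holds per label and for both averages (1 - 0.931 = 0.069 and 1 - 0.254 = 0.746). Here is a quick check on the printed per-label values:

```python
# Per-label match_rate values copied from the match_rate table above.
LABELS = ['OK', 'spam', 'attack', 'vandalism']
MATCH_RATE = {'OK': 0.958, 'spam': 0.04, 'attack': 0.001, 'vandalism': 0.017}

# filter_rate = 1 - match_rate; rounding to 3 places hides float noise.
filter_rate = {label: round(1 - MATCH_RATE[label], 3) for label in LABELS}

for label in LABELS:
    print(label, filter_rate[label])
# OK 0.042
# spam 0.96
# attack 0.999
# vandalism 0.983
```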
recall (micro=0.971, macro=0.578):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.981   0.776     0.076        0.481
!recall (micro=0.807, macro=0.941):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.802   0.975     0.999        0.987
precision (micro=0.975, macro=0.448):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.994   0.378     0.213        0.207
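The micro and macro figures in each heading appear to be computed as follows. Macro is the plain mean over the four labels. Micro is a mean weighted by each label's population rate. This is inferred from the printed numbers, not taken from revscoring's source, and it is not necessarily the textbook count-pooled micro-average. The same weighting also reproduces the accuracy and fpr averages further down. The sketch below checks it against precision:

```python
# Population rates from the rates table and per-label precision from the
# table above (both already rounded to three places).
POPULATION = {'OK': 0.971, 'spam': 0.02, 'vandalism': 0.007, 'attack': 0.002}
PRECISION = {'OK': 0.994, 'spam': 0.378, 'attack': 0.213, 'vandalism': 0.207}


def macro_average(metric):
    """Plain mean over labels: every label counts equally."""
    return sum(metric.values()) / len(metric)


def micro_average(metric, rates):
    """ASSUMED micro: mean over labels weighted by population rate."""
    total = sum(rates[label] for label in metric)
    return sum(metric[label] * rates[label] for label in metric) / total


print(round(micro_average(PRECISION, POPULATION), 3),
      round(macro_average(PRECISION), 3))  # 0.975 0.448
```

The gap between 0.975 and 0.448 comes from OK dominating the population weighting while the three damaging labels have low precision.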
!precision (micro=0.569, macro=0.886):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.556   0.995     0.998        0.996
f1 (micro=0.971, macro=0.474):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.987   0.508     0.112        0.289
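Each per-label f1 is the harmonic mean of that label's precision and recall. Recomputing it from the rounded table values gives back the f1 row. The same relation holds, up to rounding, between !precision, !recall and !f1.

```python
# Per-label precision and recall copied from the tables above.
LABELS = ['OK', 'spam', 'attack', 'vandalism']
PRECISION = {'OK': 0.994, 'spam': 0.378, 'attack': 0.213, 'vandalism': 0.207}
RECALL = {'OK': 0.981, 'spam': 0.776, 'attack': 0.076, 'vandalism': 0.481}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for label in LABELS:
    print(label, round(f1(PRECISION[label], RECALL[label]), 3))
# OK 0.987
# spam 0.508
# attack 0.112
# vandalism 0.289
```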
!f1 (micro=0.666, macro=0.908):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.657   0.985     0.999        0.991
accuracy (micro=0.976, macro=0.982):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.976   0.971     0.997        0.983
fpr (micro=0.193, macro=0.059):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.198   0.025     0.001        0.013
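!recall reads as the recall of the negated label: among items that are not X, the share the model also does not call X. That is the true-negative rate TN / (TN + FP), so the false-positive rate FP / (FP + TN) is its complement. The printed tables agree:

```python
# Per-label !recall copied from the !recall table above.
LABELS = ['OK', 'spam', 'attack', 'vandalism']
NOT_RECALL = {'OK': 0.802, 'spam': 0.975, 'attack': 0.999, 'vandalism': 0.987}

# fpr = 1 - true-negative rate; rounding to 3 places hides float noise.
fpr = {label: round(1 - NOT_RECALL[label], 3) for label in LABELS}

for label in LABELS:
    print(label, fpr[label])
# OK 0.198
# spam 0.025
# attack 0.001
# vandalism 0.013
```

The high fpr for OK (0.198) is the flip side of the modest recall on the damaging labels: about one damaging draft in five is labelled OK.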
roc_auc (micro=0.983, macro=0.973):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.983   0.979     0.968        0.961
pr_auc (micro=0.984, macro=0.487):
   OK    spam    attack    vandalism
-----  ------  --------  -----------
0.999   0.609     0.099         0.24
- score_schema: {'properties': {'prediction': {'description': 'The most likely label predicted by the estimator', 'type': 'string'}, 'probability': {'properties': {'OK': {'type': 'number'}, 'spam': {'type': 'number'}, 'attack': {'type': 'number'}, 'vandalism': {'type': 'number'}}, 'description': 'A mapping of probabilities onto each of the potential output labels', 'type': 'object'}}, 'title': 'Scikit learn-based classifier score with probability', 'type': 'object'}
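A score under this schema is an object with a string `prediction` and a `probability` mapping from each label to a number. Going by the schema's own description, the prediction is the most likely label. The sketch below builds and shallowly checks such an object. The helper names are hypothetical, and the probabilities are made-up placeholders, not output from this model.

```python
# Labels from the model's params; the probabilities used below are
# made-up placeholders, NOT output produced by this model.
LABELS = ['OK', 'spam', 'vandalism', 'attack']


def make_score(probability):
    """Hypothetical helper: build a dict shaped like score_schema, taking
    the most likely label as the prediction."""
    missing = set(LABELS) - set(probability)
    if missing:
        raise ValueError('missing probabilities for: %s' % sorted(missing))
    prediction = max(LABELS, key=lambda label: probability[label])
    return {'prediction': prediction,
            'probability': {label: float(probability[label]) for label in LABELS}}


def matches_schema(score):
    """Hypothetical shallow type check against score_schema: a string
    prediction and a mapping of label -> number."""
    probability = score.get('probability')
    return (isinstance(score.get('prediction'), str)
            and isinstance(probability, dict)
            and all(isinstance(value, (int, float))
                    for value in probability.values()))


example = make_score({'OK': 0.10, 'spam': 0.70, 'vandalism': 0.15, 'attack': 0.05})
print(example['prediction'], matches_schema(example))  # spam True
```

The schema itself lists no required keys. Requiring all four labels in `make_score` is therefore a choice of this sketch, not a rule of the schema.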
