Paste P8042

enwiki.draft_quality GradientBoosting model (with words_to_watch)

Authored by hoo on Jan 26 2019, 8:40 PM.
hoo@stat1007:~$ cat draftquality/model_info/enwiki.draft_quality.md
Model Information:
- type: GradientBoosting
- version: 0.2.0
- params: {'max_depth': 5, 'min_impurity_decrease': 0.0, 'multilabel': False, 'min_weight_fraction_leaf': 0.0, 'scale': False, 'label_weights': None, 'random_state': None, 'subsample': 1.0, 'min_impurity_split': None, 'max_features': 'log2', 'n_estimators': 300, 'labels': ['OK', 'spam', 'vandalism', 'attack'], 'presort': 'auto', 'max_leaf_nodes': None, 'learning_rate': 0.1, 'center': False, 'verbose': 0, 'population_rates': None, 'init': None, 'min_samples_leaf': 1, 'criterion': 'friedman_mse', 'warm_start': False, 'loss': 'deviance', 'min_samples_split': 2}
Environment:
- revscoring_version: '2.3.0'
- platform: 'Linux-4.9.0-8-amd64-x86_64-with-debian-9.5'
- machine: 'x86_64'
- version: '#1 SMP Debian 4.9.110-3+deb9u6 (2018-10-08)'
- system: 'Linux'
- processor: ''
- python_build: ('default', 'Sep 27 2018 17:25:39')
- python_compiler: 'GCC 6.3.0 20170516'
- python_branch: ''
- python_implementation: 'CPython'
- python_revision: ''
- python_version: '3.5.3'
- release: '4.9.0-8-amd64'
Statistics:
counts (n=201261):
label             n          ~OK    ~spam    ~vandalism    ~attack
-----------  ------  ---  ------  -------  ------------  ---------
'OK'         175000  -->  171425     2656           865         54
'spam'        17699  -->    2763    14037           865         34
'vandalism'    6503  -->    1596     1351          3213        343
'attack'       2059  -->     252      350          1117        340
rates:
              'OK'    'spam'    'vandalism'    'attack'
----------  ------  --------  -------------  ----------
sample        0.87     0.088          0.032        0.01
population   0.971      0.02          0.007       0.002
match_rate (micro=0.93, macro=0.254):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.956        0.018   0.039     0.003
filter_rate (micro=0.07, macro=0.746):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.044        0.982   0.961     0.997
recall (micro=0.971, macro=0.608):
  OK    vandalism    spam    attack
----  -----------  ------  --------
0.98        0.494   0.793     0.165
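
The per-label recall figures above can be recomputed from the counts table. The following is a minimal sketch, assuming that each row of the counts table is a true label and each `~label` column is a predicted label; the names `COUNTS`, `POPULATION` and `per_label_recall` are made up for this example and are not part of revscoring. Weighting the per-label recall by the population rates is only one reading that happens to reproduce the reported micro value, not a statement of how revscoring computes it.

```python
# Confusion counts copied from the "counts" table.
# Assumption: rows are true labels, columns are predicted labels,
# both in the order OK, spam, vandalism, attack.
LABELS = ["OK", "spam", "vandalism", "attack"]
COUNTS = {
    "OK":        [171425, 2656, 865, 54],
    "spam":      [2763, 14037, 865, 34],
    "vandalism": [1596, 1351, 3213, 343],
    "attack":    [252, 350, 1117, 340],
}
# Population rates copied from the "rates" table.
POPULATION = {"OK": 0.971, "spam": 0.02, "vandalism": 0.007, "attack": 0.002}


def per_label_recall(counts, labels):
    """Recall per label: correct predictions / all observations of that label."""
    recall = {}
    for i, label in enumerate(labels):
        row = counts[label]
        recall[label] = row[i] / sum(row)
    return recall


recall = per_label_recall(COUNTS, LABELS)

# Macro recall: unweighted mean over the four labels.
macro_recall = sum(recall.values()) / len(recall)

# Assumption: micro recall as a population-rate-weighted mean.
micro_recall = (
    sum(POPULATION[label] * recall[label] for label in LABELS)
    / sum(POPULATION.values())
)

for label in LABELS:
    print("{:>9}: {:.3f}".format(label, recall[label]))
print("macro: {:.3f}  micro: {:.3f}".format(macro_recall, micro_recall))
```

Rounded to three places, these values agree with the recall table (0.98, 0.494, 0.793, 0.165) and with its reported macro and micro figures.
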
!recall (micro=0.829, macro=0.946):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.824        0.985   0.976     0.998
precision (micro=0.975, macro=0.434):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.995        0.196   0.399     0.148
!precision (micro=0.559, macro=0.884):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.546        0.996   0.996     0.998
f1 (micro=0.971, macro=0.489):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.987        0.281   0.531     0.156
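
The f1 row is consistent with the usual definition of F1 as the harmonic mean of precision and recall. A short sketch, using the rounded per-label values from the precision and recall tables as inputs; the names `PRECISION`, `RECALL` and `f1` are illustrative only:

```python
# Rounded per-label values copied from the precision and recall tables.
PRECISION = {"OK": 0.995, "vandalism": 0.196, "spam": 0.399, "attack": 0.148}
RECALL = {"OK": 0.98, "vandalism": 0.494, "spam": 0.793, "attack": 0.165}


def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


f1_scores = {label: f1(PRECISION[label], RECALL[label]) for label in PRECISION}

# Macro f1: unweighted mean of the per-label scores.
macro_f1 = sum(f1_scores.values()) / len(f1_scores)

for label, score in f1_scores.items():
    print("{:>9}: {:.3f}".format(label, score))
print("macro: {:.3f}".format(macro_f1))
```

Even with rounded inputs, the results match the reported f1 row and its macro value to three places.
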
!f1 (micro=0.667, macro=0.908):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.657        0.991   0.986     0.998
accuracy (micro=0.975, macro=0.981):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.975        0.982   0.973     0.996
fpr (micro=0.171, macro=0.054):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.176        0.015   0.024     0.002
roc_auc (micro=0.979, macro=0.971):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.979        0.956   0.979     0.968
pr_auc (micro=0.984, macro=0.479):
   OK    vandalism    spam    attack
-----  -----------  ------  --------
0.999        0.207   0.612     0.097
- score_schema: {'title': 'Scikit learn-based classifier score with probability', 'type': 'object', 'properties': {'probability': {'type': 'object', 'description': 'A mapping of probabilities onto each of the potential output labels', 'properties': {'OK': {'type': 'number'}, 'vandalism': {'type': 'number'}, 'spam': {'type': 'number'}, 'attack': {'type': 'number'}}}, 'prediction': {'type': 'string', 'description': 'The most likely label predicted by the estimator'}}}
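
To make the score_schema concrete, here is a hedged sketch of a score object with that shape and a small hand-written check. The probability values are invented for illustration and are not output from this model; `check_score` and `most_likely` are hypothetical helpers, not revscoring functions. The check is stricter than the schema as printed, which lists no required properties.

```python
# Label names taken from the schema's "probability" properties.
LABELS = ["OK", "vandalism", "spam", "attack"]


def check_score(score):
    """Return True if `score` has a string prediction and a numeric
    probability for every label named in the schema."""
    if not isinstance(score, dict):
        return False
    if not isinstance(score.get("prediction"), str):
        return False
    probability = score.get("probability")
    if not isinstance(probability, dict):
        return False
    for label in LABELS:
        value = probability.get(label)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
    return True


def most_likely(probability):
    """Label with the highest probability."""
    return max(probability, key=probability.get)


# Hypothetical values, made up for illustration only.
example_score = {
    "prediction": "OK",
    "probability": {"OK": 0.92, "vandalism": 0.03, "spam": 0.04, "attack": 0.01},
}

print(check_score(example_score))
print(most_likely(example_score["probability"]))
```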