Status | Assigned | Task
---|---|---
Resolved | awight | T201518 ORES deployment (Early August)
Resolved | Ladsgroup | T190050 Train and test wp10 model for fawiki
Resolved | Halfak | T174684 Article quality campaign for Persian Wikipedia
Event Timeline
Model Information:
- type: GradientBoosting
- version: 0.6.0
- params: {'presort': 'auto', 'population_rates': None, 'learning_rate': 0.01, 'min_weight_fraction_leaf': 0.0, 'loss': 'deviance', 'max_depth': 7, 'labels': ['Stub', 'Start', 'C', 'B', 'GA', 'FA'], 'verbose': 0, 'init': None, 'min_samples_leaf': 1, 'max_leaf_nodes': None, 'center': True, 'scale': True, 'random_state': None, 'multilabel': False, 'n_estimators': 700, 'label_weights': None, 'subsample': 1.0, 'max_features': 'log2', 'warm_start': False, 'min_samples_split': 2}

Environment:
- revscoring_version: '2.1.0'
- platform: 'Linux-4.9.0-6-amd64-x86_64-with-debian-9.4'
- machine: 'x86_64'
- version: '#1 SMP Debian 4.9.82-1+deb9u3 (2018-03-02)'
- system: 'Linux'
- processor: ''
- python_build: ('default', 'Jan 19 2017 14:11:04')
- python_compiler: 'GCC 6.3.0 20170118'
- python_branch: ''
- python_implementation: 'CPython'
- python_revision: ''
- python_version: '3.5.3'
- release: '4.9.0-6-amd64'

Statistics:

counts (n=665):

label | n | ~Stub | ~Start | ~C | ~B | ~GA | ~FA
---|---|---|---|---|---|---|---
'Stub' | 24 | 14 | 3 | 0 | 0 | 6 | 1
'Start' | 22 | 4 | 2 | 3 | 6 | 7 | 0
'C' | 26 | 1 | 1 | 2 | 1 | 18 | 3
'B' | 66 | 1 | 0 | 1 | 9 | 41 | 14
'GA' | 273 | 0 | 0 | 1 | 10 | 189 | 73
'FA' | 254 | 1 | 1 | 0 | 6 | 80 | 166

rates:

rate | 'Stub' | 'Start' | 'C' | 'B' | 'GA' | 'FA'
---|---|---|---|---|---|---
sample | 0.036 | 0.033 | 0.039 | 0.099 | 0.411 | 0.382
population | 0.036 | 0.033 | 0.039 | 0.099 | 0.411 | 0.382

per-class metrics:

metric | micro | macro | Stub | B | FA | GA | C | Start
---|---|---|---|---|---|---|---|---
match_rate | 0.365 | 0.167 | 0.032 | 0.048 | 0.386 | 0.513 | 0.011 | 0.011
filter_rate | 0.635 | 0.833 | 0.968 | 0.952 | 0.614 | 0.487 | 0.989 | 0.989
recall | 0.574 | 0.372 | 0.583 | 0.136 | 0.654 | 0.692 | 0.077 | 0.091
!recall | 0.751 | 0.888 | 0.989 | 0.962 | 0.779 | 0.612 | 0.992 | 0.992
precision | 0.547 | 0.453 | 0.667 | 0.281 | 0.646 | 0.554 | 0.286 | 0.286
!precision | 0.799 | 0.892 | 0.984 | 0.91 | 0.784 | 0.741 | 0.964 | 0.97
f1 | 0.551 | 0.388 | 0.622 | 0.184 | 0.65 | 0.616 | 0.121 | 0.138
!f1 | 0.773 | 0.889 | 0.987 | 0.935 | 0.781 | 0.67 | 0.978 | 0.981
accuracy | 0.736 | 0.858 | 0.974 | 0.88 | 0.731 | 0.645 | 0.956 | 0.962
fpr | 0.249 | 0.112 | 0.011 | 0.038 | 0.221 | 0.388 | 0.008 | 0.008
roc_auc | 0.735 | 0.806 | 0.987 | 0.663 | 0.749 | 0.694 | 0.811 | 0.934
pr_auc | 0.537 | 0.414 | 0.674 | 0.205 | 0.647 | 0.563 | 0.147 | 0.249

- score_schema: {'type': 'object', 'title': 'Scikit learn-based classifier score with probability', 'properties': {'probability': {'type': 'object', 'description': 'A mapping of probabilities onto each of the potential output labels', 'properties': {'Stub': 'number', 'B': 'number', 'FA': 'number', 'GA': 'number', 'C': 'number', 'Start': 'number'}}, 'prediction': {'type': 'string', 'description': 'The most likely label predicted by the estimator'}}}
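For anyone who wants to sanity-check the report, here is a minimal sketch that recomputes the per-class sample rate, recall, and precision from the counts table above. It assumes that rows are the labeled class and `~` columns are the predicted class, a reading that is consistent with the reported recall values. The names `per_class_stats` and `macro` are hypothetical helpers written for this illustration and are not part of revscoring.

```python
# Confusion counts copied from the "counts (n=665)" table.
# Assumption: rows are the labeled class, columns the predicted class,
# both in the order given by LABELS.
LABELS = ["Stub", "Start", "C", "B", "GA", "FA"]
COUNTS = [
    [14, 3, 0, 0, 6, 1],
    [4, 2, 3, 6, 7, 0],
    [1, 1, 2, 1, 18, 3],
    [1, 0, 1, 9, 41, 14],
    [0, 0, 1, 10, 189, 73],
    [1, 1, 0, 6, 80, 166],
]


def per_class_stats(labels, counts):
    """Return n, sample rate, recall and precision for each label."""
    total = sum(sum(row) for row in counts)
    stats = {}
    for i, label in enumerate(labels):
        true_pos = counts[i][i]
        actual = sum(counts[i])                    # row sum: labeled as this class
        predicted = sum(row[i] for row in counts)  # column sum: predicted as this class
        stats[label] = {
            "n": actual,
            "sample_rate": actual / total,
            "recall": true_pos / actual if actual else 0.0,
            "precision": true_pos / predicted if predicted else 0.0,
        }
    return stats


def macro(stats, metric):
    """Unweighted mean of a metric across classes."""
    return sum(s[metric] for s in stats.values()) / len(stats)


stats = per_class_stats(LABELS, COUNTS)
for label in LABELS:
    s = stats[label]
    print(
        f"{label:>5}  n={s['n']:>3}  rate={s['sample_rate']:.3f}  "
        f"recall={s['recall']:.3f}  precision={s['precision']:.3f}"
    )
print(
    f"macro recall={macro(stats, 'recall'):.3f}  "
    f"macro precision={macro(stats, 'precision'):.3f}"
)
```

The row sums also make the class imbalance easy to see: Stub, Start, C, and B together account for only 138 of the 665 observations.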
This is somehow missing a ton of observations for the lower classes. Those should come out of one of the labeling campaigns.
So, one of the labeling campaigns is probably not getting pulled in. Want to check on that?
I checked on this with @Ladsgroup and it looks like we accidentally had people label the GA/FA set rather than the 600-observation sample. So I've loaded the 600-observation sample (see http://labels.wmflabs.org/ui/fawiki/). On the bright side, we'll have better data about the 300-observation GA/FA sample (not all were labeled GA/FA).
Looks like this is still blocked on the new labeling campaign (at 18%), so I'm moving it out of the review column.
I added the new campaign to the PR: https://github.com/wiki-ai/articlequality/pull/63/commits/32337bd97ffa66a8e2876d214d73670c370b111f. Please take a look.