$ nice make models
cat datasets/enwiki.labeled_revisions.w_cache.20k_2015.json | \
revscoring cv_train \
revscoring.scoring.models.GradientBoosting \
editquality.feature_lists.enwiki.damaging \
damaging \
--version=0.4.0 \
-p 'learning_rate=0.01' \
-p 'max_depth=7' \
-p 'max_features="log2"' \
-p 'n_estimators=700' \
--label-weight "true=10" \
--pop-rate "true=0.034163555464634586" \
--pop-rate "false=0.9658364445353654" \
--center --scale > models/enwiki.damaging.gradient_boosting.model
Traceback (most recent call last):
  File "/srv/home/halfak/venv/3.5/bin/revscoring", line 11, in <module>
    sys.exit(main())
  File "/srv/home/halfak/venv/3.5/lib/python3.5/site-packages/revscoring/revscoring.py", line 51, in main
    module.main(sys.argv[2:])
  File "/srv/home/halfak/venv/3.5/lib/python3.5/site-packages/revscoring/utilities/cv_train.py", line 119, in main
    for ob in observations]
  File "/srv/home/halfak/venv/3.5/lib/python3.5/site-packages/revscoring/utilities/cv_train.py", line 119, in <listcomp>
    for ob in observations]
KeyError: 'damaging'
Makefile:856: recipe for target 'models/enwiki.damaging.gradient_boosting.model' failed
make: *** [models/enwiki.damaging.gradient_boosting.model] Error 1
make: *** Deleting file 'models/enwiki.damaging.gradient_boosting.model'
/srv/home/halfak/venv/3.5/lib/python3.5/site-packages
$ cat datasets/enwiki.labeled_revisions.w_cache.20k_2015.json | json2tsv damaging | sort | uniq -c
  18693 False
    104 null
    751 True
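The label tally at the end of the transcript can also be reproduced in Python, which makes it easy to see how many records would trip the KeyError. This is only a rough sketch: the function name count_label_values is hypothetical, the two short sample records are invented for illustration, and counting a missing key under None is an assumption about how one might want to treat such records, not a description of what json2tsv does.

```python
import json
from collections import Counter


def count_label_values(lines, label="damaging"):
    """Count the values of `label` across JSON-lines records.

    Hypothetical helper: records that lack the key entirely are
    counted under None, the same bucket as an explicit JSON null.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        counts[record.get(label)] += 1
    return counts


# Illustrative input: two made-up labeled records plus one record
# shaped like the unlabeled one quoted later in this task.
sample = [
    '{"rev_id": 1, "damaging": false}',
    '{"rev_id": 2, "damaging": true}',
    '{"auto_labeled": false, "autolabel": {}, "rev_id": 652836891}',
]
counts = count_label_values(sample)
print(counts)
```

Any non-zero count under None points at observations that a step reading ob['damaging'] directly would fail on.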
Description
Related Objects
- Mentioned In
  - rOEQ4633429bd836: Fix merge_labels handling of degenerate case (#154) * Fix merge_labels handling…
  - rOEQa8f532d6b118: Fix merge_labels handling of degenerate case (#154) * Fix merge_labels handling…
  - rOEQb4c4b69a1911: Fix merge_labels handling of degenerate case (#154)
  - Blog Post: Status Update (May 2, 2018)
  - rOEQ22da706677fe: Fix merge_labels handling of degenerate case (#154)
  - rOEQ3adcf849a5c5: Fix merge_labels handling of degenerate case
  - rOEQ2ee0711c337c: Fix merge_labels handling of degenerate case
Event Timeline
Confirmed broken. This line in human_labeled,
datasets/enwiki.human_labeled_revisions.20k_2015.json:{"auto_labeled": false, "autolabel": {}, "rev_id": 652836891}
is allowed through to labeled_revisions by the merge_labels utility.
I'll fix and write a test.
It's an edge case that can only happen when no autolabeled file is given, and we're only passing human labeled data to merge_labels. Maybe we want to stop this usage and write a separate tool?
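To make the edge case concrete, here is a minimal sketch of a merge step that tolerates a missing autolabeled input and still refuses to emit unlabeled records. It is not the editquality merge_labels implementation: the function name merge_labeled, its parameters, the rule that human labels override autolabels, and the choice to drop records whose label is missing or null are all assumptions made for illustration.

```python
def merge_labeled(human_records, auto_records=(), required_label="damaging"):
    """Merge records by rev_id and drop any that end up unlabeled.

    Hypothetical sketch, not the real merge_labels utility.
    Human-labeled fields override autolabeled ones for the same rev_id.
    """
    merged = {}
    for record in auto_records:
        merged[record["rev_id"]] = dict(record)
    for record in human_records:
        merged.setdefault(record["rev_id"], {}).update(record)
    # Assumption: a record is only useful for training if it carries
    # a non-null value for the required label.
    return [r for r in merged.values() if r.get(required_label) is not None]


# Degenerate case: only human-labeled data is passed in, and one
# record (shaped like the one quoted above) has no label at all.
human = [
    {"auto_labeled": False, "autolabel": {}, "rev_id": 652836891},
    {"rev_id": 1, "damaging": True},  # made-up record for illustration
]
human_only = merge_labeled(human)

# With an autolabeled record for the same rev_id, the label survives.
auto = [{"rev_id": 652836891, "damaging": False, "auto_labeled": True}]
with_auto = merge_labeled(human, auto)
```

Filtering at the end, rather than while reading each input, means the same check covers both the usual two-file call and the human-only call described here.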
Fix is included in the revscoring 2.2.2 work: https://github.com/wiki-ai/editquality/commit/2ee0711c337cbe91d77b2a4f7cf0778d9b256f19
Split this patch out so we can merge it ahead of the 2.2.2 update:
https://github.com/wiki-ai/editquality/pull/154