Status | Assigned | Task
---|---|---
Resolved | Catrope | T192496 Deploy ORES advanced editquality models to huwiki
Resolved | Tgr | T185903 Train/test damaging and goodfaith model for Hungarian Wikipedia
Resolved | Halfak | T167968 Complete edit quality campaign for Hungarian Wikipedia
Event Timeline
Does one need any special permissions for this? I'd be interested in taking it on, just to get a feeling of how things work.
Go ahead :) Just remember that you need to build the models on a node that is similar to the prod environment (like stat1005, which is where we usually build models).
ores-misc-01.eqiad.wmflabs is another option. If you tell me your wikitech account, I can get you set up.
@Tgr Cool! Here's another reference (to help us fix :) https://www.mediawiki.org/wiki/ORES/New_model_checklist
Hey! Checking in here. How's progress? If you're stalled, I'd be happy to take over and have you train the next model. :)
Uh, sorry, I haven't had much free time in the last couple days. I'll try to wrap it up this week and unlick the cookie if that doesn't work out. (Feel free to unassign me if you want it done sooner than that.)
Here's my progress so far.
- Updated the wordlist page from the talk page (it's unclear what some of the template fields mean, so I ignored those)
- cloned editquality and revscoring on stat1005 (you need to use the proxy and the HTTPS GitHub URLs), ran pip install -r requirements.txt for both (in hindsight, pip install -e might have been simpler)
- set up a venv, and learned that the right way to do that on stat1005 is the virtualenv command (as opposed to python3 -m venv, which complains about missing packages and aborts)
- updated the editquality makefile along these lines. The patch is not super helpful in figuring out what to do:
  - it does not use $@/$</$^, which seems to be the standard today
  - it uses different parameters for revscoring calls (that seems to have changed with T173202)
  - there are a bunch of random-seeming numbers (pop-rate, again coming from T173202) which are different for every wiki
  - it's from before the makefile was split into a manual and a templated part (although huwiki is not templated, so that wasn't too confusing).
I omitted the pop-rate parameters, otherwise tried to ape the current frwiki params, and ran make huwiki_tuning_reports (on the assumption that it would tell me what to use for the seemingly random parameters). That took about an hour to run and threw a bunch of non-fatal-looking errors:
- No revision was found for parameter "rvstartid", KeyError: 'user', TypeError: 'NoneType' object is not iterable (I assume these are revdelete-related)
- RuntimeWarning: overflow encountered in double_scalars, RuntimeWarning: invalid value encountered in multiply, and a couple more like that
- ValueError: cannot convert float NaN to integer, TypeError: only integer arrays with one element can be converted to an index
It then printed something that looked halfway between a core dump and a roguelike map, and stopped due to user error: frwiki had different sample sizes for damaging, and I didn't pay attention and just did a string search/replace on the wiki name, so the build step names did not match. Annoyingly, there seems to be no way to validate a makefile, even just to the extent of making sure all the dependencies exist. Full output is in P6714.
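The dependency check wished for above can be approximated for simple cases. The sketch below is a rough lint, not a Makefile parser: it only looks at explicit "target: prerequisites" rules, skips recipe lines, special targets and anything containing $ or %, and reports prerequisites that are neither the target of another rule nor an existing file. The function name and behaviour are illustrative assumptions, not part of editquality or GNU Make.

```python
import os
import re

# Simple explicit rule "targets: prerequisites". The negative lookahead rejects
# ":=" and "::", so variable assignments and double-colon rules are not matched.
RULE_RE = re.compile(r"^([^\s:=#][^:=#]*?)\s*:(?![:=])(.*)$")


def find_missing_prerequisites(makefile_text, exists=os.path.exists):
    """Return {target: [prerequisites that have no rule and no file]}.

    Rough lint only: variables, pattern rules and includes are not expanded.
    """
    # Join backslash-continued lines so each rule sits on one line.
    text = makefile_text.replace("\\\n", " ")
    rules = {}
    for line in text.splitlines():
        if line.startswith("\t"):
            continue  # recipe line, not a rule
        match = RULE_RE.match(line)
        if not match:
            continue
        # Ignore inline recipes after ";" and trailing comments after "#".
        prereq_text = match.group(2).split(";")[0].split("#")[0]
        # Drop the "|" separator of order-only prerequisites.
        prereqs = [p for p in prereq_text.split() if p != "|"]
        for target in match.group(1).split():
            rules.setdefault(target, []).extend(prereqs)

    missing = {}
    for target, prereqs in rules.items():
        if target.startswith("."):
            continue  # special targets such as .PHONY
        for prereq in prereqs:
            if "$" in prereq or "%" in prereq:
                continue  # would need variable or pattern expansion
            if prereq in rules or exists(prereq):
                continue
            missing.setdefault(target, []).append(prereq)
    return missing
```

Run against a hand-edited makefile, this would only catch mismatches between literal file names, such as a build step renamed in one place but not another, which is the kind of mistake described above.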
After fixing that, I re-ran it and got a zillion errors such as:
- 2018-02-17 19:25:20,611 ERROR:editquality.utilities.merge_labels -- 17226861 has no labels, but was flagged for review
- Could not cross-validate estimator RandomForest ... ModelConsistencyError: Labels {False} not in expected labels {True}
Other than that, it seemed successful and created the tuning reports, but they are empty. Output: P6715 (a bit is missing from the beginning; it was too long for the scrollback).
Next I tried to build the models (make models/huwiki.damaging.gradient_boosting.model and make models/huwiki.goodfaith.gradient_boosting.model), that failed with __init__() got an unexpected keyword argument 'label_weights' and __init__() got an unexpected keyword argument 'population_rates' respectively. I tried to remove the label-weight lines from the makefile; that resulted in RuntimeError: Either --pop-rates or --labels or --labels-config must be specified.
The makefile change is up for inspection here.
Some comments
Updating the page is not enough. You need to make a PR against revscoring, get it merged, release it on PyPI, and then bump the version number in the editquality requirements.txt.
- updated editquality makefile along these lines. The patch is not super helpful in figuring out what to do:
- it does not use $@/$</$^, which seems to be the standard today
It has changed. Please take a look at the new Makefile
- it uses different parameters for revscoring calls (that seems to have changed with T173202)
Overall, I highly recommend trying to template huwiki (T168455: [Epic] Implement code generation for model makefile maintenance), so you just make a YAML file, run a script and bam! The Makefile has changed.
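The idea behind templating can be illustrated with a toy generator: a per-wiki config is rendered through a fixed rule template, so a wiki's makefile section is regenerated rather than edited by hand. This is a hypothetical sketch; the config keys, the header format and the TRAIN variable are made up for illustration and are not the editquality config schema or the generate_make implementation.

```python
from string import Template

# Hypothetical rule template. "$$" is string.Template's escape for a literal
# "$", so the rendered makefile keeps make's $(TRAIN), $< and $@ intact.
RULE_TEMPLATE = Template(
    "models/${wiki}.${model}.${estimator}.model: "
    "datasets/${wiki}.${model}.observations.json\n"
    "\t$$(TRAIN) $$< > $$@\n"
)


def render_makefile_section(config):
    """Render a '###' section header plus one rule per model.

    The config keys ("name", "label", "estimator", "models") are illustrative
    only, not the actual editquality YAML schema.
    """
    header = f"{'#' * 30} {config['label']}\n"
    rules = [
        RULE_TEMPLATE.substitute(
            wiki=config["name"], model=model, estimator=config["estimator"]
        )
        for model in config["models"]
    ]
    return header + "\n".join(rules)
```

Usage, with a made-up config: render_makefile_section({"name": "huwiki", "label": "Hungarian Wikipedia", "estimator": "gradient_boosting", "models": ["damaging", "goodfaith"]}) yields a "Hungarian Wikipedia" section with one rule for each of the two models. Adding a model then means editing the config, not copying and renaming rules.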
- there are a bunch of random-seeming numbers (pop-rate, again coming from T173202) which are different for every wiki
--pop-rate is the proportion of true cases in that model's dataset; grep and count the lines. I'd love to make it part of the templating script, as it's currently very confusing and error-prone, but I haven't had time to get around to it.
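Counting true cases can also be done by parsing each observation instead of grepping, which doesn't depend on the exact spacing of the serialized JSON (a grep for '"goodfaith": true' would miss '"goodfaith":true'). A minimal sketch, assuming one JSON observation per line with the label stored as a top-level boolean; the real label dump may nest labels differently. It returns the rate as a fraction between 0 and 1.

```python
import json


def population_rate(lines, label):
    """Fraction of observations whose `label` field is exactly true.

    Assumes one JSON object per line with the label as a top-level key
    (a hypothetical layout); blank lines are skipped.
    """
    total = 0
    true_count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        observation = json.loads(line)
        total += 1
        if observation.get(label) is True:
            true_count += 1
    if total == 0:
        raise ValueError("no observations to compute a rate from")
    return true_count / total
```

The complementary rate for false cases is 1 minus this value, as long as every observation carries a boolean for the label.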
- it's from before the makefile was split into a manual and a templated part (although huwiki is not templated so that wasn't too confusing).
If you have some time, try starting by templating huwiki without the damaging and goodfaith models (I don't think it's a very edge-casey wiki); adding the damaging and goodfaith models afterwards will then be rather easy.
Thanks, templating does indeed make it a lot easier!
Tracked in T187753: Update Hungarian language assets.
--pop-rate is percentage of true cases in the dataset of that model, grep and count the lines.
How come it's such a long fraction for most wikis? With 5k datasets it shouldn't have more than 5 decimal places.
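The bound on decimal places follows from the denominator: k/n, reduced to lowest terms, has a terminating decimal expansion only when the denominator has no prime factors other than 2 and 5, and the number of digits is the larger of those two exponents. Since 5000 = 2^3 × 5^4, any count out of exactly 5,000 observations needs at most 4 decimal places. A small sketch to check this for a given count and dataset size:

```python
from fractions import Fraction


def decimal_places(numerator, denominator):
    """Digits after the decimal point in numerator/denominator,
    or None if the decimal expansion does not terminate."""
    reduced = Fraction(numerator, denominator).denominator
    twos = fives = 0
    while reduced % 2 == 0:
        reduced //= 2
        twos += 1
    while reduced % 5 == 0:
        reduced //= 5
        fives += 1
    if reduced != 1:
        return None  # a prime other than 2 or 5 divides the denominator
    return max(twos, fives)
```

For example, decimal_places(1234, 5000) is 4 (0.2468). A rate with many more decimal places therefore suggests, though doesn't prove, that it wasn't computed over exactly 5,000 observations.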
Second attempt:
- updated config/manual_wikis.yaml, removed .tmp from the huwiki config file name
- ran python utility generate_make --output=Makefile
- compared with diff --color <(awk '/^###+/{flag=0}/^###+ Hungarian/{flag=1}flag' Makefile.manual) <(awk '/^###+/{flag=0}/^###+ Hungarian/{flag=1}flag' Makefile) (the --compare-section parameter for generate_make did not seem to do anything useful). The result is in P6716; it looks like nothing got lost.
- counted true cases with python utility fetch_labels http://labels.wmflabs.org/campaigns/huwiki/33/ | grep '"goodfaith": true' | wc -l etc
- ran make huwiki_tuning_reports (it ran fine this time, with far fewer errors too: P6717)
- updated the config per tuning recommendations
- tried to train the models but all three aborted with json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) (P6718)
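"Expecting value: line 1 column 1 (char 0)" is the message Python's json module gives when the input is empty or doesn't start with a JSON value (an HTML error page, for instance). Which input triggered it in P6718 isn't stated here, so the sketch below is only a debugging aid under that assumption: a hypothetical wrapper that parses JSON-lines input and reports the offending line number and a snippet instead of the bare position.

```python
import json


def load_json_lines(lines):
    """Parse one JSON value per line, naming the line that fails.

    json.loads("") raises "Expecting value: line 1 column 1 (char 0)";
    this wrapper adds the line number and a snippet of the bad input.
    """
    records = []
    for number, line in enumerate(lines, start=1):
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            snippet = line[:60]
            raise ValueError(
                f"line {number} is not valid JSON ({exc.msg}): {snippet!r}"
            ) from exc
    return records
```

Feeding a suspect observations file through this would show whether the culprit is an empty line, an empty file, or something that isn't JSON at all.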
The next step on the checklist is "Test, cross-validate, review model health", but it's not documented.
https://github.com/wiki-ai/revscoring/pull/395 I think we'll need this to properly handle getting human-labeled data.
I've made merge_labels smarter, it should do the correct intersection between autolabeled and human_labeled. See my rebase and attempt to continue the work, https://github.com/wiki-ai/editquality/compare/awight-huwiki
This is currently running on ores-misc-01, I'll report back here with results.
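To illustrate what joining the two label sources involves, a plain intersection on rev_id is sketched below: only revisions present in both inputs survive, and human labels override same-named auto-labeled fields. This is an assumption-laden illustration, not the code in the branch or PR linked here; in particular, editquality's actual merge_labels may keep auto-labeled revisions that were never flagged for review, and the field names are hypothetical.

```python
def merge_labels(autolabeled, human_labeled):
    """Inner-join autolabeled and human-labeled observations on rev_id.

    Revisions missing from either input are dropped; on conflicting keys
    the human-labeled value wins. Field names are hypothetical.
    """
    human_by_rev = {obs["rev_id"]: obs for obs in human_labeled}
    merged = []
    for obs in autolabeled:
        human = human_by_rev.get(obs["rev_id"])
        if human is None:
            continue  # no human label for this revision
        merged.append({**obs, **human})
    return merged
```

The design choice to note is the join direction: iterating over the auto-labeled set keeps its order, while the dictionary lookup makes each human label lookup constant time.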
@Tgr Hi! We finally have a way forward, but I wanted to mention that we're a bit blocked by the PR's branch living in your forked repo. If you have time to rebase, that would be great, otherwise I'm happy to take over and push a new branch.
Tweak to merge human and auto labels, https://github.com/wiki-ai/editquality/pull/153
@Bencemac Thanks for the nudge, we've updated this task to reflect that the code changes are merged and will be deployed soon.
Mentioned in SAL (#wikimedia-cloud) [2018-04-16T18:58:27Z] <awight> Update ORES editquality; T185903