⚓ T185903 Train/test damaging and goodfaith model for Hungarian Wikipedia

Status	Assigned	Task
Resolved	Catrope	T192496 Deploy ORES advanced editquality models to huwiki
Resolved	Tgr	T185903 Train/test damaging and goodfaith model for Hungarian Wikipedia
Resolved	Halfak	T167968 Complete edit quality campaign for Hungarian Wikipedia

Restricted Application added a project: artificial-intelligence. · View Herald TranscriptJan 29 2018, 4:41 PM

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Halfak added a subtask: T167968: Complete edit quality campaign for Hungarian Wikipedia.Jan 29 2018, 4:41 PM

Ladsgroup claimed this task.Jan 29 2018, 4:47 PM

Ladsgroup edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team.

Restricted Application added a project: User-Ladsgroup. · View Herald TranscriptJan 29 2018, 4:47 PM

Tacsipacsi subscribed.Jan 29 2018, 6:56 PM

Halfak closed subtask T167968: Complete edit quality campaign for Hungarian Wikipedia as Resolved.Jan 30 2018, 8:32 PM

Does one need any special permissions for this? I'd be interested in taking it on, just to get a feeling of how things work.

Go ahead :) Just remember you need to make the models in a node that is similar to prod environment (like stat1005, that's we usually make models)

ores-misc-01.eqiad.wmflabs is another option. If you tell me your wikitech account, I can get you set up.

@Tgr Cool! Here's another reference (to help us fix :) https://www.mediawiki.org/wiki/ORES/New_model_checklist

In T185903#3932473, @Halfak wrote:

ores-misc-01.eqiad.wmflabs is another option. If you tell me your wikitech account, I can get you set up.

Gergő Tisza. I have stat1005 access though, I imagine that one's more speedy.

Yup! Definitely a bit faster. Godspeed :)

Tgr claimed this task.Jan 30 2018, 11:45 PM

Tgr added a project: User-Tgr.

Tgr moved this task from Backlog to Huwiki on the User-Tgr board.

Samat subscribed.Feb 2 2018, 9:42 PM

Samat awarded a token.Feb 2 2018, 9:47 PM

Balint36 subscribed.Feb 2 2018, 10:20 PM

Hey! Checking in here. How's progress? If you're stalled, I'd be happy to take over and have you train the next model. :)

Uh, sorry, I haven't had much free time in the last couple days. I'll try to wrap it up this week and unlick the cookie if that doesn't work out. (Feel free to unassign me if you want it done sooner than that.)

No worries :) Just taking inventory and checking on what we have outstanding.

Here's my progress so far.

Updated the wordlist page from the talk page (it's unclear what some of the template fields mean so ignored those)
cloned editquality and revscoring on stat1005 (need to use the proxy and HTTPS Github URLs), ran pip install -r requirements.txt for both (in hindsight pip install -e might have been simpler)
set up a venv, learned that the right way to do that on stat1005 is the virtualenv command (as opposed to python3 -m venv which will complain about missing packages and abort)
updated editquality makefile along these lines. The patch is not super helpful in figuring out what to do:
- it does not use $@/$</$^, which seems to be the standard today
- it uses different parameters for revscoring calls (that seems to have changed with T173202)
- there are a bunch of random-seeming numbers (pop-rate, again coming from T173202) which are different for every wiki
- it's from before the makefile was split into a manual and a templated part (although huwiki is not templated so that wasn't too confusing).

I omitted the pop-rate parameters, otherwise tried to ape the current frwiki params, and ran make huwiki_tuning_reports (on the assumption that this will tell me what to use for the seemingly-random parameters). That took about an hour to run, threw a bunch of non-fatal-looking errors (No revision was found for parameter "rvstartid", KeyError: 'user', TypeError: 'NoneType' object is not iterable - I assume these are revdelete-related; RuntimeWarning: overflow encountered in double_scalars, RuntimeWarning: invalid value encountered in multiply and a couple more like that; ValueError: cannot convert float NaN to integer, TypeError: only integer arrays with one element can be converted to an index), outputted something that looked halfway between a core dump and a roguelike map, and stopped due to user error (frwiki had different sample sizes for damaging, I didn't pay attention and just did a string search/replace on that wikiname and so the build step names did not match - annoyingly there seems to be no way to validate a makefile, even just to the extent of making sure all the dependencies exist). Full output is in P6714.

After fixing, I re-ran and got a zillion of errors like 2018-02-17 19:25:20,611 ERROR:editquality.utilities.merge_labels -- 17226861 has no labels, but was flagged for review and Could not cross-validate estimator RandomForest ... ModelConsistencyError: Labels {False} not in expected labels {True}. Other than that it seemed successful and created the tuning reports, but they are empty. Output: P6715 (a bit is missing from the beginning, was too long for scrollback).

Next I tried to build the models (make models/huwiki.damaging.gradient_boosting.model and make models/huwiki.goodfaith.gradient_boosting.model), that failed with __init__() got an unexpected keyword argument 'label_weights' and __init__() got an unexpected keyword argument 'population_rates' respectively. I tried to remove the label-weight lines from the makefile; that resulted in RuntimeError: Either --pop-rates or --labels or --labels-config must be specified.

The makefile change is up for inspection here.

Some comments

In T185903#3981268, @Tgr wrote:

Updated the wordlist page from the talk page (it's unclear what some of the template fields mean so ignored those)

Updating the page is not enough. You need to make a PR against revscoring and then get it merged and release it in pypi and then bump the version number in editquality requirements.txt

updated editquality makefile along these lines. The patch is not super helpful in figuring out what to do:

it does not use $@/$</$^, which seems to be the standard today

It has changed. Please take a look at the new Makefile

it uses different parameters for revscoring calls (that seems to have changed with T173202)

Overall, I highly recommend trying to template huwiki (T168455: [Epic] Implement code generation for model makefile maintenance) so you just make a yaml file, run a script and bam! The Makefile has changed.

there are a bunch of random-seeming numbers (pop-rate, again coming from T173202) which are different for every wiki

--pop-rate is percentage of true cases in the dataset of that model, grep and count the lines. I love to make it part of the templating script as it's already very confusing and prone to errors but hadn't have time to get around it.

it's from before the makefile was split into a manual and a templated part (although huwiki is not templated so that wasn't too confusing).

if you have some time, try to start with templating the huwiki without adding damaging and goodfaith (I think it's not very edge-casey wiki) and then adding damaging and goodfaith models will be rather easy.

Thanks, templating does indeed make it a lot easier!

In T185903#3983604, @Ladsgroup wrote:

Updating the page is not enough. You need to make a PR against revscoring and then get it merged and release it in pypi and then bump the version number in editquality requirements.txt

Tracked in T187753: Update Hungarian language assets.

--pop-rate is percentage of true cases in the dataset of that model, grep and count the lines.

How come it's such a long fraction for most wikis? With 5k datasets it shouldn't have more than 5 decimal places.

Second attempt:

updated config/manual_wikis.yaml, removed .tmp from the huwiki config file name
ran python utility generate_make --output=Makefile
compared with diff --color <(awk '/^###+/{flag=0}/^###+ Hungarian/{flag=1}flag' Makefile.manual) <(awk '/^###+/{flag=0}/^###+ Hungarian/{flag=1}flag' Makefile.manual) (the --compare-section parameter for generate_make did not seem to do anything useful). Result is in P6716; looks like nothing got lost.
counted true cases with python utility fetch_labels http://labels.wmflabs.org/campaigns/huwiki/33/ | grep '"goodfaith": true' | wc -l etc
ran make huwiki_tuning_reports (ran fine this time, had far less errors too: P6717)
updated the config per tuning recommendations
tried to train the models but all three aborted with json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0) (P6718)

Code here.

@Tgr, this should now be unblocked. We fixed the bug re. model params.

Thanks! The models are in PR #142.

Next step on the checklist is Test, cross-validate, review model health but it's not documented.

Tgr mentioned this in rOEQ09c4e33e8fe8: Add damaging and goodfaith for huwiki.Mar 13 2018, 4:51 PM

Tgr mentioned this in rOEQc01e88a12fb3: Tune huwiki.

Tgr mentioned this in rOEQb2c538b086fe: Build models for huwiki.

https://github.com/wiki-ai/revscoring/pull/395 I think we'll need this to properly handle getting human-labeled data.

Tgr mentioned this in rOEQd1be22de46d7: Add damaging and goodfaith for huwiki.Mar 19 2018, 8:54 AM

Tgr mentioned this in rOEQ28b31019c1e8: Tune huwiki.

Tgr mentioned this in rOEQ8abf153a1239: Build models for huwiki.

awight mentioned this in rOEQbc64941c3bf6: Add damaging and goodfaith for huwiki.Mar 21 2018, 8:57 PM

awight mentioned this in rOEQa8efb832dc44: Tune huwiki.

awight mentioned this in rOEQ5873f4fc371a: Build models for huwiki.

awight mentioned this in rOEQ163960febc09: Add damaging and goodfaith for huwiki.Mar 26 2018, 10:06 PM

awight mentioned this in rOEQ3444d4b76b36: Tune huwiki.

awight mentioned this in rOEQ0d5165afe76f: Build models for huwiki.

I've made merge_labels smarter, it should do the correct intersection between autolabeled and human_labeled. See my rebase and attempt to continue the work, https://github.com/wiki-ai/editquality/compare/awight-huwiki

This is currently running on ores-misc-01, I'll report back here with results.

Tgr mentioned this in rOEQ392db1fc7970: Tune huwiki.Mar 27 2018, 8:43 PM

Tgr mentioned this in rOEQ62cc2cbbce6e: Add damaging and goodfaith for huwiki.

Tgr mentioned this in rOEQc2a0b2758e6f: Build models for huwiki.

Tgr mentioned this in rOEQ0c2d1ea550b7: Add damaging and goodfaith for huwiki.Mar 28 2018, 8:33 PM

Tgr mentioned this in rOEQ87dcf5ba47ad: Tune huwiki.

Tgr mentioned this in rOEQ0a3dccf9ed5e: Build models for huwiki.

awight mentioned this in rOEQ994fab75ebb7: Add damaging and goodfaith for huwiki.Apr 2 2018, 11:32 PM

awight mentioned this in rOEQa27f07569c99: Tune huwiki.

awight mentioned this in rOEQf1d204e5efdd: Build models for huwiki.

awight mentioned this in rOEQd6735370ed63: Add damaging and goodfaith for huwiki.Apr 3 2018, 2:24 AM

awight mentioned this in rOEQ3efebe063474: Tune huwiki.

awight mentioned this in rOEQb508ccc62a1b: Build models for huwiki.

Halfak moved this task from Parked to Review on the Machine-Learning-Team (Active Tasks) board.Apr 9 2018, 3:46 PM

@Tgr Hi! We finally have a way forward, but I wanted to mention that we're a bit blocked by the PR's branch living in your forked repo. If you have time to rebase, that would be great, otherwise I'm happy to take over and push a new branch.

Tgr mentioned this in rOEQd6a685a6524b: Add damaging, goodfaith for huwiki, remove reverted.Apr 11 2018, 11:30 PM

Tgr mentioned this in rOEQc060c026c4e5: Tune huwiki.

Tgr mentioned this in rOEQ5d11a50b5d65: Build models for huwiki.

Tgr mentioned this in rOEQ941b06b42414: Build models for huwiki.Apr 12 2018, 8:41 AM

awight mentioned this in rOEQaf81a23ef6f5: Add damaging and goodfaith models for huwiki (#152).Apr 12 2018, 6:43 PM