Stats: http://labels.wmflabs.org/stats/huwiki/
Contact: @Tgr
- Announce the labeling campaign on huwiki
- Status update #1
- Status update #2
Status | Assigned | Task
---|---|---
Resolved | Catrope | T192496 Deploy ORES advanced editquality models to huwiki
Resolved | Tgr | T185903 Train/test damaging and goodfaith model for Hungarian Wikipedia
Resolved | Halfak | T167968 Complete edit quality campaign for Hungarian Wikipedia
This is now seeing steady progress (30% in the last couple of weeks), thanks to @Misibacsi. Feedback from the local discussion:
Bot edits should not be included in the dataset. Is it possible that some bots that are not flagged as bots are showing up?
No, they are proper bots. Example edit (task id 317779), user rights log.
WRT showing the name, we've avoided that on purpose because there are measurable biases against anonymous editors. We've kind of felt that was best left as WONT_FIX. What do you think?
Yeah, I guessed as much. I don't have a strong feeling either way; labelers will probably look up nontrivial edits anyway, since verifying an unsourced fact change takes a lot of effort, so it's easier to rely on the author's reputation or the reviewers' reactions.
I've found the problem! huwiki is one of the datasets where we mixed edits that seem to "need review" with those that don't so that we can check our assumption. See the line in our makefile here: https://github.com/wiki-ai/editquality/blob/master/Makefile#L2136
More recently, we've discarded this strategy as it seems clear that our "needs review" filters are working as intended.
Option 1: Continue as-is and get labels for *some* of the edits that do not "need review"
Option 2: Pull all unlabeled edits that do not "need review" and work with the remaining observations.
Given how close the campaign is to finishing, I think we should continue with option #1. But, I'd be OK with option #2 if someone felt strongly.
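The difference between the two options above can be sketched in a few lines. This is a hypothetical illustration, not the actual Wikilabels code; the field names (`rev_id`, `needs_review`, `label`) are assumptions about the observation format.

```python
# Hypothetical sketch of option 2: drop the unlabeled edits that were
# sampled from the "does not need review" pool, keeping everything else.
# Field names are assumptions, not the real Wikilabels schema.

def filter_campaign(observations):
    """Keep every labeled edit, plus unlabeled edits flagged as needing review."""
    return [
        obs for obs in observations
        if obs.get("label") is not None or obs["needs_review"]
    ]

sample = [
    {"rev_id": 1, "needs_review": True,  "label": None},        # kept: still awaiting a label
    {"rev_id": 2, "needs_review": False, "label": None},        # dropped under option 2
    {"rev_id": 3, "needs_review": False, "label": "goodfaith"},  # kept: already labeled
]

remaining = filter_campaign(sample)
print([obs["rev_id"] for obs in remaining])  # [1, 3]
```

Option 1 is simply the identity function on the same data: all unlabeled edits stay in the queue regardless of the "needs review" flag.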
26 revisions not found out of 26, twice in a row. I think it’s done, and the remaining revisions should be removed manually from the database (or replaced by accessible ones).
Confirmed that this is done! Thanks for your work. I'll get us moving on the next step.