@Halfak I don't understand whether the change described in your blog post was ever implemented. There's https://github.com/wikimedia/editquality/commit/35da6cea60fc4250ef3afea436895ddddaee1e65 which seems preparatory, but it was then reverted, and the damaging models still have the is_anon feature.
Right. So there's an analysis in https://commons.wikimedia.org/wiki/File:ORES_-_Facilitating_re-mediation_of_Wikipedia%27s_socio-technical_problems.pdf
That shows the bias against anons and how we mitigated some of the issue by switching to a new modeling strategy. It's not clear whether further work is worthwhile or necessary, but I think we can consider this task done.
@Halfak Thanks for the pointer to ... a paper my name is on. I see what you're talking about, in section 7.4. Switching from SVM to gradient boosting apparently made a huge improvement, but hasn't made the problem go away. Do you think there's any value in continuing this investigation, for example quantifying how much our algorithm relies on is_anon and how a model would perform if trained without that feature?
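To make the question concrete, here is a rough sketch of the kind of ablation I mean. It is not the editquality pipeline: the data is synthetic, the `content` feature and all rates are made up for illustration, and a plain logistic regression stands in for the gradient boosting model. It trains once with is_anon and once without, then compares accuracy and the false-positive rate for anonymous versus registered editors.

```python
import math
import random

IS_ANON, CONTENT = 0, 1  # feature indices; CONTENT is a hypothetical stand-in signal


def make_edits(n, seed):
    """Generate synthetic edits as ((is_anon, content), damaging) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_anon = 1.0 if rng.random() < 0.3 else 0.0
        # Assumed rates: anonymous edits are damaging more often.
        damaging = 1 if rng.random() < (0.25 if is_anon else 0.05) else 0
        # Noisy content signal that is higher for damaging edits.
        content = rng.gauss(1.5 if damaging else 0.0, 1.0)
        rows.append(((is_anon, content), damaging))
    return rows


def predict(model, use, x):
    """Return the predicted probability that an edit is damaging."""
    w, b = model
    z = b + sum(wi * x[i] for wi, i in zip(w, use))
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, use, epochs=300, lr=0.5):
    """Fit logistic regression by batch gradient descent on the features in `use`."""
    model = ([0.0] * len(use), 0.0)
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(use), 0.0
        for x, y in rows:
            err = predict(model, use, x) - y
            for k, i in enumerate(use):
                grad_w[k] += err * x[i]
            grad_b += err
        w, b = model
        model = ([wi - lr * g / n for wi, g in zip(w, grad_w)], b - lr * grad_b / n)
    return model


def evaluate(rows, model, use, threshold=0.5):
    """Compute accuracy and per-group false-positive rates on non-damaging edits."""
    correct = 0
    false_pos = {0.0: 0, 1.0: 0}
    negatives = {0.0: 0, 1.0: 0}
    for x, y in rows:
        pred = 1 if predict(model, use, x) >= threshold else 0
        correct += int(pred == y)
        if y == 0:
            negatives[x[IS_ANON]] += 1
            false_pos[x[IS_ANON]] += pred
    return {
        "accuracy": correct / len(rows),
        "fpr_anon": false_pos[1.0] / max(negatives[1.0], 1),
        "fpr_registered": false_pos[0.0] / max(negatives[0.0], 1),
    }


train_rows = make_edits(2000, seed=1)
test_rows = make_edits(2000, seed=2)

results = {}
for name, use in [("with_is_anon", [IS_ANON, CONTENT]), ("without_is_anon", [CONTENT])]:
    results[name] = evaluate(test_rows, train(train_rows, use), use)
    print(name, results[name])
```

On real data the comparison would use the actual feature set and labeled revisions; the point here is only the shape of the experiment: same training data, same evaluation, one feature removed, and the fitness and per-group error reported side by side.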
Right. So I think we might file a new task for that work. I think that we will need a community consultation of some sort to make a decision about the inclusion of is_anon and seconds_since_registration. There will be a tradeoff in model fitness. I've been talking to some researchers about potentially picking that task up. They want to study the process of intersecting algorithmic parameters with people's values. Ping @Bobo.03 :)