Analyze the extent of the bias of damage detection models against anons
Closed, Resolved · Public

Assigned To

None

Authored By

	Halfak
	Apr 3 2016, 8:03 AM

Description

See http://socio-technologist.blogspot.com/2015/12/disparate-impact-of-damage-detection-on.html

Extend this work by exploring the extent of the bias that ORES learns.

Event Timeline

Halfak created this task. Apr 3 2016, 8:03 AM

Restricted Application added a subscriber: Aklapper. Apr 3 2016, 8:03 AM

Halfak edited projects, added Machine-Learning-Team (Active Tasks); removed Machine-Learning-Team. Apr 3 2016, 8:03 AM

Halfak moved this task from Parked to Backlog on the Machine-Learning-Team (Active Tasks) board.

Halfak updated the task description. Apr 5 2016, 5:37 PM

Halfak edited projects, added Machine-Learning-Team; removed Machine-Learning-Team (Active Tasks). Jun 14 2016, 5:25 PM

Halfak moved this task from Unsorted to Ideas on the Machine-Learning-Team board.

Halfak moved this task from Ideas to Research & analysis on the Machine-Learning-Team board. Sep 22 2016, 2:58 PM

Restricted Application added a project: artificial-intelligence. Aug 3 2018, 5:31 PM

@Halfak I don't understand whether the change made in your blog post was ever implemented. There's https://github.com/wikimedia/editquality/commit/35da6cea60fc4250ef3afea436895ddddaee1e65, which seems preparatory, but it was then reverted, and the damaging models still have the is_anon feature.

saurabhbatra96 subscribed. Oct 24 2018, 5:01 PM

awight removed Avner as the assignee of this task. Oct 24 2018, 5:11 PM

awight added a subscriber: Avner.

Right. So there's an analysis in https://commons.wikimedia.org/wiki/File:ORES_-_Facilitating_re-mediation_of_Wikipedia%27s_socio-technical_problems.pdf

That shows the bias against anons and how we mitigated some of the issue by switching to a new modeling strategy. It's not clear whether further work is necessary or worthwhile, but I think we can consider this task done.

@Halfak Thanks for the pointer to ... a paper my name is on. I see what you're talking about, in section 7.4. Switching from SVM to gradient boosting apparently made a huge improvement, but hasn't made the problem go away. Do you think there's any value in continuing this investigation, for example quantifying how much our algorithm relies on is_anon and how a model would perform if trained without that feature?
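One way the "how much bias" question could be made concrete is to compare false-positive rates on good edits for anonymous versus registered editors. The following is only a minimal sketch under stated assumptions: the function name, the tuple layout, and the sample data are hypothetical and invented for illustration. None of it comes from ORES, editquality, or the linked paper.

```python
from collections import defaultdict


def false_positive_rate_by_group(edits):
    """Return the false-positive rate per editor group.

    `edits` is an iterable of (is_anon, is_damaging, flagged) booleans:
    hypothetical labels and model decisions, not real ORES output.
    """
    false_pos = defaultdict(int)
    good_edits = defaultdict(int)
    for is_anon, is_damaging, flagged in edits:
        if is_damaging:
            continue  # only good edits can be false positives
        group = "anon" if is_anon else "registered"
        good_edits[group] += 1
        if flagged:
            false_pos[group] += 1
    return {g: false_pos[g] / good_edits[g] for g in good_edits}


# Made-up sample: (is_anon, is_damaging, flagged)
sample = [
    (True, False, True),
    (True, False, False),
    (True, True, True),
    (False, False, False),
    (False, False, False),
    (False, False, True),
    (False, False, False),
    (False, True, True),
]

rates = false_positive_rate_by_group(sample)
disparity = rates["anon"] / rates["registered"]
print(rates, disparity)
```

Running the same comparison on a model trained with and without is_anon would give a rough measure of how much of the disparity is tied to that feature, alongside whatever fitness metric is used to track the tradeoff.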

Right. So I think we might file a new task for that work. I think that we will need a community consultation of some sort to make a decision about the inclusion of is_anon and seconds_since_registration. There will be a tradeoff in model fitness. I've been talking to some researchers about potentially picking that task up. They want to study the process of intersecting algorithmic parameters with people's values. Ping @Bobo.03 :)

Halfak edited projects, added Machine-Learning-Team (Research); removed Machine-Learning-Team. Apr 2 2019, 9:33 PM

Restricted Application edited projects, added Machine-Learning-Team; removed Machine-Learning-Team (Research). Apr 2 2019, 9:33 PM

Harej edited projects, added Machine-Learning-Team (Research); removed Machine-Learning-Team. Apr 3 2019, 4:33 AM

Harej closed this task as Resolved. Apr 9 2019, 9:08 PM
