
Develop manual testing strategy for bias detection
Open, Low, Public


To complement our unsupervised strategy for learning clusters of damaging/not-damaging edits, we should also experiment with looking for bias in places where we think it is likely.

In order to do this, I imagine us withholding a random sample during revscoring train_test and testing the fitness of the model on interesting subsets, e.g. edits by newcomers, edits by anonymous users, etc.

We'll likely want to automate this because training and testing a model takes a long time. We'll also need to create a utility for running a new test-set through a pre-trained model.

This card is done when we have a plan for implementing a manual bias detection strategy and the specific tasks have been created.
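The withhold-and-compare idea above can be sketched in plain Python. Everything here is synthetic and hypothetical (the `badness` feature, the threshold "model", the data generator); it only illustrates training on one split and then comparing fitness on the full test set versus a subset such as anonymous editors:

```python
# Sketch of the proposed bias check: withhold a random test set, fit a
# trivial threshold "model" on the training split, then compare accuracy
# on the full test set vs. an interesting subset (anonymous editors).
# All feature names and data here are synthetic/hypothetical.
import random

random.seed(0)

def make_edit():
    is_anon = random.random() < 0.3
    # Hypothetical single feature; anon edits skew "worse" in this toy data
    badness = random.gauss(0.6 if is_anon else 0.3, 0.2)
    # Noisy label so the classifier can't be perfect
    damaging = (badness + random.gauss(0, 0.15)) > 0.5
    return {"badness": badness, "user.is_anon": is_anon, "damaging": damaging}

edits = [make_edit() for _ in range(20000)]
random.shuffle(edits)
train, test = edits[:15000], edits[15000:]

# "Train": pick the threshold on `badness` that maximizes training accuracy
threshold = max(
    (t / 100 for t in range(101)),
    key=lambda t: sum((e["badness"] > t) == e["damaging"] for e in train),
)

def accuracy(rows):
    return sum((e["badness"] > threshold) == e["damaging"] for e in rows) / len(rows)

anon_test = [e for e in test if e["user.is_anon"]]
print("overall test accuracy:", round(accuracy(test), 3))
print("anon-only test accuracy:", round(accuracy(anon_test), 3))
```

A gap between the two accuracy figures is the kind of signal this task is about: the model being systematically better or worse on a subgroup than on the population as a whole.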

Event Timeline

Halfak created this task. Nov 2 2015, 3:00 PM
Halfak updated the task description.
Halfak raised the priority of this task to Needs Triage.
Halfak moved this task to Active on the Scoring-platform-team (Current) board.
Halfak added a subscriber: Halfak.
Restricted Application added a subscriber: Aklapper. Nov 2 2015, 3:00 PM
Halfak added a comment. (Edited) Nov 2 2015, 3:24 PM

I'm imagining that we'd do something like this:

$ wc -l rev_features.tsv
$ shuf rev_features.tsv > rev_features.shuffled.tsv
$ head -n 15000 rev_features.shuffled.tsv > rev_features.train_set.tsv
$ tail -n +15001 rev_features.shuffled.tsv > rev_features.test_set.tsv
$ cat rev_features.train_set.tsv | \
> revscoring train \
> revscoring.scorer_model.LinearSVC  \
>   editquality.feature_lists.enwiki.damaging \
>   --label-type=bool > \
> my_model.linear_svc.model
Accuracy: 0.692
ROC-AUC: 0.891

$ cat rev_features.test_set.tsv | \
> revscoring filter_features \
>   editquality.feature_lists.enwiki.damaging \
>   --exclude-all \
>   --include 'user.is_anon == True' | \
> revscoring test \
>   my_model.linear_svc.model
Observations: 5000
Filtered observations: 2200

Accuracy: 0.763
ROC-AUC:  0.750


In this example, I'm imagining that we'd refactor train_test into two utilities: train and test. train would withhold testing data and apply the test unless told not to (like train_test does now). We'd also need a powerful way to filter feature sets. I've imagined a filter_features utility that would take --include and --exclude arguments, parse them to look up the feature with the corresponding name, and apply the filter.
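The proposed filtering could work something like the sketch below. Note that neither the filter_features utility nor this condition grammar exists in revscoring; the parser, function names, and row format are all hypothetical, shown only to make the --include idea concrete:

```python
# Hypothetical sketch of the proposed filter_features behavior: parse a
# simple "<feature> == <value>" condition and keep only the matching rows
# of a labeled observation stream.
def parse_condition(expr):
    name, op, literal = expr.split()
    assert op == "==", "only equality is sketched here"
    value = {"True": True, "False": False}.get(literal, literal)
    return name, value

def filter_features(rows, include):
    name, value = parse_condition(include)
    return [row for row in rows if row.get(name) == value]

rows = [
    {"user.is_anon": True, "damaging": False},
    {"user.is_anon": False, "damaging": True},
    {"user.is_anon": True, "damaging": True},
]
# Keeps the two anonymous-editor observations
print(filter_features(rows, "user.is_anon == True"))
```

The filtered stream would then be fed to the (also proposed) test utility along with a pre-trained model, as in the shell example above.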

awight added a subscriber: awight. Dec 3 2015, 9:05 AM
Restricted Application added a subscriber: StudiesWorld. Dec 3 2015, 9:05 AM
Halfak moved this task from Untriaged to Ideas on the Scoring-platform-team board. Mar 30 2016, 4:52 PM
Halfak triaged this task as Low priority. Aug 18 2016, 2:46 PM
Halfak added a project: revscoring.
Halfak set Security to None.
awight removed a subscriber: awight. Mar 21 2019, 3:59 PM
Restricted Application added a project: artificial-intelligence. Mar 21 2019, 3:59 PM