[Spike] Look into error correcting output codes in SciKit Learn
Closed, ResolvedPublic

Description

http://scikit-learn.org/stable/modules/multiclass.html

This task is done when we have an idea of how we can use scikit-learn's multiclass utilities.

Event Timeline

Halfak assigned this task to ToAruShiroiNeko.
Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description.
Halfak moved this task to Paused on the Machine-Learning-Team (Active Tasks) board.
Halfak subscribed.
ToAruShiroiNeko set Security to None.

from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

clf = OutputCodeClassifier(LinearSVC(random_state=0), code_size=2, random_state=0)

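ECOC converts a single k-class problem into a set of binary classification problems, up to (2^(k-1)-1) of them for an exhaustive code. Each class is assigned a binary code word, one binary classifier is trained per bit, and a sample is assigned to the class whose code word is closest to the predicted bits. This is useful because many existing state-of-the-art algorithms are more heavily optimised for binary classification, and because the redundant bits can compensate for mistakes made by individual classifiers (for example, bias stemming from over-fitting).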

OutputCodeClassifier needs to be given a binary classifier as its base estimator. The documentation uses LinearSVC, but it would be best to compare several binary classifiers with cross-validation to see which algorithm fits which problem.
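A minimal sketch of that comparison, assuming the iris toy dataset as stand-in data and three arbitrarily chosen base estimators (neither comes from this task; `compare_base_estimators` is a hypothetical helper name):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): 3-class iris, not our real feature set.
X, y = load_iris(return_X_y=True)

# Candidate binary classifiers (assumption: an illustrative selection only).
candidates = {
    "LinearSVC": LinearSVC(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "DecisionTree": DecisionTreeClassifier(random_state=0),
}


def compare_base_estimators(X, y, candidates, code_size=2, cv=5):
    """Return the mean cross-validated accuracy per base estimator."""
    scores = {}
    for name, base in candidates.items():
        # Wrap each binary classifier in the ECOC meta-estimator.
        clf = OutputCodeClassifier(base, code_size=code_size, random_state=0)
        scores[name] = cross_val_score(clf, X, y, cv=cv).mean()
    return scores


scores = compare_base_estimators(X, y, candidates)
for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

The ranking on iris says nothing about our own data; the point is only that the base estimator can be swapped and scored in a loop.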

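code_size controls the number of columns in the ECOC code matrix. In scikit-learn it is a multiplier on the number of classes: OutputCodeClassifier fits int(n_classes * code_size) binary classifiers using a randomly generated code book, so code_size=2 gives 6 classifiers for a 3-class problem. An exhaustive code matrix, by contrast, grows at a rate of (2^(k-1)-1). That would be 3 columns for a 3-class problem (which is effectively 1 vs all), 31 columns for a 6-class problem and 63 columns for a 7-class problem. We will need to tune code_size as we go along.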
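The column counts can be checked with a few lines of arithmetic. This is a sketch using two hypothetical helper names; the second one mirrors how scikit-learn derives the number of binary estimators from code_size, namely int(n_classes * code_size):

```python
def exhaustive_code_length(n_classes):
    """Columns in an exhaustive ECOC code matrix: 2^(k-1) - 1."""
    return 2 ** (n_classes - 1) - 1


def sklearn_code_length(n_classes, code_size):
    """Binary estimators fitted by OutputCodeClassifier: int(k * code_size)."""
    return int(n_classes * code_size)


for k in (3, 4, 6, 7):
    print(
        f"k={k}: exhaustive={exhaustive_code_length(k)}, "
        f"code_size=2 -> {sklearn_code_length(k, 2)}"
    )
```

The exhaustive code grows exponentially with the number of classes, while the scikit-learn code book grows linearly, which is why code_size is worth tuning rather than fixing up front.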