[Spike] Look into error correcting output codes in SciKit Learn
Closed, Resolved, Public

Assigned To

Authored By

	Halfak
	Jul 10 2015, 4:52 PM

Description

http://scikit-learn.org/stable/modules/multiclass.html

This task is done when we have an idea of how we can use scikit learn's utilities.

Event Timeline

Halfak created this task. Jul 10 2015, 4:52 PM

Halfak assigned this task to ToAruShiroiNeko.

Halfak raised the priority of this task from to Needs Triage.

Halfak updated the task description.

Halfak added a project: Machine-Learning-Team (Active Tasks).

Halfak moved this task to Paused on the Machine-Learning-Team (Active Tasks) board.

Halfak subscribed.

Restricted Application added a subscriber: Aklapper. Jul 10 2015, 4:52 PM

Halfak moved this task from Paused to Backlog on the Machine-Learning-Team (Active Tasks) board. Jul 24 2015, 4:48 PM

Halfak moved this task from Backlog to Paused on the Machine-Learning-Team (Active Tasks) board. Jul 31 2015, 4:36 PM

ToAruShiroiNeko triaged this task as Medium priority. Aug 3 2015, 2:08 AM

ToAruShiroiNeko set Security to None.

ToAruShiroiNeko moved this task from Paused to Review on the Machine-Learning-Team (Active Tasks) board. Nov 6 2015, 6:21 PM

ToAruShiroiNeko moved this task from Review to Backlog on the Machine-Learning-Team (Active Tasks) board.

ToAruShiroiNeko moved this task from Backlog to Review on the Machine-Learning-Team (Active Tasks) board. Dec 31 2015, 8:26 AM

clf = OutputCodeClassifier(LinearSVC(random_state=0), code_size=2, random_state=0)

ECOC converts a single k-class problem into several binary classification problems, one per column of a code matrix; an exhaustive code uses 2^(k-1)-1 of them. This is useful because existing state-of-the-art algorithms tend to be more heavily optimised for binary classification, and because combining many binary classifiers can offset the bias or over-fitting of any single one.
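To make the 2^(k-1)-1 figure concrete, here is a minimal pure-Python sketch of an exhaustive code matrix and nearest-codeword decoding. The helper names (exhaustive_code, hamming, decode) are made up for illustration and are not part of scikit-learn, which generates a random code book instead.

```python
def exhaustive_code(k):
    """Build an exhaustive ECOC code matrix: k rows, 2**(k-1) - 1 columns."""
    n_cols = 2 ** (k - 1) - 1
    # Class 0 is always on the 0 side, which removes complementary duplicates.
    rows = [[0] * n_cols]
    for j in range(k - 1):
        # Row j+1 takes bit j of every column index m = 1 .. n_cols.
        rows.append([(m >> j) & 1 for m in range(1, n_cols + 1)])
    return rows


def hamming(a, b):
    """Count the positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(code, bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(range(len(code)), key=lambda c: hamming(code[c], bits))


code = exhaustive_code(4)      # 4 classes -> 7 binary problems
noisy = list(code[2])          # what the binary classifiers "should" say for class 2
noisy[0] ^= 1                  # pretend one binary classifier got it wrong
predicted = decode(code, noisy)
print(len(code[0]), predicted)
```

With 4 classes every pair of codewords differs in 4 positions, so a single wrong binary prediction still decodes to the right class. This is the error-correcting part of the name.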

OutputCodeClassifier needs to be given a binary classifier as its base estimator. The documentation uses LinearSVC, but it would be best to compare several binary classifiers through cross-validation to see which algorithm fits which problem.
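A rough sketch of that comparison, assuming a recent scikit-learn (where cross_val_score lives in sklearn.model_selection) and using the bundled iris data purely as a stand-in for our own features. The two candidate estimators and their max_iter values are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

# Stand-in data set; replace with the real feature matrix and labels.
X, y = load_iris(return_X_y=True)

# Candidate binary base estimators (illustrative, not a recommendation).
candidates = {
    "LinearSVC": LinearSVC(random_state=0, max_iter=10000),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

mean_accuracy = {}
for name, base in candidates.items():
    clf = OutputCodeClassifier(base, code_size=2, random_state=0)
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold accuracy
    mean_accuracy[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The same loop can be extended with any estimator that implements fit and either decision_function or predict_proba.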

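code_size sets the number of columns in the ECOC code matrix. Note that in scikit-learn it is a multiplier on the number of classes rather than an absolute count: the classifier trains int(n_classes * code_size) binary estimators against a randomly generated code book, so code_size=2 gives 6 columns for a 3-class problem. An exhaustive code, by contrast, grows at a rate of 2^(k-1)-1. That would be 3 columns for a 3-class problem (which is actually one-vs-all) and 31 columns for a 6-class problem. We will need to optimize as we go along.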
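As far as I can tell from the scikit-learn documentation, code_size is a float multiplier and the fitted model holds int(n_classes * code_size) binary estimators. A quick check on the iris data (again only a stand-in) would look like this:

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)  # 3 classes

code_size = 2
clf = OutputCodeClassifier(
    LinearSVC(random_state=0), code_size=code_size, random_state=0
)
clf.fit(X, y)

n_classes = len(clf.classes_)
n_estimators = len(clf.estimators_)  # one fitted binary classifier per column

# The code book has one row per class and one column per binary estimator.
print(n_classes, n_estimators, clf.code_book_.shape)
```

So the cost scales linearly with code_size rather than exponentially with the number of classes, and code_size becomes one more parameter to tune.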

Halfak moved this task from Review to Completed on the Machine-Learning-Team (Active Tasks) board. Jan 1 2016, 6:13 PM

Halfak closed this task as Resolved. Jan 21 2016, 3:43 PM
