[Spike] Look into error correcting output codes in SciKit Learn
Closed, ResolvedPublic

Description

http://scikit-learn.org/stable/modules/multiclass.html

This task is done when we have an idea of how we can use scikit-learn's multiclass utilities.

Event Timeline

Halfak created this task. Jul 10 2015, 4:52 PM
Halfak updated the task description.
Halfak raised the priority of this task from to Needs Triage.
Halfak assigned this task to ToAruShiroiNeko.
Halfak moved this task to Paused on the Scoring-platform-team (Current) board.
Halfak added a subscriber: Halfak.
Restricted Application added a subscriber: Aklapper. Jul 10 2015, 4:52 PM
ToAruShiroiNeko triaged this task as Normal priority. Aug 3 2015, 2:08 AM
ToAruShiroiNeko set Security to None.
ToAruShiroiNeko added a comment. Edited Dec 31 2015, 8:46 AM

clf = OutputCodeClassifier(LinearSVC(random_state=0), code_size=2, random_state=0)

ECOC converts a single k-class problem into a set of binary classification problems, one per column of a code matrix; an exhaustive code has (2^(k-1)-1) columns. This is useful because many well-studied algorithms are binary classifiers at their core, and the redundancy across the binary problems lets the ensemble recover from mistakes made by individual classifiers.
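As a rough illustration of where the (2^(k-1)-1) count comes from, the sketch below enumerates an exhaustive code matrix in plain Python. The function name exhaustive_code_matrix is hypothetical (it is not part of scikit-learn), and this is the textbook exhaustive construction, not necessarily what OutputCodeClassifier builds internally.

```python
from itertools import product


def exhaustive_code_matrix(k):
    """Build an exhaustive ECOC matrix: k rows, 2^(k-1) - 1 columns of 0/1 bits.

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    columns = []
    for rest in product([0, 1], repeat=k - 1):
        # Pin class 0's bit to 1 so a column and its complement
        # (which define the same binary split) are not both generated.
        column = (1,) + rest
        # The all-ones column puts every class on one side, so skip it.
        if all(column):
            continue
        columns.append(column)
    # Transpose: one row (codeword) per class.
    return [list(row) for row in zip(*columns)]


for k in (3, 6):
    matrix = exhaustive_code_matrix(k)
    print(k, "classes ->", len(matrix[0]), "binary problems")
```

For k = 3 every column isolates exactly one class from the other two, which is why the exhaustive code for three classes behaves like one-vs-all.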

OutputCodeClassifier needs to be given a binary classifier as its base estimator. The documentation uses LinearSVC, but it would be best to compare several binary classifiers through cross-validation to see which one fits which problem.
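A minimal sketch of that comparison, assuming a recent scikit-learn (where cross_val_score lives in sklearn.model_selection). The iris dataset and the shortlist of base estimators are stand-ins chosen for illustration, not a recommendation for our models.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

# Iris is only a stand-in dataset for this sketch.
X, y = load_iris(return_X_y=True)

# Candidate base estimators; this shortlist is an assumption.
candidates = {
    "LinearSVC": LinearSVC(random_state=0),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

mean_scores = {}
for name, base in candidates.items():
    clf = OutputCodeClassifier(base, code_size=2, random_state=0)
    # 5-fold cross-validated accuracy for this base estimator.
    scores = cross_val_score(clf, X, y, cv=5)
    mean_scores[name] = scores.mean()
    print(f"{name}: {mean_scores[name]:.3f}")
```

The same loop could be extended with other binary classifiers or with a grid over code_size; the scores on iris say nothing about how the candidates would rank on our data.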

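code_size controls the width of the ECOC code matrix. In scikit-learn it is a multiplier on the number of classes rather than a column count: the classifier trains int(n_classes * code_size) binary estimators, so code_size=2 on a 3-class problem gives 6. An exhaustive code instead grows as (2^(k-1)-1) columns: 3 for a 3-class problem (which is effectively one-vs-all) and 31 for a 6-class problem. We will need to tune code_size as we go along.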
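One way to check how code_size affects the code book is to fit the classifier and inspect it. The sketch below assumes a recent scikit-learn and again uses iris (3 classes) as stand-in data; to my understanding the number of binary estimators is int(n_classes * code_size), which is what the loop prints.

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)  # 3 classes; stand-in data for the sketch

estimator_counts = {}
for code_size in (1, 2, 4):
    clf = OutputCodeClassifier(
        LinearSVC(random_state=0), code_size=code_size, random_state=0
    )
    clf.fit(X, y)
    # code_book_ has one row per class and one column per binary classifier.
    estimator_counts[code_size] = len(clf.estimators_)
    print(code_size, clf.code_book_.shape, len(clf.estimators_))
```

Because the code book is drawn at random, some columns may duplicate each other or put every class on one side, so a larger code_size does not guarantee proportionally more error correction.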

Halfak closed this task as Resolved. Jan 21 2016, 3:43 PM