[Spike] Semi-supervised machine learning
Open, Low · Public · Spike

Assigned To

None

Authored By

	Halfak
	Aug 16 2016, 4:45 PM

Description

Pattern is roughly:

1. Label small random sample
2. Train model
3. Make predictions on new data -- auto-label confident observations
4. GOTO 2
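
A minimal sketch of the loop above, using only the Python standard library. The nearest-centroid "model", the 1-D feature, the `threshold` of 0.9, and the `max_rounds` cap are all assumptions made for illustration; none of them come from this task, and a real run would plug in an actual classifier and feature set.

```python
import random
from statistics import mean


def fit_centroids(examples):
    """Toy stand-in for "train model": one mean per class over a 1-D feature."""
    by_label = {}
    for x, label in examples:
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict_with_confidence(centroids, x):
    """Return (nearest label, confidence); confidence is 0.5 at the midpoint
    between the two nearest centroids and 1.0 exactly on the nearest one."""
    (d_near, label), (d_far, _) = sorted(
        (abs(x - c), lbl) for lbl, c in centroids.items()
    )[:2]
    total = d_near + d_far
    confidence = 0.5 if total == 0 else d_far / total
    return label, confidence


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=10):
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = fit_centroids(labeled)  # train model
        confident, remaining = [], []
        for x in pool:  # make predictions on new data
            label, conf = predict_with_confidence(model, x)
            (confident if conf >= threshold else remaining).append((x, label))
        if not confident:  # nothing new to auto-label, so stop looping
            break
        labeled.extend(confident)  # auto-label confident observations
        pool = [x for x, _ in remaining]
    return fit_centroids(labeled), labeled, pool


# Hand-labeled seed sample plus a larger unlabeled pool (synthetic data).
rng = random.Random(0)
seed = [(-0.1, 0), (0.1, 0), (0.9, 1), (1.1, 1)]
unlabeled = [rng.gauss(0.0, 0.2) for _ in range(50)]
unlabeled += [rng.gauss(1.0, 0.2) for _ in range(50)]

model, grown, leftover = self_train(seed, unlabeled)
print(len(seed), "seed labels grew to", len(grown), "with", len(leftover), "left unlabeled")
```

The stopping rule (no confident predictions left, or `max_rounds` reached) is one possible choice. Auto-labels are never revisited here, so an early mistake stays in the training data, which is one reason the labeled test set mentioned below matters.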

This task is done when we experiment with training a model and comparing against a (labeled) test set. We'll need a solid testing strategy.

Related Objects

Mentioned Here: T128087: [Spike] Investigate HashingVectorizer
T132580: Implement abstraction for Sparse Feature Vectors

Event Timeline

Halfak created this task. Aug 16 2016, 4:45 PM

Restricted Application added a subscriber: Aklapper. Aug 16 2016, 4:45 PM

Halfak renamed this task from Semi-supervised machine learning to [Spike] Semi-supervised machine learning. Aug 16 2016, 4:48 PM

Halfak added a project: Spike.

Halfak updated the task description.

This will likely be especially useful when we have large feature vectors implemented (T132580) and we start working with hashing vectorization in the wild (T128087).

Halfak triaged this task as Low priority. Aug 18 2016, 2:30 PM

Halfak moved this task from Ideas to Research & analysis on the Machine-Learning-Team board. Sep 22 2016, 2:54 PM

Halfak moved this task from Research & analysis to Ideas on the Machine-Learning-Team board. Sep 22 2016, 2:59 PM

Sabya subscribed. Oct 27 2016, 4:10 AM

I talked to @Sabya on IRC. Here are the steps I recommended:

1. Read up on methods.
2. Take our labeled data for damaging/not and split it into a train set and a test set.
3. Build a model on the training set.
4. Run the model against a random sample of revisions and take the revisions that are strongly scored (high confidence of "damaging"/not).
5. Train a new model on the training set + the strongly-labeled observations.
6. Test both models against the test set and see if the new one does better.
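
The experiment above could be wired up roughly as follows. This is a sketch under stated assumptions: `train`, `score`, and `accuracy` are hypothetical stand-ins (a 1-D nearest-mean scorer, not the production damaging model), the data is synthetic, and the 80/20 split and the 0.1/0.9 cutoffs for "strongly scored" are arbitrary illustrative values.

```python
import random


def train(examples):
    """Hypothetical stand-in model: class means of a single numeric feature."""
    pos = [x for x, damaging in examples if damaging]
    neg = [x for x, damaging in examples if not damaging]
    return sum(neg) / len(neg), sum(pos) / len(pos)


def score(model, x):
    """Pseudo-probability of "damaging" in [0, 1], from relative distance."""
    mu_neg, mu_pos = model
    d_neg, d_pos = abs(x - mu_neg), abs(x - mu_pos)
    return 0.5 if d_neg + d_pos == 0 else d_neg / (d_neg + d_pos)


def accuracy(model, examples):
    hits = sum((score(model, x) >= 0.5) == damaging for x, damaging in examples)
    return hits / len(examples)


def run_experiment(labeled, unlabeled_pool, sample_size=200, low=0.1, high=0.9, seed=0):
    rng = random.Random(seed)
    shuffled = list(labeled)
    rng.shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    train_set, test_set = shuffled[:cut], shuffled[cut:]  # train/test split

    baseline = train(train_set)  # model built on the training set only

    # Score a random sample of unlabeled revisions; keep only the strong scores.
    sample = rng.sample(unlabeled_pool, min(sample_size, len(unlabeled_pool)))
    pseudo = []
    for x in sample:
        p = score(baseline, x)
        if p >= high:
            pseudo.append((x, True))
        elif p <= low:
            pseudo.append((x, False))

    augmented = train(train_set + pseudo)  # training set + strongly-labeled observations

    # Both models are judged on the same human-labeled test set.
    return {
        "baseline": accuracy(baseline, test_set),
        "augmented": accuracy(augmented, test_set),
        "pseudo_labeled": len(pseudo),
    }


def synthetic(n, rng):
    """Synthetic (feature, damaging) pairs; purely illustrative data."""
    rows = []
    for _ in range(n):
        damaging = rng.random() < 0.5
        rows.append((rng.gauss(1.0 if damaging else 0.0, 0.4), damaging))
    return rows


rng = random.Random(42)
labeled = synthetic(100, rng)
pool = [x for x, _ in synthetic(1000, rng)]
results = run_experiment(labeled, pool)
print(results)
```

The test set is split off before any auto-labeling and never receives pseudo-labels, so the two accuracy numbers stay comparable. Repeating the run over several seeds would give a better sense of whether any difference is more than noise.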

@Halfak which classifier algorithm should I use? Current production algorithms or HashingVector + GradientBoosting?

If it's easy to do so, I'd say "both".

Halfak added a project: artificial-intelligence. Jan 20 2017, 8:39 PM

Harej moved this task from Ideas to New development on the Machine-Learning-Team board. Apr 3 2019, 4:47 AM

ACraze moved this task from New development to Backlog/Other on the Machine-Learning-Team board. Jan 19 2021, 10:38 PM

Restricted Application changed the subtype of this task from "Task" to "Spike". Jan 19 2021, 10:38 PM
