Implement ~100 most important hash vector features in editquality models
Open, Low, Public

Assigned To

None

Authored By

	Halfak
	Sep 15 2016, 6:47 PM

Description

This task is done when a revscoring scorer model that includes ~100 hashed n-gram features has been trained and cross-validated.

In T128087, 108 features were found to be the most effective.
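For readers unfamiliar with hashed n-gram features, here is a minimal stdlib-only sketch of the hashing trick. It assumes a simple lowercase word tokenizer, word unigrams and bigrams, an MD5-based bucket function, and 2**20 buckets. These are illustrative assumptions, not what revscoring or T128087 used (scikit-learn's HashingVectorizer, for example, uses MurmurHash3 and signed values). The `selected_buckets` argument is a hypothetical stand-in for the ~100 hash indices this task would pick.

```python
import hashlib
import re
from collections import Counter

# Assumed bucket count; illustrative only, not taken from T128087.
N_FEATURES = 2 ** 20


def tokenize(text):
    """Lowercase word tokens (a simplistic, assumed tokenizer)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n_max=2):
    """Yield all word n-grams of length 1..n_max."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def bucket(gram, n_features=N_FEATURES):
    """Map an n-gram to a stable bucket index via MD5 (assumed hash choice)."""
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_features


def hash_vector(text, n_features=N_FEATURES, n_max=2):
    """Sparse {bucket: count} vector of the n-grams in text."""
    grams = ngrams(tokenize(text), n_max)
    return dict(Counter(bucket(g, n_features) for g in grams))


def selected_features(text, selected_buckets):
    """Dense feature list restricted to a fixed, pre-chosen set of buckets."""
    vec = hash_vector(text)
    return [vec.get(b, 0) for b in selected_buckets]


if __name__ == "__main__":
    # Hypothetical selection; in practice these would be the ~100 top buckets.
    chosen = [bucket("foo"), bucket("bar foo")]
    print(selected_features("foo bar foo", chosen))
```

Restricting to a fixed list of buckets gives the model a fixed-length feature vector no matter how large the underlying vocabulary is.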

Related Objects

		Status	Subtype	Assigned	Task
		Open		None	T145812 Implement ~100 most important hash vector features in editquality models
		Resolved	Spike	Sabya	T128087 [Spike] Investigate HashingVectorizer

Event Timeline

Halfak created this task. Sep 15 2016, 6:47 PM

Restricted Application added a subscriber: Aklapper. Sep 15 2016, 6:47 PM

Halfak mentioned this in T128087: [Spike] Investigate HashingVectorizer. Sep 15 2016, 6:49 PM

So, I've been thinking that we might want to discover our high-utility hash vector features through a larger analysis of reverted edits and then use them to train the damaging/goodfaith models.

In T128087, we used the hashes with the highest "importance" as learned by a GradientBoosting model. We could stick with that strategy or try a TF-IDF weight-selection strategy.
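The two selection strategies above can be sketched roughly as follows. This is a minimal stdlib-only illustration that assumes the per-bucket importances have already been read from a fitted model (for example scikit-learn's `GradientBoostingClassifier.feature_importances_`). It also interprets "TF-IDF weight selection" as ranking buckets by their summed TF-IDF weight over a corpus of hashed reverted edits; that interpretation and both function names are hypothetical, not the method used in T128087.

```python
import heapq
import math


def top_k_by_importance(importances, k=100):
    """Indices of the k largest values in a per-bucket importance list."""
    return heapq.nlargest(k, range(len(importances)), key=lambda i: importances[i])


def top_k_by_tfidf(docs, k=100):
    """Rank buckets by summed TF-IDF weight across docs.

    docs: list of sparse {bucket: count} dicts, e.g. hashed reverted edits.
    """
    n_docs = len(docs)
    # Document frequency: how many docs each bucket appears in.
    df = {}
    for doc in docs:
        for b in doc:
            df[b] = df.get(b, 0) + 1
    scores = {}
    for doc in docs:
        total = sum(doc.values())
        for b, count in doc.items():
            # Smoothed IDF, same formula as scikit-learn's TfidfTransformer default.
            idf = math.log((1 + n_docs) / (1 + df[b])) + 1
            # Term frequency normalized by document length (an assumed choice).
            scores[b] = scores.get(b, 0.0) + (count / total) * idf
    return heapq.nlargest(k, scores, key=scores.get)


if __name__ == "__main__":
    print(top_k_by_importance([0.1, 0.5, 0.2, 0.0], k=2))  # prints [1, 2]
```

Either function yields a fixed list of bucket indices that can be computed once on a large sample and then reused as the feature set for other models.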

Halfak triaged this task as Low priority. Sep 22 2016, 2:45 PM

Halfak added a subtask: T128087: [Spike] Investigate HashingVectorizer.

Halfak moved this task from Unsorted to New development on the Machine-Learning-Team board.

Halfak closed subtask T128087: [Spike] Investigate HashingVectorizer as Resolved. Sep 22 2016, 5:51 PM

Halfak mentioned this in T157222: Estimate ORES capex for FY2017-18. Feb 5 2017, 4:23 PM

He7d3r subscribed. Apr 18 2020, 6:55 PM

Restricted Application added a project: artificial-intelligence. Apr 18 2020, 6:55 PM

Maintenance_bot moved this task from New development to Backlog/Revscoring on the Machine-Learning-Team board. Jan 19 2021, 11:37 PM
