|Open||None||T145812 Implement ~100 most important hash vector features in editquality models|
|Resolved||Spike||Sabya||T128087 [Spike] Investigate HashingVectorizer|
So, I've been thinking that we might want to discover our high-utility hash vector features through a larger analysis of reverted edits, and then use those features when training the damaging/goodfaith models.
In T128087, we used the hashes with the highest "importance" as learned by a GradientBoosting model. We could stick with that strategy or try out a TF-IDF weight-selection strategy.
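
To make the two options concrete, here is a rough sketch of both selection strategies using scikit-learn. It is not the code from T128087: the function names, the `n_features` and `top_k` defaults, and especially the TF-IDF scoring rule (mean TF-IDF weight in reverted edits minus the mean in non-reverted edits) are assumptions for illustration. The input is assumed to be a list of edit texts with a boolean "reverted" label for each.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import HashingVectorizer, TfidfTransformer


def make_vectorizer(n_features=2**10):
    # alternate_sign=False keeps hashed counts non-negative, which
    # TF-IDF weighting expects; norm=None leaves raw counts.
    return HashingVectorizer(n_features=n_features, alternate_sign=False,
                             norm=None)


def top_hashes_by_importance(texts, reverted, top_k=100, n_features=2**10):
    """Rank hash columns by GradientBoosting feature importance."""
    vectorizer = make_vectorizer(n_features)
    X = vectorizer.transform(texts)
    model = GradientBoostingClassifier(n_estimators=20, random_state=0)
    model.fit(X, np.asarray(reverted, dtype=bool))
    # Highest importance first.
    order = np.argsort(model.feature_importances_)[::-1]
    return vectorizer, order[:top_k]


def top_hashes_by_tfidf(texts, reverted, top_k=100, n_features=2**10):
    """Rank hash columns by how much more TF-IDF weight they carry in
    reverted edits than in non-reverted ones (an assumed scoring rule)."""
    vectorizer = make_vectorizer(n_features)
    counts = vectorizer.transform(texts)
    tfidf = TfidfTransformer().fit_transform(counts).tocsr()
    mask = np.asarray(reverted, dtype=bool)
    mean_reverted = np.asarray(tfidf[mask].mean(axis=0)).ravel()
    mean_other = np.asarray(tfidf[~mask].mean(axis=0)).ravel()
    order = np.argsort(mean_reverted - mean_other)[::-1]
    return vectorizer, order[:top_k]
```

Either function returns the fitted vectorizer and the indices of the selected hash columns, so `vectorizer.transform(texts)[:, columns]` would give the reduced feature matrix to feed into the damaging/goodfaith models.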