Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences.
Closed, Resolved, Public


How do sentences from Spam, Vandalism and Attack articles score in the FA sentence model? How do FA sentences score in the Spam, Vandalism and Attack models?

This task is done when a basic analysis shows how well the models are able to differentiate sentences from the 4 grammars.
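A minimal sketch of what such a cross-scoring analysis could look like, assuming each model can be wrapped as a function that maps a sentence to a numeric score (higher meaning "more like this model's class"). The names `cross_score`, `differentiation`, `sentences_by_source`, and `scorers` are hypothetical and not taken from the actual code for this task.

```python
from statistics import mean


def cross_score(sentences_by_source, scorers):
    """Score every source's sentences under every model.

    sentences_by_source: {source name: list of sentences}   (hypothetical layout)
    scorers: {model name: function(sentence) -> float}      (hypothetical stand-ins)
    Returns {source: {model: mean score of that source's sentences}}.
    """
    return {
        source: {
            model: mean(score(sentence) for sentence in sentences)
            for model, score in scorers.items()
        }
        for source, sentences in sentences_by_source.items()
    }


def differentiation(matrix, model, own_source):
    """Gap between a model's mean score on its own source and the
    highest mean score it gives to any other source.

    A larger positive gap suggests the model separates its own
    sentences from the rest more cleanly.
    """
    own = matrix[own_source][model]
    others = [row[model] for source, row in matrix.items() if source != own_source]
    return own - max(others)
```

Reading the resulting matrix row by row answers the first question above (how Spam, Vandalism and Attack sentences score in the FA model), and reading it column by column answers the second. Comparing means is only one possible summary; a real analysis would likely also look at the score distributions.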

Event Timeline

I have some progress to show here, but I found an issue in the spaCy parser, so I'm working on that first.

Done here

TL;DR: We don't differentiate that well when it comes to vandalism, but we do a pretty good job for spam and attack sentences. I think that we'll want to look at the data qualitatively next. E.g., what are the most vandalismy sentences in :en:Biology? Or what are the most FA sentences in some of the deleted articles?
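For the qualitative pass, something like the following could pull out the top-scoring sentences of an article under a given model. This is only a sketch: `top_sentences` is a hypothetical helper, and `score` stands in for whichever model's scoring function is being inspected (assumed to return a number, higher meaning a closer match).

```python
import heapq


def top_sentences(sentences, score, k=10):
    """Return the k highest-scoring (score, sentence) pairs, best first.

    `score` is a hypothetical stand-in for one model's scoring function.
    """
    return heapq.nlargest(k, ((score(sentence), sentence) for sentence in sentences))
```

Calling it with the vandalism model's scorer on the sentences of :en:Biology would list the "most vandalismy" ones; calling it with the FA scorer on a deleted article would list the most FA ones.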

OK all done for today. It looks like we're doing a good job of differentiating sentences. I think we're just about ready to start experimenting with live data.

It might even be time to start experimenting with editquality, articlequality, and draftquality.