
Analyze differentiation of FA, Spam, Vandalism, and Attack models/sentences.
Closed, ResolvedPublic

Description

How do sentences from Spam, Vandalism and Attack articles score in the FA sentence model? How do FA sentences score in the Spam, Vandalism and Attack models?

This task is done when a basic analysis shows how well the models are able to differentiate sentences trained from the 4 grammars.
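The cross-scoring analysis described above can be sketched with toy unigram language models standing in for the trained grammars. This is an illustrative assumption, not the project's actual model code: each class's sentences train a smoothed unigram model, and every sentence is then scored under every class's model to see whether its own class wins.

```python
# Minimal sketch of cross-scoring sentences under per-class models.
# Unigram models with add-one smoothing are a stand-in assumption for
# the actual trained grammars (FA, Spam, Vandalism, Attack).
import math
from collections import Counter

def train_unigram(sentences):
    """Return a token-probability function with add-one smoothing."""
    counts = Counter(tok for s in sentences for tok in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 for unseen tokens
    return lambda tok: (counts[tok] + 1) / (total + vocab)

def log_likelihood(model, sentence):
    """Sum of per-token log probabilities under one class's model."""
    return sum(math.log(model(tok)) for tok in sentence.split())

# Tiny illustrative corpora (hypothetical examples, two of the four classes).
corpora = {
    "fa": ["the species is native to coastal regions"],
    "spam": ["buy our amazing products online now"],
}
models = {cls: train_unigram(sents) for cls, sents in corpora.items()}

# Cross-score: for each class's sentences, which model fits best?
for cls, sents in corpora.items():
    for s in sents:
        scores = {m: log_likelihood(models[m], s) for m in models}
        best = max(scores, key=scores.get)
        print(f"{cls!r} sentence best explained by {best!r} model")
```

With well-separated vocabularies each sentence is best explained by its own class's model; the analysis in this task asks how often that holds for real FA, Spam, Vandalism, and Attack sentences.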

Event Timeline

Halfak created this task. Nov 28 2016, 9:28 PM
Restricted Application added a subscriber: Aklapper. Nov 28 2016, 9:28 PM

I have some progress to show here, but I found an issue in the spaCy parser, so I'm working on that first.

Done here https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_draft_quality/Work_log/2016-12-01

TL;DR: We don't differentiate that well when it comes to vandalism, but we do a pretty good job for spam and attack sentences. I think we'll want to look at the data qualitatively next. E.g., what are the most vandalism-like sentences in :en:Biology? Or what are the most FA-like sentences in some of the deleted articles?
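The qualitative follow-up suggested above (finding the most vandalism-like sentences in an article, or the most FA-like sentences in a deleted draft) amounts to ranking sentences by a score contrast between two models. A hedged sketch, where `vandalism_score` and `fa_score` stand in for whatever likelihood the trained models expose (they are assumptions, not a real API):

```python
# Hypothetical sketch: rank an article's sentences by how
# "vandalism-like" they are, using the log-score difference
# between the vandalism model and the FA model.
def rank_by_vandalismness(sentences, vandalism_score, fa_score):
    scored = [(vandalism_score(s) - fa_score(s), s) for s in sentences]
    return [s for delta, s in sorted(scored, reverse=True)]

# Toy usage with stand-in scoring functions:
ranked = rank_by_vandalismness(
    ["cells divide by mitosis", "biology is soooo boring lol"],
    vandalism_score=lambda s: s.count("o"),  # toy stand-in
    fa_score=lambda s: 0.0,                  # toy stand-in
)
```

Swapping in the FA model as the numerator instead would surface the most FA-like sentences in a deleted article, the other question posed above.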

Halfak added a comment. Dec 3 2016, 8:35 PM

https://meta.wikimedia.org/wiki/Research_talk:Automated_classification_of_draft_quality/Work_log/2016-12-03

OK, all done for today. It looks like we're doing a good job of differentiating sentences. I think we're just about ready to start experimenting with live data.

It might even be time to start experimenting with editquality, articlequality, and draftquality.

Halfak closed this task as Resolved. Feb 7 2017, 8:31 PM