See https://en.wikipedia.org/wiki/Stochastic_context-free_grammar
Implement a feature that, given a set of sentences, produces a likelihood ratio representing how likely it is that a sentence was generated from a particular subset of a corpus (e.g. "vandalism", "spam", "featured", "attack", etc.).
Scoring library: https://github.com/halfak/kasami
Sentence models: https://github.com/wiki-ai/wikigrammars
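
To make the idea of a likelihood ratio concrete, here is a minimal sketch, not the kasami API and not the wikigrammars models. It assumes each corpus subset is described by a toy probabilistic grammar in Chomsky normal form, scores a tokenized sentence with the inside algorithm, and returns the difference of the two log-probabilities. The names Grammar, inside_log_prob, log_likelihood_ratio, SPAM and GENERAL are hypothetical, and the probabilities are invented for illustration.

```python
import math
from typing import NamedTuple


class Grammar(NamedTuple):
    """Hypothetical container for a PCFG in Chomsky normal form."""
    lexical: dict   # {(nonterminal, word): probability}
    binary: dict    # {(parent, left_child, right_child): probability}
    start: str = "S"


def inside_log_prob(tokens, grammar):
    """Log-probability of `tokens` under `grammar` (inside algorithm)."""
    n = len(tokens)
    # chart[(i, j, A)] = probability that nonterminal A derives tokens[i:j]
    chart = {}
    for i, word in enumerate(tokens):
        for (a, w), p in grammar.lexical.items():
            if w == word:
                key = (i, i + 1, a)
                chart[key] = chart.get(key, 0.0) + p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in grammar.binary.items():
                    left = chart.get((i, k, b), 0.0)
                    right = chart.get((k, j, c), 0.0)
                    if left and right:
                        key = (i, j, a)
                        chart[key] = chart.get(key, 0.0) + p * left * right
    total = chart.get((0, n, grammar.start), 0.0)
    return math.log(total) if total > 0 else float("-inf")


def log_likelihood_ratio(tokens, subset_grammar, background_grammar):
    """log P(tokens | subset) - log P(tokens | background).

    Positive values mean the sentence looks more like the subset.
    If neither grammar can derive the sentence the result is NaN;
    a real scorer would need smoothing for unseen productions.
    """
    return (inside_log_prob(tokens, subset_grammar)
            - inside_log_prob(tokens, background_grammar))


# Toy grammars with made-up probabilities (assumption, for illustration only).
RULES = {("S", "NP", "VP"): 1.0}
SPAM = Grammar(
    lexical={("NP", "we"): 0.8, ("NP", "editors"): 0.2,
             ("VP", "sell"): 0.9, ("VP", "write"): 0.1},
    binary=RULES,
)
GENERAL = Grammar(
    lexical={("NP", "we"): 0.4, ("NP", "editors"): 0.6,
             ("VP", "sell"): 0.1, ("VP", "write"): 0.9},
    binary=RULES,
)

if __name__ == "__main__":
    for sentence in (["we", "sell"], ["editors", "write"]):
        print(sentence, log_likelihood_ratio(sentence, SPAM, GENERAL))
```

In this sketch a positive score means the sentence is more probable under the subset grammar than under the background grammar, and a negative score means the opposite. Working in log space keeps the ratio numerically stable for longer sentences.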