Develop a method for gathering spam sentences for training a PCFG model.
Naive thoughts on strategies:
- Parse articles deleted for spam (WP:G11)
- Parse sentences touched in edits reverted for spam/advertising (match on the edit comment)
- Use human curation to clean up the sample.
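
A rough sketch of the second strategy, to make the idea concrete. Everything here is an assumption for illustration: the keyword patterns, the naive sentence splitter, and the function names (`is_spam_revert`, `split_sentences`, `touched_sentences`) are hypothetical, and the text before and after the reverted edit is assumed to have been fetched already. Real edit comments and wikitext would likely need a more careful matcher and tokenizer, and the output would still go through human curation.

```python
import re

# Hypothetical keyword patterns; tune against real edit comments.
REVERT_RE = re.compile(r"\b(revert\w*|rv|rvt|undid|undo)\b", re.IGNORECASE)
SPAM_RE = re.compile(r"\b(spam\w*|advert\w*|promo\w*|G11)\b", re.IGNORECASE)

# Naive splitter: break after ., ! or ? followed by whitespace.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def is_spam_revert(comment):
    """True if the comment looks like a revert AND mentions spam/advertising."""
    return bool(REVERT_RE.search(comment)) and bool(SPAM_RE.search(comment))


def split_sentences(text):
    """Split plain text into sentences, dropping empty pieces."""
    return [s.strip() for s in SENTENCE_SPLIT_RE.split(text) if s.strip()]


def touched_sentences(before, after):
    """Sentences present after the (later reverted) edit but not before it."""
    old = set(split_sentences(before))
    return [s for s in split_sentences(after) if s not in old]
```

For example, `is_spam_revert` would be applied to the comment of the reverting edit, and `touched_sentences` to the page text before and after the edit that was reverted; the resulting sentences are candidates for the spam sample.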