Sentence bank for spam
Closed, Resolved · Public

Description

Develop a method for gathering spam sentences to train a PCFG model.

Naive thoughts on strategies:

  • Parse articles deleted for spam (WP:G11)
  • Parse sentences touched in edits reverted for spam/advertising (match on the edit comment)
  • Use human curation to clean up the sample
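
A rough sketch of the second strategy, for illustration only. Nothing here comes from the task itself: the comment pattern, the naive sentence splitter, and the shape of the `reverted_edits` records (`revert_comment`, `before`, `after`) are all assumptions. It keeps the sentences that a reverted edit introduced, provided the revert comment mentions spam or advertising, and leaves the result for human curation.

```python
import re

# Assumed pattern: the task only says "match edit comment", so the exact
# keywords here are a guess and would need tuning against real comments.
SPAM_COMMENT_RE = re.compile(r"\b(spam|advert)\w*", re.IGNORECASE)

# Naive splitter: break on whitespace that follows ., ! or ?
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def is_spam_revert(comment):
    """True if a revert comment mentions spam or advertising."""
    return bool(SPAM_COMMENT_RE.search(comment or ""))


def split_sentences(text):
    """Split plain text into sentences (very rough, no wikitext handling)."""
    return [s.strip() for s in SENTENCE_END_RE.split(text) if s.strip()]


def touched_sentences(before_text, after_text):
    """Sentences present after the edit but not before it."""
    before = set(split_sentences(before_text))
    return [s for s in split_sentences(after_text) if s not in before]


def collect_candidates(reverted_edits):
    """Gather candidate spam sentences from hypothetical edit records.

    Each record is assumed to be a dict with the keys 'revert_comment',
    'before' (text prior to the reverted edit) and 'after' (text with it).
    """
    bank = []
    for edit in reverted_edits:
        if is_spam_revert(edit["revert_comment"]):
            bank.extend(touched_sentences(edit["before"], edit["after"]))
    return bank
```

The output is only a candidate list. Keyword matching on comments will let through false positives, which is where the human curation step comes in.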

Event Timeline

Halfak created this task. Oct 13 2016, 1:37 PM
Restricted Application added a subscriber: Aklapper. Oct 13 2016, 1:37 PM
Halfak renamed this task from Generate sentence banks for English Language spammy sentences to Sentence banks for English Language spammy sentences. Oct 13 2016, 1:44 PM
Halfak renamed this task from Sentence banks for English Language spammy sentences to Sentence banks for English Language spam sentences.
Halfak renamed this task from Sentence banks for English Language spam sentences to Sentence banks for spam. Oct 13 2016, 1:48 PM
Halfak updated the task description.
Halfak renamed this task from Sentence banks for spam to Sentence bank for spam. Oct 13 2016, 1:52 PM
Halfak triaged this task as Normal priority. Oct 13 2016, 2:55 PM
Halfak moved this task from Untriaged to Research & analysis on the Scoring-platform-team board.
Halfak claimed this task. Jan 19 2017, 6:30 PM
Halfak moved this task from Active to Done on the Scoring-platform-team (Current) board.
Halfak closed this task as Resolved. Feb 7 2017, 8:31 PM