Complete beta version of pcfg_scorer and approximate overhead
Closed, ResolvedPublic


Complete a beta version of pcfg_scorer and estimate the approximate size of:

  1. Pickled PCFGScorer object
  2. CountFiles used to adequately train PCFGScorer objects.
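
One way to approximate the first item is to pickle the object in memory and count the bytes. This is only a rough sketch: `pickled_size_bytes` is a hypothetical helper, and a plain dict of rule counts stands in for a trained PCFGScorer, whose real interface is not shown in this task.

```python
import pickle


def pickled_size_bytes(obj):
    """Return the number of bytes obj occupies once pickled."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Hypothetical stand-in for a trained scorer: a dict of rule counts.
toy_counts = {
    ("S", ("NP", "VP")): 120,
    ("NP", ("DT", "NN")): 75,
}

size = pickled_size_bytes(toy_counts)
print(f"pickled size: {size} bytes")
```

For the second item, `os.path.getsize` on each counts file would give the on-disk size in the same unit.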

Event Timeline

aetilley created this task. Dec 11 2015, 7:06 PM
aetilley claimed this task.
aetilley set the priority of this task to Needs Triage.
aetilley updated the task description.
aetilley added subscribers: aetilley, Halfak, Ladsgroup.
Restricted Application added subscribers: StudiesWorld, Aklapper. Dec 11 2015, 7:06 PM

The beta PCFG object is complete.

Object has both a parser and a scorer.

Proposed future strategy:

Use the Penn Treebank or another large public treebank to train a generic PCFG *parser*. In particular, we need to get a counts file like

Use our trained parser to parse Wikipedia revisions (regular and vandalous) to get *two more* counts files, which train two PCFG *scorers*, p_vandal and p_regular. It looks like the tokenizer will be straightforward (thanks Aaron:

Add features such as, say,

\min_{s \in \text{revision}} \log p_{\text{vandal}}(s) - \min_{s \in \text{revision}} \log p_{\text{regular}}(s)
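
The feature above could be computed roughly as follows. This is a minimal sketch, not the pcfg_scorer API: `min_log_prob` and `vandal_feature` are hypothetical names, the two scorers are assumed to be callables that return a sentence probability, and the `floor` that guards against log(0) is an added assumption.

```python
import math


def min_log_prob(sentences, prob, floor=1e-300):
    """Lowest log-probability over the sentences of a revision.

    `prob` maps a sentence to a probability. `floor` (an assumption)
    avoids math.log(0). Raises ValueError if `sentences` is empty.
    """
    return min(math.log(max(prob(s), floor)) for s in sentences)


def vandal_feature(sentences, p_vandal, p_regular):
    """Difference of the two worst-case sentence log-probabilities."""
    return min_log_prob(sentences, p_vandal) - min_log_prob(sentences, p_regular)


# Toy scorers backed by lookup tables (illustrative values only).
vandal_table = {"a": 0.5, "b": 0.1}
regular_table = {"a": 0.2, "b": 0.4}

feature = vandal_feature(["a", "b"], vandal_table.get, regular_table.get)
print(feature)
```

With these toy tables the least likely sentence has probability 0.1 under p_vandal and 0.2 under p_regular, so the feature equals log(0.1) - log(0.2) = log(0.5).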

Halfak closed this task as Resolved. Jan 21 2016, 3:42 PM
Halfak set Security to None.