Implement a basic scoring strategy for PCFGs
Closed, Resolved · Public

Assigned To

Authored By

	Halfak
	Sep 21 2016, 10:35 PM

Description

This task is done when there's a Python library that can score a sentence by its likelihood of appearing in a corpus.

Related Objects

		Status	Subtype	Assigned	Task
		Open		None	T144636 [Epic] Implement PCFG features for editquality and draftquality
		Resolved		Halfak	T146335 Implement a basic scoring strategy for PCFGs

Event Timeline

Halfak created this task. Sep 21 2016, 10:35 PM

Restricted Application added a subscriber: Aklapper. Sep 21 2016, 10:35 PM

Halfak updated the task description. Sep 21 2016, 10:48 PM

See https://github.com/halfak/kasami. I produced this largely by reviewing and simplifying the code in https://github.com/aetilley/pcfg. Most of my notes are in T144636.

I'd like to see if @aetilley has time to review the score() method here: https://github.com/halfak/kasami/blob/master/kasami/tree_scorer.py#L29

Here's a copy-paste of the relevant lines of code:

probas = [self.prod_freq.get(prod, 0.5) /
          self.source_freq.get(prod.source, 1)
          for prod in tree]
return sum(log(proba) for proba in probas)

Essentially, each value in probas is the frequency of the production divided by the frequency of its source. If the production has not been seen before, it is given a frequency of 0.5 (so that we avoid zero probabilities). Similarly, if the source has not been seen before, it is given a frequency of 1. The sum of log(proba) is returned so that we don't run into floating-point precision issues when combining many small probabilities. You can always convert back to the raw likelihood by using exp(). For example, very small values are lost entirely when added to 1:

>>> 1 * 10 ** -19 + 1
1.0
>>> 1 * 10 ** -18 + 1
1.0
>>> 1 * 10 ** -17 + 1
1.0
>>> 1 * 10 ** -16 + 1
1.0
>>> 1 * 10 ** -15 + 1
1.000000000000001
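
For anyone who wants to try the idea outside of kasami, here is a minimal self-contained sketch of the same scoring rule. The `Production` namedtuple, the `SketchScorer` class, and its constructor are hypothetical stand-ins for illustration and are not kasami's API. It also assumes a tree can be treated as an iterable of productions. Only the body of score() follows the lines quoted above.

```python
from collections import Counter, namedtuple
from math import exp, log

# Hypothetical stand-in for a grammar production such as S -> NP VP.
Production = namedtuple("Production", ["source", "produces"])


class SketchScorer:
    """Illustrative scorer (assumption: not the kasami implementation)."""

    def __init__(self, productions):
        productions = list(productions)
        # How often each production was observed.
        self.prod_freq = Counter(productions)
        # How often each left-hand side (source) was observed.
        self.source_freq = Counter(p.source for p in productions)

    def score(self, tree):
        # Unseen productions count as 0.5, unseen sources as 1.
        probas = [self.prod_freq.get(prod, 0.5) /
                  self.source_freq.get(prod.source, 1)
                  for prod in tree]
        # Sum of logs instead of a product of raw probabilities.
        return sum(log(proba) for proba in probas)


s_np_vp = Production("S", ("NP", "VP"))
s_vp = Production("S", ("VP",))
np_dt_nn = Production("NP", ("DT", "NN"))
unseen = Production("X", ("Y",))

scorer = SketchScorer([s_np_vp, s_np_vp, s_np_vp, s_vp, np_dt_nn])

seen_score = scorer.score([s_np_vp, np_dt_nn])  # log(3/4) + log(1/1)
unseen_score = scorer.score([s_np_vp, unseen])  # log(3/4) + log(0.5/1)

# Why logs: multiplying 200 probabilities of 0.01 underflows to zero,
# while the equivalent sum of logs stays finite.
product = 1.0
for _ in range(200):
    product *= 0.01
log_sum = sum(log(0.01) for _ in range(200))
```

Calling exp(seen_score) recovers the raw likelihood of 0.75 for the first tree. The second tree scores lower because the unseen production contributes log(0.5).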

Halfak claimed this task. Sep 21 2016, 11:25 PM

Halfak moved this task from Parked to Review on the Machine-Learning-Team (Active Tasks) board.

Halfak added a parent task: T144636: [Epic] Implement PCFG features for editquality and draftquality. Sep 22 2016, 7:25 PM

Halfak moved this task from Review to Completed on the Machine-Learning-Team (Active Tasks) board. Sep 26 2016, 4:04 PM

Halfak closed this task as Resolved. Sep 28 2016, 9:40 PM
