# Implement a basic scoring strategy for PCFGs

Closed, Resolved

Assigned To
 Halfak
Authored By
 Halfak, Sep 21 2016, 10:35 PM (UTC+0)
Referenced Files
None

# Description

This task is done when there's a Python library that can score a sentence by its likelihood of appearing in a corpus.

### Event Timeline

Restricted Application added a subscriber: Aklapper. Sep 21 2016, 10:35 PM

See https://github.com/halfak/kasami. I produced this largely by reviewing and simplifying code found in https://github.com/aetilley/pcfg. Most of my notes are in T144636.

I'd like to see if @aetilley has time to review the score() method here: https://github.com/halfak/kasami/blob/master/kasami/tree_scorer.py#L29

Here's a copy-paste of the relevant lines of code:

```
probas = [self.prod_freq.get(prod, 0.5) /
          self.source_freq.get(prod.source, 1)
          for prod in tree]
return sum(log(proba) for proba in probas)
```

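Essentially, each entry in probas is the frequency of the production divided by the frequency of its source. If the production has not been seen before, it is given a frequency of 0.5 (so that we avoid zero probabilities, which log() cannot handle). Similarly, if the source has not been seen before, it is given a frequency of 1, which also avoids dividing by zero. The sum of log(proba) is returned, rather than the product of the probabilities, so that we don't run into floating-point precision issues; floats have limited precision, as the session below illustrates. You can always convert back to the raw likelihood by using exp().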

```
>>> 1 * 10 ** -19 + 1
1.0
>>> 1 * 10 ** -18 + 1
1.0
>>> 1 * 10 ** -17 + 1
1.0
>>> 1 * 10 ** -16 + 1
1.0
>>> 1 * 10 ** -15 + 1
1.000000000000001
```
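
For anyone who wants to try the scoring rule without installing kasami, here is a minimal standalone sketch. It is not the kasami code: `Production`, `SketchTreeScorer`, and `update()` are hypothetical names made up for illustration, and a "tree" is assumed to be just an iterable of productions. Only the body of `score()` follows the lines quoted above. The last part multiplies raw probabilities for a long tree to show why the log sum is the safer thing to return.

```python
from collections import Counter, namedtuple
from math import exp, log

# Hypothetical stand-in for a grammar production (source -> produces).
# kasami's real classes may look different.
Production = namedtuple("Production", ["source", "produces"])


class SketchTreeScorer:
    """Illustrative scorer only; an assumption, not the kasami API."""

    def __init__(self):
        self.prod_freq = Counter()    # how often each production was seen
        self.source_freq = Counter()  # how often each source symbol was seen

    def update(self, tree):
        # Assumption: a tree is an iterable of Production objects.
        for prod in tree:
            self.prod_freq[prod] += 1
            self.source_freq[prod.source] += 1

    def score(self, tree):
        # Same rule as the quoted snippet: unseen productions count as 0.5,
        # unseen sources count as 1, and the log probabilities are summed.
        probas = [self.prod_freq.get(prod, 0.5) /
                  self.source_freq.get(prod.source, 1)
                  for prod in tree]
        return sum(log(proba) for proba in probas)


s_np_vp = Production("S", ("NP", "VP"))
np_det_n = Production("NP", ("Det", "N"))
np_n = Production("NP", ("N",))
vp_v = Production("VP", ("V",))
vp_v_np = Production("VP", ("V", "NP"))  # never added to the scorer

scorer = SketchTreeScorer()
scorer.update([s_np_vp, np_det_n, vp_v])
scorer.update([s_np_vp, np_n, vp_v])

# All productions seen: (2/2) * (1/2) * (2/2)
seen_score = scorer.score([s_np_vp, np_det_n, vp_v])
# One unseen production with a seen source: (2/2) * (1/2) * (0.5/2)
unseen_score = scorer.score([s_np_vp, np_det_n, vp_v_np])
print(seen_score, exp(seen_score))
print(unseen_score, exp(unseen_score))

# A long tree: 2000 productions that each have probability 1/2.
long_tree = [np_det_n] * 2000
raw_product = 1.0
for prod in long_tree:
    raw_product *= scorer.prod_freq[prod] / scorer.source_freq[prod.source]
long_score = scorer.score(long_tree)
print(raw_product)  # the raw product underflows to 0.0
print(long_score)   # the log sum is still a usable finite number
```

The unseen production gets a lower score than the seen one, and the long tree still gets a finite score even though its raw likelihood is too small for a float to represent.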