To know whether a given word token increased or decreased between two revisions, we need the frequency of that token in each revision. This utility will generate the frequency table.
[[ https://github.com/wikimedia/revscoring/blob/773f9cd8029de7ef5c7713addd2f6661bce94b4e/revscoring/datasources/meta/frequencies.py#L35 | Here ]] is the code snippet to copy/paste.
```
def frequency(items):
    # Count how many times each item occurs in the input list.
    freq = {}
    for item in items:
        if item in freq:
            freq[item] += 1
        else:
            freq[item] = 1
    return freq
```
This utility/method will live under structured-data/packages.
//frequency(list of tokens) -> dict of {'token':freq}//
The method takes a list of tokens of one kind (for example, a list of words or a list of numbers) and returns a dictionary whose keys are the tokens and whose values are their frequencies.
Example:
Input: ['apple', 'cat', 'shoes', 'cat', 'apple', 'apple']
Output: {'cat':2, 'apple':3, 'shoes':1}
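
To illustrate how the frequency table could feed the increased/decreased check between two revisions, here is a minimal sketch. It assumes the standard library's `collections.Counter` is an acceptable way to do the counting. `frequency_delta` is a hypothetical helper name used only for illustration; it is not part of revscoring or of this task's agreed API.

```python
from collections import Counter


def frequency(items):
    """Return a dict mapping each item to the number of times it occurs."""
    return dict(Counter(items))


def frequency_delta(old_freq, new_freq):
    """Hypothetical helper: per-token change in count from old to new.

    Positive values mean the token increased, negative values mean it
    decreased. Tokens whose count did not change are omitted.
    """
    delta = {}
    for token in set(old_freq) | set(new_freq):
        change = new_freq.get(token, 0) - old_freq.get(token, 0)
        if change != 0:
            delta[token] = change
    return delta


# Example: token lists from two made-up revisions.
old = frequency(['apple', 'cat', 'shoes', 'cat', 'apple', 'apple'])
new = frequency(['apple', 'cat', 'cat', 'cat', 'dog'])
print(sorted(frequency_delta(old, new).items()))
# [('apple', -2), ('cat', 1), ('dog', 1), ('shoes', -1)]
```

A token missing from one revision is treated as having a count of 0, so added and removed tokens show up in the delta alongside tokens whose counts merely changed.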