This ticket is about computing the "diff_decrease" (or delta decrease) credibility signal. Please refer to the diff.json and delta.json schemas under documentation for the exact field names.
Small ticket. A couple of lines of code to copy.
Given a dictionary of {token: value} pairs, return the sum of the negative values.
Input : {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
Output : -3
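A minimal sketch of the computation described above. The function name diff_decrease is an assumption chosen to match the signal name; the actual field names should come from the schemas.

```python
def diff_decrease(delta_table):
    """Return the sum of the negative values in a {token: delta} mapping.

    Hypothetical helper for illustration; returns 0 when nothing decreased.
    """
    return sum(value for value in delta_table.values() if value < 0)


delta_table = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
print(diff_decrease(delta_table))  # -3
```

Positive and zero values are ignored, so an empty table or a table with only increases gives 0.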
The following illustrates the concept of diff_decrease for dictionary_words. The same will apply to non_dictionary_words, non_safe words, informal and uppercase words.
If we have:
wikitext_parent = "kljlj 90 cat apple viking viking shoes cat apple apple viking viking"
wikitext_current = ")()(. viking viking shoes dog apple apple viking viking shoes pypi"
We process these wikitexts using our utilities, and find the list of dictionary_words as follows:
dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes', 'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['dog', 'viking', 'viking', 'shoes', 'shoes', 'apple', 'apple', 'viking', 'viking']
Now we use our frequency table utility to get the frequency tables as follows:
old_ft = {'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4}  # parent rev
new_ft = {'apple': 2, 'shoes': 2, 'viking': 4, 'dog': 1}  # current rev
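For illustration only, the frequency tables above can be reproduced with collections.Counter from the standard library. This is a stand-in, not the project's own frequency table utility, whose name and signature are not given in this ticket.

```python
from collections import Counter

dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes',
                          'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['dog', 'viking', 'viking', 'shoes', 'shoes',
                           'apple', 'apple', 'viking', 'viking']

# Count how often each token occurs in each revision.
old_ft = dict(Counter(dictionary_word_parent))   # parent rev
new_ft = dict(Counter(dictionary_word_current))  # current rev

print(old_ft)
print(new_ft)
```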
A delta of two frequency tables is a dictionary where the keys are tokens and the values are the difference in frequency, new_ft minus old_ft. We use our delta table utility to get this.
For the above example:
delta_table = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
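A rough sketch of how such a delta table could be derived. The function name compute_delta_table is hypothetical and is not the project's delta table utility. It assumes that tokens whose frequency did not change are left out, since 'viking' does not appear in the example table.

```python
def compute_delta_table(old_ft, new_ft):
    """Hypothetical helper: per-token frequency change, new_ft minus old_ft."""
    delta = {}
    for token in set(old_ft) | set(new_ft):
        # A token missing from one table has frequency 0 there.
        change = new_ft.get(token, 0) - old_ft.get(token, 0)
        if change != 0:  # assumption: unchanged tokens are omitted
            delta[token] = change
    return delta


old_ft = {'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4}  # parent rev
new_ft = {'apple': 2, 'shoes': 2, 'viking': 4, 'dog': 1}  # current rev
delta_table = compute_delta_table(old_ft, new_ft)
print(delta_table == {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2})  # True
```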
Diff decrease is the sum of the negative values in delta_table.
For the above example:
diff_decrease = -3
Please refer to the revscoring repo.
The utility will live in structured-data/packages.