This ticket is about computing the "diff_decrease" (or delta decrease) credibility signal. Please refer to the diff.json and delta.json schemas under documentation for the exact field names.
**//Small ticket. A couple of lines of code to copy.//**
Given a dictionary of {token: value} pairs, return the sum of its negative values.
Input : {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
Output : -3
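A minimal sketch of that computation, assuming the input is a plain Python dict of integer deltas. The function name `diff_decrease` is only illustrative; use whatever name the schemas call for.
```python
def diff_decrease(delta_table):
    """Return the sum of the negative values in a {token: delta} dict."""
    # Positive and zero deltas are ignored; an empty dict yields 0.
    return sum(value for value in delta_table.values() if value < 0)


example = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
print(diff_decrease(example))  # -3
```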
The following illustrates the concept of diff_decrease for dictionary_words. The same will apply to non_dictionary_words, non_safe words, informal and uppercase words.
If we have:
```
wikitext_parent = "kljlj 90 cat apple viking viking shoes cat apple apple viking viking"
wikitext_current = ")()(. viking viking shoes dog apple apple viking viking shoes pypi"
```
We process these wikitexts [[ https://phabricator.wikimedia.org/T299432 | using our utilities ]], and find the list of dictionary_words as follows:
```
dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes', 'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['viking', 'viking', 'shoes', 'dog', 'apple', 'apple', 'viking', 'viking', 'shoes']
```
Now we use [[ https://phabricator.wikimedia.org/T299592 | this utility ]] to get the frequency tables as follows:
```
old_ft = {'cat':2, 'apple':3, 'shoes':1, 'viking':4} # parent rev
new_ft = {'apple':2, 'shoes':2, 'viking':4, 'dog':1} # current rev
```
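For illustration only, the same counts can be reproduced with `collections.Counter` from the standard library. This is a stand-in to check the numbers, not the utility from T299592; the helper name `frequency_table` is hypothetical.
```python
from collections import Counter


def frequency_table(words):
    """Count how often each token occurs (hypothetical stand-in helper)."""
    return dict(Counter(words))


dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes',
                          'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['viking', 'viking', 'shoes', 'dog', 'apple',
                           'apple', 'viking', 'viking', 'shoes']

old_ft = frequency_table(dictionary_word_parent)   # parent rev
new_ft = frequency_table(dictionary_word_current)  # current rev
```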
A delta of two frequency tables is a dictionary whose keys are tokens and whose values are the frequency in new_ft minus the frequency in old_ft. We use our [[ https://phabricator.wikimedia.org/T299593 | delta table utility ]] to get this.
For the above example:
delta_table = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
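A rough sketch of how such a delta could be derived, to make the arithmetic concrete. It is not the utility from T299593; the name `compute_delta` is hypothetical, and dropping tokens whose frequency did not change is an assumption based on 'viking' being absent from the example.
```python
def compute_delta(old_ft, new_ft):
    """Return {token: new count - old count}, omitting unchanged tokens."""
    delta = {}
    # Visit every token that appears in either table.
    for token in set(old_ft) | set(new_ft):
        diff = new_ft.get(token, 0) - old_ft.get(token, 0)
        if diff != 0:  # assumption: zero deltas are left out
            delta[token] = diff
    return delta


old_ft = {'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4}  # parent rev
new_ft = {'apple': 2, 'shoes': 2, 'viking': 4, 'dog': 1}  # current rev
delta_table = compute_delta(old_ft, new_ft)
```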
**//Diff decrease is the sum of the negative values in delta_table.//**
For the above example:
diff_decrease = -3
For reference, please see [[ https://github.com/wikimedia/revscoring/blob/773f9cd8029de7ef5c7713addd2f6661bce94b4e/revscoring/languages/features/dictionary/features.py#L69 | the corresponding code in revscoring ]].