
Create A Utility To Compute Diff Decrease
Closed, Resolved · Public · 3 Estimated Story Points

Description

This ticket is about computing the "diff_decrease" (or delta decrease) credibility signal. Please refer to the diff.json and delta.json schemas under documentation for the exact field names.

This is a small ticket: a couple of lines of code to copy.
Given a dictionary of {token: value} pairs, return the sum of the negative values.

Input : {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
Output : -3
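
A minimal Python sketch of this computation, assuming the delta values are integers. The function name `diff_decrease` is illustrative and is not taken from the revscoring or structured-data code:

```python
def diff_decrease(delta_table):
    """Return the sum of the negative values in a {token: delta} mapping."""
    # Positive and zero deltas are ignored; an empty table yields 0.
    return sum(value for value in delta_table.values() if value < 0)


delta = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
print(diff_decrease(delta))  # -3
```

The result is zero or negative by construction, since only removals contribute to it.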

The following illustrates the concept of diff_decrease for dictionary_words. The same applies to non_dictionary_words, non_safe words, informal words, and uppercase words.

If we have:

wikitext_parent = "kljlj 90 cat apple viking viking         shoes    cat apple apple viking viking"
wikitext_current = ")()(. viking viking         shoes    dog apple apple viking viking shoes pypi"

We process these wikitexts using our utilities, and find the list of dictionary_words as follows:

dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes', 'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['viking', 'viking', 'shoes', 'dog', 'apple', 'apple', 'viking', 'viking', 'shoes']

Next, we use the frequency table utility to get the frequency tables as follows:

old_ft = {'cat':2, 'apple':3, 'shoes':1, 'viking':4}   # parent rev 
new_ft = {'apple':2, 'shoes':2, 'viking':4, 'dog':1}    # current rev
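
As a rough illustration of what the frequency table step produces, here is a sketch in Python using `collections.Counter`. The helper name `frequency_table` is hypothetical; the actual utility may be implemented differently:

```python
from collections import Counter


def frequency_table(tokens):
    """Count how many times each token occurs in a list of tokens."""
    return dict(Counter(tokens))


dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes',
                          'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['viking', 'viking', 'shoes', 'dog', 'apple',
                           'apple', 'viking', 'viking', 'shoes']

old_ft = frequency_table(dictionary_word_parent)   # parent rev
new_ft = frequency_table(dictionary_word_current)  # current rev
```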

A delta of two frequency tables is a dictionary whose keys are tokens and whose values are the difference in frequency between the two tables, computed as new_ft minus old_ft. A token missing from one table counts as 0 there. We use our delta table utility to get this.

For the above example:

delta_table = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
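
The delta computation can be sketched in Python as follows. The function name `compute_delta_table` is hypothetical, and dropping tokens whose frequency is unchanged is an assumption based on the example above, where 'viking' does not appear in the delta:

```python
def compute_delta_table(old_ft, new_ft):
    """Return {token: new count - old count}, omitting unchanged tokens."""
    delta = {}
    # Visit every token that appears in either frequency table.
    for token in old_ft.keys() | new_ft.keys():
        change = new_ft.get(token, 0) - old_ft.get(token, 0)
        if change != 0:  # assumption: zero deltas are not stored
            delta[token] = change
    return delta


old_ft = {'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4}  # parent rev
new_ft = {'apple': 2, 'shoes': 2, 'viking': 4, 'dog': 1}  # current rev
delta_table = compute_delta_table(old_ft, new_ft)
```

Keeping or dropping zero entries does not affect diff_decrease, because only negative values are summed.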

Diff decrease is the sum of the negative values in delta_table.

For the above example:

diff_decrease = -3

Please refer to the revscoring repo.

The utility will live in structured-data/packages.

Event Timeline

Protsack.stephan moved this task from Incoming to In Progress on the Wikimedia Enterprise board.
Lena.Milenko changed the task status from Open to In Progress. Mar 31 2022, 1:41 PM
Lena.Milenko changed the task status from In Progress to Open. May 26 2022, 11:48 AM