
Create A Utility To Compute Diff Sum
Closed, Resolved | Public | 5 Estimated Story Points

Description

This ticket is about computing the "sum" credibility signal (diff sum or delta sum). Please refer to the diff.json and delta.json schemas under documentation for the exact field names.

Small ticket. A couple of lines of code to copy.

Given a dictionary of {token: value} pairs, return the sum of the values.

Input : {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
Output : -1
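
A minimal sketch of what the utility could look like, assuming the input is a plain Python dict that maps tokens to integers. The function name `compute_sum` is hypothetical; the real name and module should follow the conventions already used in structured-data/packages.

```python
from typing import Mapping


def compute_sum(table: Mapping[str, int]) -> int:
    """Return the sum of all values in a token -> value table.

    Hypothetical helper name. An empty table sums to 0.
    """
    return sum(table.values())


example = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
print(compute_sum(example))  # -1
```

The result agrees with the diff_sum = -1 reported for the worked example later in this description.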

The following illustrates the concept of diff_sum for dictionary_words. The same applies to non_dictionary_words, non_safe words, informal words, and uppercase words.

If we have:

wikitext_parent = "kljlj 90 cat apple viking viking         shoes    cat apple apple viking viking"
wikitext_current = ")()(. viking viking         shoes    dog apple apple viking viking shoes pypi"

We process these wikitexts using our utilities, and find the list of dictionary_words as follows:

dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes', 'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['viking', 'viking', 'shoes', 'dog', 'apple', 'apple', 'viking', 'viking', 'shoes']

Now we use the frequency table utility to get the frequency tables:

old_ft = {'cat':2, 'apple':3, 'shoes':1, 'viking':4}   # parent rev 
new_ft = {'apple':2, 'shoes':2, 'viking':4, 'dog':1}    # current rev
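
The ticket's own frequency table utility is not shown here. As a stand-in, the sketch below uses `collections.Counter` from the Python standard library to reproduce the two tables from the word lists above. It only illustrates the counting step and is not the project's actual implementation.

```python
from collections import Counter

dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes',
                          'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['viking', 'viking', 'shoes', 'dog', 'apple',
                           'apple', 'viking', 'viking', 'shoes']

# Counter tallies how often each token occurs; dict() drops the Counter
# wrapper so the result is a plain {token: frequency} dictionary.
old_ft = dict(Counter(dictionary_word_parent))    # parent rev
new_ft = dict(Counter(dictionary_word_current))   # current rev

print(old_ft)
print(new_ft)
```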

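A delta of two frequency tables is a dictionary whose keys are tokens and whose values are the difference in frequency, new_ft minus old_ft. Tokens whose frequency is unchanged ('viking' in this example) do not appear in the delta. We use our delta table utility to get this.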

For the above example:

delta_table = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
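
For readers without the delta table utility at hand, the sketch below shows one way such a delta could be computed. The function name `delta_table_sketch` is hypothetical, and dropping zero-valued entries is an assumption made so that the output matches the example above; the real utility may behave differently.

```python
def delta_table_sketch(old_ft, new_ft):
    """Return {token: new count - old count}, omitting unchanged tokens.

    Hypothetical stand-in for the project's delta table utility.
    """
    delta = {}
    # Visit every token that appears in either revision.
    for token in old_ft.keys() | new_ft.keys():
        change = new_ft.get(token, 0) - old_ft.get(token, 0)
        if change != 0:
            delta[token] = change
    return delta


old_ft = {'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4}   # parent rev
new_ft = {'apple': 2, 'shoes': 2, 'viking': 4, 'dog': 1}   # current rev
delta_table = delta_table_sketch(old_ft, new_ft)
print(delta_table)
```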

The diff sum is the sum of the values in delta_table.

For the above example:

diff_sum = -1
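
A useful sanity check for tests: because each delta is a new count minus an old count, the diff sum always equals the change in the total number of tokens between the two revisions (here 9 - 10 = -1). The sketch below verifies this on the example, again using `collections.Counter` as a stand-in for the project's utilities.

```python
from collections import Counter

parent = ['cat', 'apple', 'viking', 'viking', 'shoes',
          'cat', 'apple', 'apple', 'viking', 'viking']
current = ['viking', 'viking', 'shoes', 'dog', 'apple',
           'apple', 'viking', 'viking', 'shoes']

old_ft = Counter(parent)
new_ft = Counter(current)

# A Counter returns 0 for missing tokens, so no .get() is needed here.
diff_sum = sum(new_ft[t] - old_ft[t] for t in set(old_ft) | set(new_ft))

# The per-token deltas telescope to the difference in list lengths.
assert diff_sum == len(current) - len(parent)
print(diff_sum)  # -1
```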

Please refer to the revscoring repo.

The utility will live in structured-data/packages.

Event Timeline

Protsack.stephan changed the task status from Open to In Progress. Apr 22 2022, 3:03 PM
Protsack.stephan set the point value for this task to 5.
Lena.Milenko changed the task status from In Progress to Open. May 26 2022, 11:48 AM