Create A Utility To Compute Diff Increase
Closed, Resolved | Public | 3 Estimated Story Points

Description

This ticket is about computing the "diff_increase" (or delta increase) credibility signal. Please refer to the diff.json and delta.json schemas under documentation for the exact field names.

This is a small ticket: a couple of lines of code to copy.
Given a dictionary of {token: value} pairs, return the sum of the positive values.

Input: {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
Output: 2
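
A minimal sketch of the requested utility, assuming the input is a plain dict that maps tokens to integer deltas. The function name compute_diff_increase is hypothetical and is not taken from the repository.

```python
from typing import Mapping


def compute_diff_increase(delta_table: Mapping[str, int]) -> int:
    """Return the sum of the positive values in a delta table.

    Hypothetical helper: the name and signature are assumptions.
    Negative and zero values are ignored.
    """
    return sum(value for value in delta_table.values() if value > 0)


example = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
print(compute_diff_increase(example))  # 2
```

An empty dictionary, or one with no positive values, yields 0 under this sketch.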

The following illustrates the concept of diff_increase for dictionary_words. The same applies to non_dictionary_words, non_safe words, informal words, and uppercase words.

If we have:

wikitext_parent = "kljlj 90 cat apple viking viking         shoes    cat apple apple viking viking"
wikitext_current = ")()(. viking viking         shoes    dog apple apple viking viking shoes pypi"

We process these wikitexts using our utilities, and find the list of dictionary_words as follows:

dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes', 'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['dog', 'viking', 'viking', 'shoes', 'shoes', 'apple', 'apple', 'viking', 'viking']

Now we use the frequency table utility to get the frequency tables as follows:

old_ft = {'cat':2, 'apple':3, 'shoes':1, 'viking':4}   # parent rev 
new_ft = {'apple':2, 'shoes':2, 'viking':4, 'dog':1}    # current rev
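
For illustration, the frequency tables above can be reproduced with the standard library. This is a sketch only: the helper name frequency_table is hypothetical, and the existing utility in the repository may be implemented differently.

```python
from collections import Counter
from typing import Dict, Iterable


def frequency_table(tokens: Iterable[str]) -> Dict[str, int]:
    """Count how often each token occurs (hypothetical helper)."""
    return dict(Counter(tokens))


dictionary_word_parent = ['cat', 'apple', 'viking', 'viking', 'shoes',
                          'cat', 'apple', 'apple', 'viking', 'viking']
dictionary_word_current = ['dog', 'viking', 'viking', 'shoes', 'shoes',
                           'apple', 'apple', 'viking', 'viking']

old_ft = frequency_table(dictionary_word_parent)    # parent rev
new_ft = frequency_table(dictionary_word_current)   # current rev
```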

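A delta of two frequency tables is a dictionary whose keys are tokens and whose values are the difference in frequency between new_ft and old_ft (new minus old). In the example below, 'viking' has the same frequency in both tables and does not appear in the delta. We use our delta table utility to get this.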

For the above example:

delta_table = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}
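
The delta computation can be sketched as follows. The function name compute_delta_table is hypothetical, and dropping tokens whose frequency did not change is an assumption based on the example, where 'viking' is absent from the result.

```python
from typing import Dict, Mapping


def compute_delta_table(old_ft: Mapping[str, int],
                        new_ft: Mapping[str, int]) -> Dict[str, int]:
    """Return new_ft minus old_ft per token (hypothetical helper).

    Tokens missing from one table count as 0 there. Tokens with a
    zero difference are omitted (assumption drawn from the example).
    """
    delta = {}
    for token in old_ft.keys() | new_ft.keys():
        diff = new_ft.get(token, 0) - old_ft.get(token, 0)
        if diff != 0:
            delta[token] = diff
    return delta


old_ft = {'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4}   # parent rev
new_ft = {'apple': 2, 'shoes': 2, 'viking': 4, 'dog': 1}   # current rev
delta_table = compute_delta_table(old_ft, new_ft)
```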

Diff increase is the sum of the positive values in delta_table.

For the above example:

diff_increase = 1 + 1 = 2
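
As a cross-check of this result, the same number can be obtained directly from the two frequency tables with collections.Counter, whose subtraction operator keeps only positive counts. This is an alternative sketch for verification, not necessarily how the utility should be implemented.

```python
from collections import Counter

old_ft = Counter({'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4})   # parent rev
new_ft = Counter({'apple': 2, 'shoes': 2, 'viking': 4, 'dog': 1})   # current rev

# Counter subtraction discards zero and negative results, so only the
# tokens whose frequency increased remain: shoes (+1) and dog (+1).
increase_only = new_ft - old_ft
diff_increase = sum(increase_only.values())
print(diff_increase)  # 2
```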

Please refer to this.

The utility will live in structured-data/packages.

Event Timeline

prabhat updated the task description.
Protsack.stephan moved this task from Incoming to In Progress on the Wikimedia Enterprise board.
Lena.Milenko changed the task status from Open to In Progress. Mar 31 2022, 1:41 PM
Lena.Milenko changed the task status from In Progress to Open. May 26 2022, 11:48 AM