
Create A Utility To Generate A Proportional Delta of Two Revisions
Closed, Resolved · Public · 5 Estimated Story Points

Description

Once we have frequency tables of the tokens in the current revision and in the parent revision, as well as the delta frequency table, we can generate a proportional delta. This will help us compute the proportional increase/decrease diff credibility signals.
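For context, here is a minimal sketch of how the input tables might be built from two revisions using `collections.Counter`. It is not the project's implementation. The tokenizer (lowercase plus whitespace split), the helper names `freq_table` and `delta_table`, and the convention that a delta is the current count minus the parent count, with unchanged tokens dropped, are all assumptions for illustration. The real delta table comes from the separate delta table ticket.

```python
from collections import Counter


def freq_table(text):
    # Hypothetical tokenizer (an assumption): lowercase, then split on whitespace.
    return Counter(text.lower().split())


def delta_table(old_ft, new_ft):
    # Assumed convention: delta = current count - parent count,
    # keeping only tokens whose count actually changed.
    deltas = {}
    for token in set(old_ft) | set(new_ft):
        change = new_ft.get(token, 0) - old_ft.get(token, 0)
        if change != 0:
            deltas[token] = change
    return deltas


# Parent and current revisions chosen to reproduce the ticket's example tables.
old_ft = freq_table("cat cat apple apple apple shoes viking viking viking viking")
new_ft = freq_table("apple apple shoes shoes dog viking viking viking viking")

print(sorted(delta_table(old_ft, new_ft).items()))
# [('apple', -1), ('cat', -2), ('dog', 1), ('shoes', 1)]
```

'viking' has the same count in both revisions, so under this convention it gets no delta entry.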

Small ticket: a couple of lines of code to copy.

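A proportional delta table is a dictionary whose keys are tokens and whose values are the corresponding delta_table values divided by the token's count in the previous (parent) frequency table. For increases, the code below adds 1 to that count, so tokens that are new in the current revision (count 0) do not cause a division by zero.

The method takes the old frequency table (old_ft) and the delta table as input and produces the proportional delta table (prop_delta) as output. Tokens without a delta entry (such as 'viking' below) do not appear in the output.

For example, if:

Input:

old_ft = {'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4}
delta_table = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}

Output:

prop_delta_table = {'apple': -1/3, 'shoes': 1/2, 'dog': 1, 'cat': -1}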
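Written as a formula, this is the rule the copy/paste code implements. Here $f_{\text{old}}(t)$ is the count of token $t$ in the parent revision (0 if absent), and $\Delta(t)$ is its delta-table value. The $+1$ on increases is taken from that code.

```latex
p(t) =
\begin{cases}
\dfrac{\Delta(t)}{f_{\text{old}}(t) + 1}, & \Delta(t) > 0, \\[6pt]
\dfrac{\Delta(t)}{f_{\text{old}}(t)}, & \Delta(t) \le 0.
\end{cases}
```

For instance, $p(\text{apple}) = -1/3$ and $p(\text{dog}) = 1/(0+1) = 1$.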

Code to copy/paste:

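```python
def process(self, old_tf, ft_delta):
    """Return {token: delta / previous count} for every token in ft_delta."""
    prop_delta = {}
    for item, delta in ft_delta.items():
        if delta > 0:
            # Increase: add one to the parent count so tokens that are new
            # in this revision (absent from old_tf) do not divide by zero.
            prop_delta[item] = delta / (old_tf.get(item, 0) + 1)
        else:
            # Decrease: the token must already exist in the parent revision.
            prop_delta[item] = delta / old_tf[item]

    return prop_delta
```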
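The sketch below checks the rule against the example input. It is self-contained, so it uses a hypothetical standalone helper, `proportional_delta`, that mirrors `process` without the unused `self`. It also swaps plain division for `fractions.Fraction`. That swap is an illustration choice, not part of the ticket's code. It makes exact values like -1/3 print exactly as written in the example.

```python
from fractions import Fraction


def proportional_delta(old_ft, delta_table):
    """Hypothetical standalone mirror of process(), using exact fractions."""
    prop_delta = {}
    for token, delta in delta_table.items():
        if delta > 0:
            # Increase: +1 in the denominator, so new tokens divide by 1.
            prop_delta[token] = Fraction(delta, old_ft.get(token, 0) + 1)
        else:
            # Decrease: the token must already be in the parent revision.
            prop_delta[token] = Fraction(delta, old_ft[token])
    return prop_delta


old_ft = {'cat': 2, 'apple': 3, 'shoes': 1, 'viking': 4}
delta_table = {'apple': -1, 'shoes': 1, 'dog': 1, 'cat': -2}

for token, value in proportional_delta(old_ft, delta_table).items():
    print(token, value)
# apple -1/3
# shoes 1/2
# dog 1
# cat -1
```

'viking' does not appear in the output because it has no delta entry.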

The utility will live in structured-data/packages.
This ticket depends on completion of the delta table ticket.

Event Timeline

Protsack.stephan changed the task status from Open to In Progress. Apr 22 2022, 3:03 PM
Protsack.stephan set the point value for this task to 5.
Lena.Milenko changed the status of subtask T299716: Add Diff Non-Dictionary Words Credibility Signal from Open to In Progress.
Lena.Milenko changed the status of subtask T299717: Add Diff Non-Safe Words Credibility Signal from Open to In Progress.
Lena.Milenko changed the status of subtask T299718: Add Diff Informal Words Credibility Signal from Open to In Progress.
Lena.Milenko changed the status of subtask T307116: Create A Utility To Get Proportional Decrease from Open to In Progress.
Lena.Milenko changed the task status from In Progress to Open. May 26 2022, 11:48 AM
Lena.Milenko changed the status of subtask T307116: Create A Utility To Get Proportional Decrease from In Progress to Open.
Lena.Milenko changed the status of subtask T307114: Create A Utility To Get Proportional Increase from In Progress to Open.
Lena.Milenko changed the status of subtask T299718: Add Diff Informal Words Credibility Signal from In Progress to Open.
Lena.Milenko changed the status of subtask T299716: Add Diff Non-Dictionary Words Credibility Signal from In Progress to Open.
Lena.Milenko changed the status of subtask T299717: Add Diff Non-Safe Words Credibility Signal from In Progress to Open.