Page MenuHomePhabricator

BUG: Fraud widget numbers don't seem right
Closed, ResolvedPublic0 Story Points

Description

The Fraud widget in the dash is displaying abnormally high numbers. January 27 looking at the Dash gave 11.2%, but looking at fredge data 5.1%.

Event Timeline

atgo raised the priority of this task from to High.
atgo updated the task description. (Show Details)
atgo added a project: Fundraising Dash.
atgo added subscribers: atgo, Ppena, CCogdill_WMF.
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptJan 28 2015, 10:36 PM
atgo moved this task from Backlog to Bugs on the Fundraising Dash board.Jan 28 2015, 10:36 PM
This comment was removed by atgo.

@Ppena could you please add more info here?

Ppena added a comment.Feb 4 2015, 11:41 PM

Not sure what to add, but every time I got and check the Fraud% of a country on the Frau widget it always shows a higher % than our SQL reports.

some examples, please

Ppena added a comment.Feb 5 2015, 12:54 AM

sorry, i dont think i have them. Everytime i looked and confirmed that it was wrong with either you or Caitlin, I just recorded the correct % on my tracking sheets. A good idea will be for the Tech team to query something at the same time in the widget and run a sql to see how different that it.

atgo updated the task description. (Show Details)Feb 7 2015, 12:57 AM
atgo set Security to None.

@Ppena I dug through email and found an example. What filters were you/would you use to get the numbers you're getting that seem wrong?

The specific example is in the body of this task.

Ppena added a comment.Feb 7 2015, 1:00 AM

I used the filters below:

Choose Preset Time Period: (date)
Currency
method
gateway

@Ppena, so for the example above, you would've said "Last 24 hours", ILS, cc, and WorldPay?

The more specific we can be, the quicker and easier it is to find the bug. Do you always specify all fields?

Ppena added a comment.Feb 7 2015, 1:05 AM

yes yes yes- that's exactly what i would do, I would always pick all fields.

atgo renamed this task from Fraud widget numbers don't seem right to BUG: Fraud widget numbers don't seem right.Feb 11 2015, 8:35 PM
atgo edited a custom field.
Ejegg claimed this task.Feb 11 2015, 11:52 PM
Ejegg added a comment.Feb 12 2015, 7:28 PM

@CCogdill_WMF, can you compare the SQL query you're running to get the data directly to the SQL query displayed in the fraud widget? After you set the filters on the widget and see the result, you can click the icon at the top right that looks like a barrel to see the query. It's got question marks in place of the filter values, but I'm mostly curious if we're joining the same tables and counting the same things.

Just now, I tried running the query from the UI directly against the db, with ?s replaced with 'ILS', 'worldpay', 'cc', and '2015-02-11 19:10:00'. I got the same result the widget shows, so there's nothing mangling the data after we run the query.

@Ejegg, here's the query I use:

select * from payments_initial where gateway = 'worldpay' AND country = ‘IL' AND date > TIMESTAMPADD(DAY, -1, NOW())

I export the data and do a pivot table from there. Compared to the one in the dash:

SELECT AVG(CASE pf.validation_action WHEN 'reject' THEN 100 ELSE 0 END) AS fraud_percent FROM payments_fraud pf LEFT JOIN payments_initial pi ON pi.contribution_tracking_id = pf.contribution_tracking_id WHERE (pi.currency_code = ? AND pf.date > ?)

The queries are pretty different. Mine is much simpler and only pulls from payments_initial, and maybe most importantly, my query is based on country and not currency. That could be a big reason why Pats' numbers were off. I also don't know if https://phabricator.wikimedia.org/T89190 is related to this or not.

Something I want to point out is I just tried my query for Hungary and GC and the data in excel, which usually maps nicely to columns, is not tab separated. The column titles are also different than they used to be. I can do text to columns, but something has obviously changed with the data since I last worked with it on Tuesday.

@Ppena, @CCogdill_WMF: The joins explain it all. The widget data is (# of fraud filter rejections) / (# of attempts run through the fraud filter). Your data is (# of fraud filter rejections) / (all donation attempts, including a lot of things that fail before we even run our filter on them).

The widget data may be more meaningful: if an unclear form or translation gets a high number of invalid submissions, payments_initial will get a lot of entries that we reject before even running our filters, so your number will go down. Since the widget data only counts attempts that passed initial validation, I'd say it's a better measure of how many people are actually trying to use us maliciously.

If you'd rather have the widget show rejections as a percentage of all attempts, we can certainly change the code.

@Ejegg thanks for the explanation! That's not at all how my query results
were originally explained to me, so @Ppena - the data I pulled for France,
Israel, and Hungary is likely incorrect.

You're right that the fraud widget does a better job of showing us
fraudulent attempts, but it would be awesome if we could have an additional
one for rejections. Currently, I have to pull this data for Pats on a daily
basis. If she or I could just grab it from the dash, that would be ideal.
It would also be great if we could filter by Country and as well as
Currency! There are times of year where we fundraise in multiple EUR
countries at once, so currency doesn't give us all the visibility we need.

Thanks for figuring out the discrepancy.

So @Ppena does this work for you? Should we update the widget to display the same results as Caitlin's query?

@atgo I don't think we should replace the fraud widget; that is correct as
it is. It would just be nice to have a separate rejection widget :)

Le jeudi 12 février 2015, atgo <no-reply@phabricator.wikimedia.org> a
écrit :

atgo added a comment.
So @Ppena does this work for you? Should we update the widget to display
the same results as Caitlin's query?
TASK DETAIL

https://phabricator.wikimedia.org/T87810

REPLY HANDLER ACTIONS

Reply to comment or attach files, or !close, !claim, !unsubscribe or

!assign <username>.
EMAIL PREFERENCES

https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Ejegg, atgo
Cc: CCogdill_WMF, Ppena, atgo, Aklapper

I just checked in with Pats about this. We want to make sure we're understanding everything correctly, and if so, make a few requests.

As I understand it, the fraud numbers I was pulling were incorrect because they included bank rejections during the authorization phase. Does this apply to Worldpay as well as GC? Katie built my query for France originally, so I'm wondering if that's why it is off for GC countries.

The Fraud Widget on the Dash - Questions and Improvements

  • Question from Pats: the numbers seemed wildly off on the widget a month or so ago. Has this been corrected? Are we confident the widget data is correct?
  • Request 1: Please add "country" as a filter, which is much more valuable to us than "currency"
  • Request 2: Please add "date" as a filter rather than just "24 hours ago" so we can call out specific dates in the past

Rejections Widget on the Dash

  • Request: We would like to add a widget for bank and PSP rejections only, so it would not include WMF fraud rejections, with the same filter criteria as above.

Does this sound doable? Pats and I feel like we don't have a good benchmark for what our fraud numbers are or were in the past, because we're not clear what data we pulled was correct. It would be great to have these widgets implemented so we could use them to easily spot check historical data.

Hey @CCogdill_WMF - thanks for the detailed feedback. I'm going to leave tech (@Ejegg?) to answer your first question and the question about the numbers being off a month ago.

Re: widget requests - each of those should be a phab task in the Fundraising Dash project that I'll triage and prioritize. Would you mind?

Ejegg added a comment.Feb 17 2015, 6:51 PM

I see the new task requesting the rejection data and new filters in the widget, so I'll close this one.

Ejegg closed this task as Resolved.Feb 17 2015, 6:53 PM
Ejegg moved this task from Dev Ready to Done on the § Fundraising Sprint Devo board.
Ejegg reopened this task as Open.Feb 17 2015, 10:39 PM

Change 191382 had a related patch set uploaded (by Ejegg):
Fix joins between payments_fraud and _initial

https://gerrit.wikimedia.org/r/191382

Patch-For-Review

Ejegg added a comment.Feb 18 2015, 7:38 PM

Oops, looks like the widget was wrong after all. I was joining the payments_fraud table to the payments_initial table using contribution tracking ID, when I should have been using the order ID. Since the contribution tracking ID is reused when a donor retries a rejected donation, that join was double counting a lot of rejected donations and skewing the fraud numbers up. The patch above should fix that. I'll send more details in an email.

@Ejegg - is this done? I wonder if we should make a "release" column in Fundraising Dash to store the next set of changes to go out in a deploy?

Change 191382 merged by jenkins-bot:
Fix joins between payments_fraud and _initial

https://gerrit.wikimedia.org/r/191382

This should go out with the rest of the Dash deploy

Ejegg closed this task as Resolved.Mar 25 2015, 10:11 PM