The Fraud widget in the dash is displaying abnormally high numbers. January 27 looking at the Dash gave 11.2%, but looking at fredge data 5.1%.
|wikimedia/fundraising/dash : master||Fix joins between payments_fraud and _initial|
sorry, i dont think i have them. Everytime i looked and confirmed that it was wrong with either you or Caitlin, I just recorded the correct % on my tracking sheets. A good idea will be for the Tech team to query something at the same time in the widget and run a sql to see how different that it.
@CCogdill_WMF, can you compare the SQL query you're running to get the data directly to the SQL query displayed in the fraud widget? After you set the filters on the widget and see the result, you can click the icon at the top right that looks like a barrel to see the query. It's got question marks in place of the filter values, but I'm mostly curious if we're joining the same tables and counting the same things.
Just now, I tried running the query from the UI directly against the db, with ?s replaced with 'ILS', 'worldpay', 'cc', and '2015-02-11 19:10:00'. I got the same result the widget shows, so there's nothing mangling the data after we run the query.
@Ejegg, here's the query I use:
select * from payments_initial where gateway = 'worldpay' AND country = ‘IL' AND date > TIMESTAMPADD(DAY, -1, NOW())
I export the data and do a pivot table from there. Compared to the one in the dash:
SELECT AVG(CASE pf.validation_action WHEN 'reject' THEN 100 ELSE 0 END) AS fraud_percent FROM payments_fraud pf LEFT JOIN payments_initial pi ON pi.contribution_tracking_id = pf.contribution_tracking_id WHERE (pi.currency_code = ? AND pf.date > ?)
The queries are pretty different. Mine is much simpler and only pulls from payments_initial, and maybe most importantly, my query is based on country and not currency. That could be a big reason why Pats' numbers were off. I also don't know if https://phabricator.wikimedia.org/T89190 is related to this or not.
Something I want to point out is I just tried my query for Hungary and GC and the data in excel, which usually maps nicely to columns, is not tab separated. The column titles are also different than they used to be. I can do text to columns, but something has obviously changed with the data since I last worked with it on Tuesday.
@Ppena, @CCogdill_WMF: The joins explain it all. The widget data is (# of fraud filter rejections) / (# of attempts run through the fraud filter). Your data is (# of fraud filter rejections) / (all donation attempts, including a lot of things that fail before we even run our filter on them).
The widget data may be more meaningful: if an unclear form or translation gets a high number of invalid submissions, payments_initial will get a lot of entries that we reject before even running our filters, so your number will go down. Since the widget data only counts attempts that passed initial validation, I'd say it's a better measure of how many people are actually trying to use us maliciously.
If you'd rather have the widget show rejections as a percentage of all attempts, we can certainly change the code.
You're right that the fraud widget does a better job of showing us
fraudulent attempts, but it would be awesome if we could have an additional
one for rejections. Currently, I have to pull this data for Pats on a daily
basis. If she or I could just grab it from the dash, that would be ideal.
It would also be great if we could filter by Country and as well as
Currency! There are times of year where we fundraise in multiple EUR
countries at once, so currency doesn't give us all the visibility we need.
Thanks for figuring out the discrepancy.
@atgo I don't think we should replace the fraud widget; that is correct as
it is. It would just be nice to have a separate rejection widget :)
Le jeudi 12 février 2015, atgo <firstname.lastname@example.org> a
atgo added a comment.
So @Ppena does this work for you? Should we update the widget to display
the same results as Caitlin's query?
REPLY HANDLER ACTIONSReply to comment or attach files, or !close, !claim, !unsubscribe or
To: Ejegg, atgo
Cc: CCogdill_WMF, Ppena, atgo, Aklapper
I just checked in with Pats about this. We want to make sure we're understanding everything correctly, and if so, make a few requests.
As I understand it, the fraud numbers I was pulling were incorrect because they included bank rejections during the authorization phase. Does this apply to Worldpay as well as GC? Katie built my query for France originally, so I'm wondering if that's why it is off for GC countries.
The Fraud Widget on the Dash - Questions and Improvements
- Question from Pats: the numbers seemed wildly off on the widget a month or so ago. Has this been corrected? Are we confident the widget data is correct?
- Request 1: Please add "country" as a filter, which is much more valuable to us than "currency"
- Request 2: Please add "date" as a filter rather than just "24 hours ago" so we can call out specific dates in the past
Rejections Widget on the Dash
- Request: We would like to add a widget for bank and PSP rejections only, so it would not include WMF fraud rejections, with the same filter criteria as above.
Does this sound doable? Pats and I feel like we don't have a good benchmark for what our fraud numbers are or were in the past, because we're not clear what data we pulled was correct. It would be great to have these widgets implemented so we could use them to easily spot check historical data.
Re: widget requests - each of those should be a phab task in the Fundraising Dash project that I'll triage and prioritize. Would you mind?
Oops, looks like the widget was wrong after all. I was joining the payments_fraud table to the payments_initial table using contribution tracking ID, when I should have been using the order ID. Since the contribution tracking ID is reused when a donor retries a rejected donation, that join was double counting a lot of rejected donations and skewing the fraud numbers up. The patch above should fix that. I'll send more details in an email.