
Spike: Make an estimate of the proportion of people affected by order id collision
Closed, Resolved · Public · 1 Story Points

Description

Calculate total Ingenico orders per day, and total collisions per day. Publish result here.

Conclusion: We've used less than 1% of the order number space and can go for years without a new account, unless something bad happens, such as a DDoS attack that we're unable to stop.

Event Timeline

awight created this task. May 26 2016, 4:26 PM
Restricted Application added a subscriber: Zppix. May 26 2016, 4:26 PM
zgrep -c 'Order ID collision' payments.error-201605*

payments.error-20160501.gz:7
payments.error-20160502.gz:11
payments.error-20160503.gz:6
payments.error-20160504.gz:15
payments.error-20160505.gz:16
payments.error-20160506.gz:11
payments.error-20160507.gz:12
payments.error-20160508.gz:8
payments.error-20160509.gz:13
payments.error-20160510.gz:9
payments.error-20160511.gz:5
payments.error-20160512.gz:11
payments.error-20160513.gz:17
payments.error-20160514.gz:4
payments.error-20160515.gz:6
payments.error-20160516.gz:4
payments.error-20160517.gz:4
payments.error-20160518.gz:6
payments.error-20160519.gz:6
payments.error-20160520.gz:60493
payments.error-20160521.gz:3
payments.error-20160522.gz:26
payments.error-20160523.gz:0
payments.error-20160524.gz:2
payments.error-20160525.gz:0
payments.error-20160526.gz:0

zgrep -c -E 'Preparing to send INSERT_ORDER' payments-globalcollect-201605*
payments-globalcollect-20160501.gz:2648
payments-globalcollect-20160502.gz:2419
payments-globalcollect-20160503.gz:2800
payments-globalcollect-20160504.gz:2692
payments-globalcollect-20160505.gz:3533
payments-globalcollect-20160506.gz:2731
payments-globalcollect-20160507.gz:2667
payments-globalcollect-20160508.gz:1915
payments-globalcollect-20160509.gz:1964
payments-globalcollect-20160510.gz:2296
payments-globalcollect-20160511.gz:3980
payments-globalcollect-20160512.gz:2037
payments-globalcollect-20160513.gz:2554
payments-globalcollect-20160514.gz:1789
payments-globalcollect-20160515.gz:1541
payments-globalcollect-20160516.gz:1545
payments-globalcollect-20160517.gz:1950
payments-globalcollect-20160518.gz:1652
payments-globalcollect-20160519.gz:1575
payments-globalcollect-20160520.gz:137086
payments-globalcollect-20160521.gz:1148
payments-globalcollect-20160522.gz:967
payments-globalcollect-20160523.gz:1170
payments-globalcollect-20160524.gz:1306
payments-globalcollect-20160525.gz:998
payments-globalcollect-20160526.gz:753

I've made two graphs that show the number of Order ID collisions each day in May, as a proportion of the total number of transactions initiated with Ingenico. The first graph shows that the collisions were artificially high during a fraud attack last week, due to a characteristic of the attack that we won't discuss here. The second graph is the same data with the two outlier days removed, so we have more detail on the daily trends. These are almost all legitimate collisions, and we can find the Order IDs in older logs.
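Since the graphs themselves are not reproduced here, the proportions they plot can be recomputed from the two zgrep listings above. The following is a minimal sketch rather than the script used for the graphs. The counts are copied from the listings; the function names are made up for illustration, and the choice of May 20 and May 22 as the two outlier days is an assumption based on their unusually high ratios.

```python
# Daily counts for 2016-05-01 .. 2016-05-26, copied from the zgrep output above.
collisions = [7, 11, 6, 15, 16, 11, 12, 8, 13, 9, 5, 11, 17, 4, 6, 4, 4, 6, 6,
              60493, 3, 26, 0, 2, 0, 0]
orders = [2648, 2419, 2800, 2692, 3533, 2731, 2667, 1915, 1964, 2296, 3980,
          2037, 2554, 1789, 1541, 1545, 1950, 1652, 1575, 137086, 1148, 967,
          1170, 1306, 998, 753]

# Assumption: the thread does not name the two outlier days; May 20 and May 22
# are chosen here because their collision ratios stand far above the rest.
OUTLIER_DAYS = frozenset({20, 22})


def daily_rates(collisions, orders):
    """Collisions as a fraction of INSERT_ORDER attempts, one value per day."""
    return [c / n for c, n in zip(collisions, orders)]


def pooled_rate(collisions, orders, skip_days=frozenset()):
    """Total collisions over total orders, ignoring the given days of the month."""
    kept = [
        (c, n)
        for day, (c, n) in enumerate(zip(collisions, orders), start=1)
        if day not in skip_days
    ]
    return sum(c for c, _ in kept) / sum(n for _, n in kept)


rates = daily_rates(collisions, orders)
typical = pooled_rate(collisions, orders, OUTLIER_DAYS)

if __name__ == "__main__":
    for day, rate in enumerate(rates, start=1):
        print(f"2016-05-{day:02d}: {rate:.3%}")
    print(f"pooled rate without outlier days: {typical:.3%}")
```

Under these assumptions the pooled rate comes out at roughly 0.35%, which is in line with the figure of about 0.4% used in the rest of this thread.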

I'll try to fit this rate of about 0.4% into a formula to get an estimate for how many numbers are still "empty", and how quickly we might get into exponential trouble.

The easy answer is, the probability of the next randomly chosen order number colliding is (open numbers remaining) / (total numbers available), so if the current rate is about 0.4%, and the order number space is a ten-digit decimal, we have:
0.004 = remaining / 10,000,000,000
So we have roughly 40M numbers remaining. That's not a huge margin, but the rate of collisions won't even double to 0.8% risk per transaction until we use up another 20M numbers.

We're safe for this year. Copying this conclusion to the task description.

awight updated the task description. May 26 2016, 8:03 PM
awight changed the visibility from "Public (No Login Required)" to "acl*WMF-FR (Project)".
awight removed subscribers: Zppix, StudiesWorld.
Ejegg added a subscriber: Ejegg. May 26 2016, 8:55 PM

The conclusion that we're safe seems sound, but isn't that 40M the number of order IDs used? I'd be pretty surprised if we'd used 9.9 billion order IDs.

@Ejegg
Wow, I had that completely upside-down. Yes, if the chance of a *collision* is 0.4%, then we've used up 40M out of 10B numbers; that is the number used, not the number remaining. The number of IDs remaining is 9.96B. It will take 40M additional orders (not 20M) before the chance of a collision doubles to 0.8%. We're safe without a doubt (assuming these new bad maths are correct), and have a huge margin to burn.
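The corrected arithmetic can be checked with a short sketch. It assumes, as the thread does, that order numbers are drawn uniformly at random from a ten-digit decimal space, so the chance of a collision equals the fraction of numbers already used. The names below are illustrative only.

```python
from fractions import Fraction

# Ten-digit decimal order numbers, per the thread.
ORDER_ID_SPACE = 10**10


def used_ids(collision_rate, space=ORDER_ID_SPACE):
    """Number of IDs already taken if a random pick collides at this rate.

    Assumes uniform random picks, so P(collision) = used / space.
    """
    return int(collision_rate * space)


# Exact fractions avoid floating-point rounding in the large products.
current_rate = Fraction(4, 1000)   # about 0.4% observed
doubled_rate = 2 * current_rate    # 0.8%

used_now = used_ids(current_rate)
remaining_now = ORDER_ID_SPACE - used_now
extra_orders_to_double = used_ids(doubled_rate) - used_now

if __name__ == "__main__":
    print(f"used:      {used_now:,}")
    print(f"remaining: {remaining_now:,}")
    print(f"additional orders before the rate doubles: {extra_orders_to_double:,}")
```

This treats every new order as consuming one fresh ID, which slightly overstates consumption but does not change the order of magnitude.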

awight updated the task description. May 26 2016, 9:16 PM
awight changed the visibility from "acl*WMF-FR (Project)" to "Public (No Login Required)".
awight moved this task from Backlog to Review on the Fundraising Sprint Killing Time board.

@awight What is the user experience as we approach .8%?

When you say we will survive, does that mean surviving Japan or all of Big English?

I don't think people would want a slowly degrading experience through any campaign. We should settle on a minimum quality bar (i.e. the point of no return) where we MUST switch over to a new account.

@jrobell says that Japan will probably be ~100K transactions, but she is confirming. I'm not sure if this helps your estimation considering the large numbers you're already working with.

Thanks @DStrine. I have sent the Japan form to a few local testers.

awight updated the task description. May 31 2016, 6:00 PM
awight added a subscriber: MBeat33. May 31 2016, 6:06 PM

@DStrine
I hate to say it, but it looks like we'll survive even Big English by a wide margin of safety. The donor experience around 0.8% is simply that 0.8% of people will experience an extra delay of up to 10 seconds when making their donation. Of those affected people, 0.8% (that is, 0.0064% of all donors) might experience two delays in sequence, or up to 20 seconds.
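As a rough check on these percentages, here is a minimal sketch. It assumes that each retry picks a fresh random order number and collides independently with the same probability, and it takes the 10-second figure as an upper bound per collision; the function names are hypothetical.

```python
import math

COLLISION_RATE = 0.008      # the 0.8% scenario discussed above
MAX_DELAY_SECONDS = 10.0    # upper bound per collision, per the comment above


def share_with_at_least(k, p=COLLISION_RATE):
    """Fraction of donors who hit k or more collisions in a row.

    Assumes each retry collides independently with probability p.
    """
    return p ** k


def expected_extra_delay_bound(p=COLLISION_RATE, max_delay=MAX_DELAY_SECONDS):
    """Upper bound on the mean extra delay per donor, in seconds.

    The expected number of collisions before a success is p / (1 - p)
    (geometric distribution); each one costs at most max_delay seconds.
    """
    return max_delay * p / (1 - p)


one_delay = share_with_at_least(1)
two_delays = share_with_at_least(2)
mean_delay_bound = expected_extra_delay_bound()

if __name__ == "__main__":
    print(f"at least one delay:  {one_delay:.4%}")
    print(f"at least two delays: {two_delays:.4%}")
    print(f"mean extra delay per donor: under {mean_delay_bound:.3f} s")
```

Averaged over all donors the added delay is small under these assumptions; the concern is concentrated in the affected 0.8%, who each wait up to the full delay.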

I'm not sure how to choose a quality threshold. @MBeat33, do you have ideas about that? The easiest measurement to use is the percentage of donors who experience the additional <10 second delay, so I guess we would pick a target on that scale.

Opening the iframe is not as bad a delay as one after sending the card #s. I do think we will see Zendesk tickets from donors who think the form is broken, that it's not legit, or who drop off due to the delay. If we had an average for the delay that would be helpful, as 2-3 seconds is a lot different than 10.

If the current rate is .4%, I would be in favor of anything we can do to prevent it reaching .8% during Big English. Rough math .8% * 751771 GC transactions from Dec 2015 in Civi is a big number when converted to Zendesk tickets, even if only 3% of the affected donors wrote to us. It comes out to like 900 more donor inquiries for the month, and they're tickets that require portal research so are labor intensive.
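The rough math can be laid out as a small sketch using only the inputs stated in the comment above: 751771 GC transactions, a 0.8% collision rate, and 3% of affected donors writing in. The variable and function names are illustrative.

```python
# Inputs as stated in the comment above.
GC_TRANSACTIONS_DEC_2015 = 751_771
COLLISION_RATE = 0.008   # the 0.8% scenario
CONTACT_RATE = 0.03      # share of affected donors who write in


def affected_donors(transactions, collision_rate):
    """Donors expected to hit at least one order ID collision."""
    return transactions * collision_rate


def estimated_inquiries(transactions, collision_rate, contact_rate):
    """Affected donors multiplied by the share who open a ticket."""
    return affected_donors(transactions, collision_rate) * contact_rate


affected = affected_donors(GC_TRANSACTIONS_DEC_2015, COLLISION_RATE)
inquiries = estimated_inquiries(
    GC_TRANSACTIONS_DEC_2015, COLLISION_RATE, CONTACT_RATE
)
# Contact rate that would be needed to reach about 900 inquiries.
contact_rate_for_900 = 900 / affected

if __name__ == "__main__":
    print(f"affected donors: {affected:,.0f}")
    print(f"inquiries at a 3% contact rate: {inquiries:,.0f}")
    print(f"contact rate implied by 900 inquiries: {contact_rate_for_900:.1%}")
```

With the inputs exactly as stated, the product is roughly 6,000 affected donors and about 180 inquiries. A figure of about 900 would correspond to a contact rate nearer 15%, so the inputs intended in the comment may differ from the ones used in this sketch.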

awight closed this task as Resolved. Jun 1 2016, 11:54 PM
mmodell removed a subscriber: awight. Jun 22 2017, 9:47 PM