Page MenuHomePhabricator

Review our queue speed - is it OK?
Closed, ResolvedPublic2 Estimated Story Points

Event Timeline

greg moved this task from Triage to Current Sprint on the Fundraising-Backlog board.
greg subscribed.

Time boxed spike in current sprint.

Big email send on Thursday to use as a data point.

Dwisehaupt set the point value for this task to 2.Nov 14 2022, 9:31 PM

No thank-you queue bump today during the high-volume test.

We just had a 1 hour test run - observations

The overall pattern was "around 6 donations per second, disrupted by a period of deadlocks". I feel like we were at 7 (up from 4) after I did the first round of Redis fixes & am hoping we might get that '1' back with the patch I hope to roll out during traffic tomorrow

https://frmon.frdev.wikimedia.org/d/Pq1YNMviz/fundraising-overview?orgId=1&refresh=1m&from=1668466800000&to=1668470399000&viewPanel=22

  1. The deduper should not run during periods of high traffic - but it DID start running before the test was over

https://frmon.wikimedia.org/d/lIN1akDVz/process-control-precise?orgId=1&from=1668466800000&to=1668470399000

image.png (256×1 px, 42 KB)

I'm gonna spin off a phab for that

  1. I analysed the Redis misses during the test. I rolled out one fix - but that was of limited impact (which I expected it to be) & mostly just made the output a little easier. I further identifed another fix that I think is likely to be more helpful - https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/crm/+/856703 - but the test ended before I could roll it out. It would be good to test that tomorrow. I will comment futher on T322172 but I think there is more room to move here.

Image of processing speed graph (linked above)

image.png (1×2 px, 132 KB)

I've been looking at the raw stats generated during tonight's traffic. There's a reasonably consistent negative correlation between donations processed and processing rate, i.e. the more we process, the slower the processing rate. The performance is degrading for some reason. If we can exclude other factors and confirm this is definitely happening, possibly by recording the processing of only non-recurring donations, we could try thinking of ways to prevent this performance drop over time by maybe caching connections and such.

Note: there are lots of things that could influence the processing rates. e.g. a recurring donation import is slower than a one-time. Our import stages breakouts show us this https://frmon.frdev.wikimedia.org/d/Pq1YNMviz/fundraising-overview?refresh=1m&orgId=1&viewPanel=33&from=now-3h&to=now

Here's some sample data:

donations_count{gateway="adyen"} 412
donations_count{gateway="amazon"} 11
donations_count{gateway="paypal_ec"} 329
donations_count{gateway="all"} 752
donations_processing_rate{period="batch"} 172.2097260952
donations_processing_rate{period="1s"} 4.3667684575741

donations_count{gateway="adyen"} 383
donations_count{gateway="paypal_ec"} 356
donations_count{gateway="amazon"} 16
donations_count{gateway="all"} 755
donations_processing_rate{period="batch"} 172.62205195427
donations_processing_rate{period="1s"} 4.3737169814203

donations_count{gateway="adyen"} 162
donations_count{gateway="paypal_ec"} 155
donations_count{gateway="amazon"} 9
donations_count{gateway="all"} 326
donations_processing_rate{period="batch"} 48.713936090469
donations_processing_rate{period="1s"} 6.6921301410456

donations_count{gateway="paypal_ec"} 300
donations_count{gateway="adyen"} 372
donations_count{gateway="amazon"} 8
donations_count{gateway="ingenico"} 2
donations_count{gateway="all"} 682
donations_processing_rate{period="batch"} 157.74275708199
donations_processing_rate{period="1s"} 4.3234948635108

donations_count{gateway="adyen"} 278
donations_count{gateway="paypal_ec"} 186
donations_count{gateway="amazon"} 9
donations_count{gateway="ingenico"} 1
donations_count{gateway="astropay"} 1
donations_count{gateway="all"} 475
donations_processing_rate{period="batch"} 82.511066913605
donations_processing_rate{period="1s"} 5.7568035145802

donations_count{gateway="adyen"} 137
donations_count{gateway="paypal_ec"} 111
donations_count{gateway="amazon"} 4
donations_count{gateway="all"} 252
donations_processing_rate{period="batch"} 34.203144073486
donations_processing_rate{period="1s"} 7.3677437214126

There might be something in the mix that is slow & longer ones have more chance of hitting it?

XenoRyet set Final Story Points to 4.