
Start regularly tracking "lag" time of donations moving through the payments system
Open, Normal · Public · 2 Story Points

Description

Problem: We can't currently track how long it takes for a new donation to move all the way through the payments system, so we don't have a good handle on how long banner testers have to wait for their numbers to show up, how long a donor has to wait for their Thank You email, how either of these changes over time, or what kind of lag to expect when we start seeing high load. We can currently measure how long it takes messages to land in the db by comparing the latest recorded donation's timestamp to the actual time, but that only works during high load and isn't recorded anywhere real. I'd prefer to come up with something that works as well under high load as it does when the system is deeply bored.

Possibility: Establish a system by which we float special messages down the donations queue and through the system. Float a single message at regular intervals, and stamp each message with the time at which it was generated. At the queue consumer, record the time the message gets picked up and the difference from its generation timestamp. Maybe poke that time difference into Prometheus.
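A minimal sketch of that idea, under stated assumptions: the `is_timing_duck` flag, the `created_at` field, and the metric name `donation_queue_lag_seconds` are all hypothetical, and an in-memory deque stands in for the real donations queue. It only shows the shape of the measurement, not how the actual consumer is written.

```python
import json
import time
from collections import deque

# Hypothetical marker field; the task does not specify a message schema.
DUCK_FLAG = "is_timing_duck"


def make_duck(now=None):
    """Build a timing message stamped with its creation time (epoch seconds)."""
    created = time.time() if now is None else now
    return json.dumps({DUCK_FLAG: True, "created_at": created})


def consume(raw, now=None):
    """Return the lag in seconds for a duck message, or None for anything else."""
    message = json.loads(raw)
    if not message.get(DUCK_FLAG):
        return None  # a real donation; normal processing would happen here
    picked_up = time.time() if now is None else now
    return picked_up - message["created_at"]


def to_prometheus_line(lag_seconds):
    """Format the lag as one sample line in Prometheus text exposition format."""
    return f"donation_queue_lag_seconds {lag_seconds:.3f}"


# An in-memory deque stands in for the donations queue.
queue = deque()
queue.append(make_duck(now=1000.0))
queue.append(json.dumps({"gross": "5.00", "currency": "USD"}))

# Fixed timestamps are passed in so the example is deterministic.
lags = [consume(raw, now=1012.5) for raw in queue]
duck_lags = [lag for lag in lags if lag is not None]
print(to_prometheus_line(duck_lags[0]))
```

Because the duck carries its own creation time, the measurement does not depend on real donations arriving, which is what lets it work when the system is idle.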

I'd really like that timing message to also be recorded in CiviCRM (temporarily!) in a way that won't mess up the reporting, but still lets the Thank You emailer job try to process it as an unthanked donation and report how long the wait is all the way down the chain.

We would have to write the thing that adds messages at intervals, alter both the donations queue consumer and the Thank You mailer to treat these as test points instead of real donations, and build all the Prometheus/Grafana stuff for actually getting the picture.
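The mailer side of this could look roughly like the sketch below. Everything in it is an assumption for illustration: the `Contribution` fields are not the real CiviCRM schema, a plain list stands in for the database, and marking `thanked_at` stands in for sending an email. The point is only that a timing record is measured and removed rather than thanked.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Contribution:
    # All field names here are hypothetical, not the CiviCRM schema.
    contribution_id: int
    received_at: float                 # epoch seconds when the donation (or duck) was created
    is_timing_duck: bool = False
    thanked_at: Optional[float] = None


def run_thank_you_job(store: List[Contribution], now: float) -> Tuple[List[int], List[float]]:
    """Process unthanked rows; return (ids that were thanked, duck delays in seconds)."""
    mailed: List[int] = []
    duck_delays: List[float] = []
    for row in list(store):            # iterate over a copy so ducks can be removed
        if row.thanked_at is not None:
            continue                   # already thanked, nothing to do
        if row.is_timing_duck:
            duck_delays.append(now - row.received_at)
            store.remove(row)          # ducks are temporary and must not reach reporting
        else:
            row.thanked_at = now       # a real job would send the email here
            mailed.append(row.contribution_id)
    return mailed, duck_delays


store = [
    Contribution(1, received_at=900.0),
    Contribution(2, received_at=1000.0, is_timing_duck=True),
    Contribution(3, received_at=950.0, thanked_at=960.0),
]
mailed, duck_delays = run_thank_you_job(store, now=1090.0)
print(mailed, duck_delays, [c.contribution_id for c in store])
```

Deleting the duck at the last stage is one way to keep it out of reporting; excluding flagged rows in the reports themselves would be another, and the task does not say which is intended.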

(Team note: This is the thing I have doggedly been referring to as "Duck floating". Forgive me.)

Event Timeline

K4-713 created this task. · Sep 27 2017, 11:10 PM
Restricted Application added a subscriber: Aklapper. · Sep 27 2017, 11:10 PM

Change 381263 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[wikimedia/fundraising/crm@master] Report average consumed message age

https://gerrit.wikimedia.org/r/381263

Change 381264 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[wikimedia/fundraising/crm@master] Report average thank you mail delay

https://gerrit.wikimedia.org/r/381264

Ejegg claimed this task. · Sep 28 2017, 5:32 PM
Ejegg triaged this task as Normal priority.
Ejegg set the point value for this task to 2.
Ejegg moved this task from Backlog to Review on the Fundraising Sprint Synchronized Screaming board.

Change 381263 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Report average consumed message age

https://gerrit.wikimedia.org/r/381263

Change 381264 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Report average thank you mail delay

https://gerrit.wikimedia.org/r/381264

Not going to add the ducks this sprint, so taking the tag off.

Change 382858 had a related patch set uploaded (by Ejegg; owner: Ejegg):
[wikimedia/fundraising/crm@master] WIP only report age of messages from payments

https://gerrit.wikimedia.org/r/382858

Oops, I forgot to move this on the backlog board after finishing the part that we can do pre-BE.

Change 382858 abandoned by Ejegg:
Only report age of messages from payments

https://gerrit.wikimedia.org/r/382858