Problem: We can't currently track how long it takes for a new donation to fully move through the payments system. That means we don't have a very good handle on how long banner testers have to wait for their numbers to show up, how long a donor is going to have to wait for their Thank You email, how either of these things changes over time, or what kinds of lag to expect when we start seeing high load. We can currently measure how long it takes messages to land in the db by comparing the latest recorded donation's timestamp to the actual time, but that only works during high load and isn't recorded anywhere real. I'd prefer to come up with something that works as well under high load as it does when the system is deeply bored.
Possibility: Establish a system by which we float special messages down the donations queue and through the system. Float a single message at regular intervals, and start that message out with the timestamp at which it was generated. At the queue consumer, record the time the message gets picked up, and the difference between that and the generation timestamp. Maybe poke that time difference into Prometheus.
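To make the idea concrete, here is a minimal sketch of the two ends: something that builds a timing message stamped with its creation time, and something the queue consumer could call to get the lag. Everything named here (the `timing_probe` marker, the `created_at` field, the JSON message shape, both function names) is an assumption for illustration, not the real donation message format. Pushing the number into Prometheus is left out on purpose.

```python
import json
import time

# Hypothetical marker value; the real message format may need something else.
TIMING_MARKER = "timing_probe"


def make_probe(now=None):
    """Build a timing message stamped with its creation time (epoch seconds)."""
    created = time.time() if now is None else now
    return json.dumps({"type": TIMING_MARKER, "created_at": created})


def measure_lag(raw_message, now=None):
    """Return seconds between creation and pickup, or None for anything
    that is not a timing message (i.e. a real donation)."""
    message = json.loads(raw_message)
    if message.get("type") != TIMING_MARKER:
        return None
    picked_up = time.time() if now is None else now
    return picked_up - message["created_at"]
```

The `now` argument is only there so the timing can be pinned down in tests; in normal use both sides would just read the clock. This does assume the producer and the consumer have clocks that agree closely enough for the difference to mean something.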
I'd really like for that timing message to also be recorded in civi (temporarily!) such that it won't mess up the reporting, but the Thank You emailer job can try to process it as an unthanked donation, and report how long the wait is all the way down the chain.
We would have to write the thing that adds messages at intervals, alter both the donations queue consumer and the Thank You mailer to treat these as test points instead of real donations, and do all the Prometheus/Grafana stuff for actually getting the picture.
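A rough sketch of the "treat these as test points" part, assuming both the queue consumer and the Thank You mailer can call one shared check before doing their normal work. The `handle` function, the stage names, and the `record_lag` / `process_real` callbacks are all hypothetical. How the timing message gets stored in and later removed from civi is not shown.

```python
# Hypothetical marker value; the real message format may need something else.
TIMING_MARKER = "timing_probe"


def handle(message, stage, now, process_real, record_lag):
    """Route one message at a given stage of the pipeline.

    Timing messages get their lag recorded for this stage and are never
    handed to the real processing path. Everything else is processed
    as a normal donation.
    """
    if message.get("type") == TIMING_MARKER:
        record_lag(stage, now - message["created_at"])
        return "probe"
    process_real(message)
    return "donation"
```

Calling this with a different `stage` label at each step would give one lag per stage, all measured from the same generation timestamp, which is the "all the way down the chain" number.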
(Team note: This is the thing I have doggedly been referring to as "Duck floating". Forgive me.)