Page MenuHomePhabricator

Recurring queue consumer fails with funny stats execption
Closed, ResolvedPublic

Description

starting 2020-08-17, we've gotten several failmails per day regarding the recurring queue consumer crashing.

Error: Unsupported operand types in Statistics\Collector\AbstractCollector->endTimer()

(line 347 of vendor/jackgleeson/stats-collector/src/Collector/AbstractCollector.php)

line 347 subtracts the start timer from the end timer to get the diff:

'diff' => $end - $this->getStat($namespace)['start'],

but is wrapped in a check that should ensure the start is set:

if ($this->hasStartTimer($namespace)) {

We haven't been able to replicate it locally, but it seems to happen after processing a subscr_signup message from paypal. The contribution_recur row does not get written to the database, so the payments for these subscriptions will probably end up in the damaged queue.

recent changes that involve the Stats collection include

Looking at recurring_queue_consume-20210818-162001.log, I see a successfully imported subscr_start message with USD currency that created a new contact. The following subscr_start message (which crashed) was in CAD and had a name and email that matched an existing contact.

Event Timeline

Change 713686 had a related patch set uploaded (by Jgleeson; author: Jgleeson):

[wikimedia/fundraising/crm@master] Update stats-collector to 1.4.11 to better handle startTimer bug.

https://gerrit.wikimedia.org/r/713686

Change 713696 had a related patch set uploaded (by Jgleeson; author: Jgleeson):

[wikimedia/fundraising/crm@deployment] Update stats-collector to 1.4.11 to better handle startTimer bug.

https://gerrit.wikimedia.org/r/713696

Seems like it might be happening specifically when the recurring queue consumer updates an existing contact (rather than creating a new one)

Change 713686 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Update stats-collector to 1.4.11 to better handle startTimer bug.

https://gerrit.wikimedia.org/r/713686

Change 713696 merged by Jgleeson:

[wikimedia/fundraising/crm@deployment] Update stats-collector to 1.4.11 to better handle startTimer bug.

https://gerrit.wikimedia.org/r/713696

OK, so the start value is ending up as an array, which is a hard thing to subtract from a float.

$new = {array} [3]
   start = {array} [2]
      0 = {float} 1629324106.431
      1 = {float} 1629324147.1451
   end = {float} 1629324108.28
   diff = {float} 1.8489289283752

The key for reproducing the issue locally is to have multiple messages in the same QC run which update an existing contact. Pick a contact in your Civi DB with a unique [email,fname,lname], note the first & last name and email.

The easy way (thanks @Cstone): on your local payments-wiki sign up for two new Ingenico recurring donations using the same email, fname, and lname and run the ctqc

The hard way: Copy those values into this json file: https://phabricator.wikimedia.org/diffusion/WFCG/browse/master/drupal/sites/all/modules/queue2civicrm/tests/data/recurring_signup.json
Then

  • delete the newlines (PHPStorm's UI lets you replace \n if you turn the regex option on, or use the trusty old tr -d '\n'),
  • create two copies of the file with different values for subscr_id (duplicate subscr_ids will be skipped before the contact is updated), and
  • import the files into the recurring queue with
php /srv/SmashPig/Maintenance/PopulateQueueFromDump.php recurring <filePath.json>

Then run the recurring queue consumer

The issue seems to be with how the ImportStatsCollector uses its uniqueStatsToken. I'm not exactly clear on why that stats collector needs a unique token when for example the DonationStatsCollector doesn't need one. Further, the generated token has a '.' in it, which causes the ImportStatsCollector to treat it like two levels of a dot-separated namespace when other values for other stats collectors have no dot and are flattened out.

I'll see if I can figure out a quick fix based on the latest changes to Save.php, then we can dive into a better solution.

OK, so the reason this fails for multiple recurring signups and not for multiple payments that update a donor is that the wmf_civicrm_contribution_message_import resets the unique token on every call:
https://phabricator.wikimedia.org/diffusion/WFCG/browse/master/drupal/sites/all/modules/wmf_civicrm/wmf_civicrm.module$196

We can add a call to reset it to the start of each message process in the recurring queue consumer too, but I worry that this same issue will affect e.g. imports.

Change 713714 had a related patch set uploaded (by Ejegg; author: Ejegg):

[wikimedia/fundraising/crm@master] Reset unique stats token on each message

https://gerrit.wikimedia.org/r/713714

Change 713714 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Reset unique stats token on each message

https://gerrit.wikimedia.org/r/713714

Change 713730 had a related patch set uploaded (by Ejegg; author: Ejegg):

[wikimedia/fundraising/crm@deployment] Reset unique stats token on each message

https://gerrit.wikimedia.org/r/713730

Change 713730 merged by Ejegg:

[wikimedia/fundraising/crm@deployment] Reset unique stats token on each message

https://gerrit.wikimedia.org/r/713730

OK, that fix is deployed and the recurring queue size is coming down. We should probably do a quick grep for the subscr_ids in the crashy messages, check how many of them are missing from the contribution_recur table, and reconstitute and requeue those subscr_signup messages so the associated payments can be imported.

Change 713889 had a related patch set uploaded (by Ejegg; author: Ejegg):

[wikimedia/fundraising/crm@master] Eliminate need to init() import timer

https://gerrit.wikimedia.org/r/713889

Change 713941 had a related patch set uploaded (by Jgleeson; author: Jgleeson):

[wikimedia/fundraising/crm@master] Add unit tests for civicrm import stats patch to confirm stats output.

https://gerrit.wikimedia.org/r/713941

Change 713889 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Eliminate need to init() import timer

https://gerrit.wikimedia.org/r/713889

Change 713941 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Add unit tests for civicrm import stats patch to confirm stats output.

https://gerrit.wikimedia.org/r/713941

Change 714390 had a related patch set uploaded (by Jgleeson; author: Jgleeson):

[wikimedia/fundraising/crm@master] Remove unused code from ImportMessage stats unit tests.

https://gerrit.wikimedia.org/r/714390

Change 714390 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Remove unused code from ImportMessage stats unit tests.

https://gerrit.wikimedia.org/r/714390