Page MenuHomePhabricator

check_recurring_gc_jobs_required > 6000
Closed, ResolvedPublic

Description

We're getting alerts that check_recurring_gc_jobs_required (one of the in-mysql queues) is over 6000. We think this is not-urgent, so we're muting the alert with this task as the reminder to look at it tomorrow...

Event Timeline

Here's the query corresponding to the metric:

SELECT MAX(COUNT) FROM (SELECT count(*) 'count' FROM civicrm.civicrm_contribution_recur WHERE trxn_id LIKE 'RECURRING GLOBALCOLLECT%' AND next_sched_contribution_date IS NOT NULL AND next_sched_contribution_date != 1 GROUP BY DATE_FORMAT(next_sched_contribution_date, '%d')) t

==tldr;==
The alerts were caused by alerts based on out-of-date thresholds. We have seen a big growth in recurrings and need to increase the thresholds.

Steps required to resolve

  1. I need help to find where the code is that sets this alert. We should perhaps double it?

Follow on actions

  1. I identified a contact that has not yet had a Jan donation recorded on his recurring ( civicrm/contact/view?reset=1&cid=2063522 ) - likely paypal has not yet processed / notified. Recheck later
  2. Consider doing some investigation into new checks / reports to make sure ones that fail are not slipping through / that we pick up ones like the above one if it does not happen. We do a funny thing in the way we record recurring in that we set them to 'Completed' rather than 'In progress' and next_sched_contribution_date is set even for old / expired/ failed out ones.

Analysis

I did some digging on this. I didn't actually find where the script is located in git so was unable to look for code comments or for git change log but I'm fairly sure the script is a rough & ready way to check nothing too weird is happening with the scheduling of recurring donations. It gathers the number of donations planned for each day of the month & if there is more than 6000 on any one day of the month (in this case the first of the month) it assumes there must be something going haywire & throws out errors. I'm not sure if there was any reason for it to assume that something is bad (Ingenico limit??) or whether it is just that was historically a very high number. However, it appears to me to be a legitimately high number based on current donation volumes

SELECT DATE_FORMAT(next_sched_contribution_date, '%d'), count(*) 'count' FROM civicrm_contribution_recur WHERE trxn_id LIKE 'RECURRING GLOBALCOLLECT%' AND next_sched_contribution_date IS NOT NULL AND next_sched_contribution_date != 1 GROUP BY DATE_FORMAT(next_sched_contribution_date, '%d') WITH ROLLUP;
+-------------------------------------------------+-------+
| DATE_FORMAT(next_sched_contribution_date, '%d') | count |
+-------------------------------------------------+-------+
| 01                                              |  6001 |
| 02                                              |  5045 |
| 03                                              |  4719 |
| 04                                              |  3944 |
| 05                                              |  2626 |
| 06                                              |  2724 |
| 07                                              |  3138 |
| 08                                              |  2837 |
| 09                                              |  2699 |
| 10                                              |  2490 |
| 11                                              |  2797 |
| 12                                              |  2336 |
| 13                                              |  2183 |
| 14                                              |  2411 |
| 15                                              |  1959 |
| 16                                              |  2407 |
| 17                                              |  2020 |
| 18                                              |  2422 |
| 19                                              |  2389 |
| 20                                              |  1656 |
| 21                                              |  1438 |
| 22                                              |  1729 |
| 23                                              |  1478 |
| 24                                              |  1617 |
| 25                                              |  1529 |
| 26                                              |  1422 |
| 27                                              |  1334 |
| 28                                              |  2252 |
| 29                                              |  2895 |
| 30                                              |  4008 |
| 31                                              |  1675 |
**| Total                                           | 80180 |**
+-------------------------------------------------+-------+
32 rows in set, 1 warning (0.44 sec)

This seems fairly close to

SELECT DATE_FORMAT(create_date, '%d'), count(*) 'count' FROM civicrm_contribution_recur WHERE trxn_id LIKE 'RECURRING GLOBALCOLLECT%' AND next_sched_contribution_date IS NOT NULL AND next_sched_contribution_date != 1 GROUP BY DATE_FORMAT(create_date, '%d') WITH ROLLUP;
+--------------------------------+-------+
| DATE_FORMAT(create_date, '%d') | count |
+--------------------------------+-------+
| 01                             |  5995 |
| 02                             |  5046 |
| 03                             |  4725 |
| 04                             |  3935 |
| 05                             |  2630 |
| 06                             |  2716 |
| 07                             |  3124 |
| 08                             |  2864 |
| 09                             |  2693 |
| 10                             |  2493 |
| 11                             |  2802 |
| 12                             |  2335 |
| 13                             |  2182 |
| 14                             |  2415 |
| 15                             |  1960 |
| 16                             |  2401 |
| 17                             |  2024 |
| 18                             |  2422 |
| 19                             |  2392 |
| 20                             |  1653 |
| 21                             |  1438 |
| 22                             |  1731 |
| 23                             |  1477 |
| 24                             |  1615 |
| 25                             |  1532 |
| 26                             |  1417 |
| 27                             |  1338 |
| 28                             |  2032 |
| 29                             |  2992 |
| 30                             |  4037 |
| 31                             |  1764 |
| total                         | 80180 |

We seem to have had a good bump in recurrings - here are the recurrings created in each quarter. Nearly 16,000 new recurrings were started in Dec 2017 & the only 6 days we have had more than 1000 new recurrings in one day were all around the beginning of Dec 2017

SELECT  count(*) 'count', create_date FROM civicrm_contribution_recur WHERE trxn_id LIKE 'RECURRING GLOBALCOLLECT%' AND next_sched_contribution_date IS NOT NULL AND next_sched_contribution_date != 1 GROUP BY YEAR(create_date), QUARTER(create_date) , MONTH(create_date), DAY(create_date) HAVING count > 1000;
+-------+---------------------+
| count | create_date         |
+-------+---------------------+
|  1303 | 2017-11-29 00:00:09 |
|  1832 | 2017-11-30 00:00:30 |
|  3054 | 2017-12-01 00:00:14 |
|  2534 | 2017-12-02 00:00:18 |
|  2176 | 2017-12-03 00:00:10 |
|  1290 | 2017-12-04 00:00:15 |
+-------+---------------------+
SELECT YEAR(create_date) as year_created, QUARTER(create_date)  as quarter, count(*) 'count' FROM civicrm_contribution_recur WHERE trxn_id LIKE 'RECURRING GLOBALCOLLECT%' AND next_sched_contribution_date IS NOT NULL AND next_sched_contribution_date != 1 GROUP BY YEAR(create_date), QUARTER(create_date);
+--------------+---------+-------+
| year_created | quarter | count |
+--------------+---------+-------+
|         2012 |       4 |   113 |
|         2013 |       1 |   385 |
|         2013 |       2 |    15 |
|         2013 |       3 |   590 |
|         2013 |       4 |  1975 |
|         2014 |       1 |   358 |
|         2014 |       2 |   330 |
|         2014 |       3 |  1030 |
|         2014 |       4 |  5493 |
|         2015 |       1 |   779 |
|         2015 |       2 |   738 |
|         2015 |       3 |  2231 |
|         2015 |       4 |  8555 |
|         2016 |       1 |  2304 |
|         2016 |       2 |  1751 |
|         2016 |       3 |  3334 |
|         2016 |       4 | 11775 |
|         2017 |       1 |  3099 |
|         2017 |       2 |  2329 |
|         2017 |       3 |  4261 |
|         2017 |       4 | 28418 |
|         2018 |       1 |   314 |

civicrm/contact/view?reset=1&cid=2063522

  1. I need help to find where the code is that sets this alert. We should perhaps double it?

It's configured in the puppet role class for the host, which is generally where we configure nagios checks:

$check_recurring_gc_jobs_required = 'check_fundraising_jobs -m recurring_gc_jobs_required -w 4000 -c 6000'

I'll adjust it to warn 8000, critical 12000.

DStrine moved this task from Triage to FR-Ops on the Fundraising-Backlog board.