Civi unresponsive for Engage in evenings
Closed, ResolvedPublic
Actions

Assigned To

Authored By

	• LeanneS
	Dec 9 2016, 5:12 PM

Description

Hi - Engage informed me that Civi has been unresponsive in the evenings their time and they aren't able to save contributions. Please see their note below and let me know if you need any additional info.

"We have run into an issue a few times this week where our operators are working in Civi and around 5:30 PM EST (it happened last night [12/8] at 5:45PM) the system becomes unresponsive when attempting to save a transactions (the circle just spins and spins). Our operators then close their browser and are able to sign in again, however they receive the same issue and are unable to save."

Related Objects
Search...

		Status	Subtype	Assigned	Task
		Resolved		Jgreen	T152786 Civi unresponsive for Engage in evenings
		Resolved		Eileenmcnaughton	T152936 CRM-19752 Fix impact of financial acl code on query speed

Event Timeline

• LeanneS created this task.Dec 9 2016, 5:12 PM

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 9 2016, 5:12 PM

@LeanneS is checking if this was just one day or if this occurs at the same time every day.

Update from Engage that it also happened on Monday afternoon:

"This happened on Monday (12/5) around 4:20 PM.

We worked in the system until 6:00 PM on 12/6 and 9:40 PM on 12/7 without any issues."

I took a look at the scheduled jobs and could see nothing that would cause a regular slow query at this time. I also trawled through the slow queries over the last 24 hours. I found one that can be improved easily and caused 12 instances of slowness to Megan or Michael over the period (CRM-19752). I also found that the query to determine whether to run dedupes is taking longer than it needs to as there is no index on civicrm_contact.created_date. I can see how to improve that.

The latter item will not have caused any slowness for Engage, the former probably will not have but it's possible.

I'm not sure where to find slow query logs older than 24 hours - @Jgreen ?

https://docs.google.com/spreadsheets/d/1PQBSB_cx6umWoCRTXJrjVE-BaPaTK3MNHF5dk7DOjFQ/edit#gid=0

I'll take another look at the log tomorrow to see what has gone on since now

grep 'Query_time: [2-9][0-9]' fundraisingdb-mysql-slow
grep 'Query_time: [1-9][0-9][0-9]' fundraisingdb-mysql-slow

Eileenmcnaughton created subtask T152936: CRM-19752 Fix impact of financial acl code on query speed.Dec 12 2016, 12:40 AM

I'm not sure where to find slow query logs older than 24 hours - @Jgreen ?

All the ./remote/* logs rotate into the archive, at indium:/a/archive/indium/logs

• DStrine moved this task from Triage to Sprint +1 on the Fundraising-Backlog board.Dec 12 2016, 8:59 PM

Eileenmcnaughton mentioned this in T153127: Slow query: dedupe script.Dec 13 2016, 9:24 PM

Eileenmcnaughton closed subtask T152936: CRM-19752 Fix impact of financial acl code on query speed as Resolved.Dec 13 2016, 11:08 PM

I'm pretty certain the issue on 8 Dec was caused by the silverpop export being run at an odd time - see the highlighted row below. These rows all refer to the timing of a mysql query in the silverpop export 'INSERT INTO silverpop_export_highest...' which happens about 1.5 hours in

query time	query_duration	query_start_time_est	estimated silverpop start in est	estimated silverpop end in est
2016-11-23 09:13:20	798.674	2016-11-23 04:13:20	2016-11-23 02:43:20	2016-11-23 07:43:20
2016-11-24 09:14:24	812.421	2016-11-24 04:14:24	2016-11-24 02:44:24	2016-11-24 07:44:24
2016-11-26 09:16:45	809.747	2016-11-26 04:16:45	2016-11-26 02:46:45	2016-11-26 07:46:45
2016-11-29 09:13:22	803.035	2016-11-29 04:13:22	2016-11-29 02:43:22	2016-11-29 07:43:22
2016-11-30 09:18:56	847.097	2016-11-30 04:18:56	2016-11-30 02:48:56	2016-11-30 07:48:56
2016-12-01 09:16:46	828.324	2016-12-01 04:16:46	2016-12-01 02:46:46	2016-12-01 07:46:46
2016-12-02 09:17:10	853.241	2016-12-02 04:17:10	2016-12-02 02:47:10	2016-12-02 07:47:10
2016-12-03 09:20:00	879.442	2016-12-03 04:20:00	2016-12-03 02:50:00	2016-12-03 07:50:00
2016-12-05 09:20:58	864.753	2016-12-05 04:20:58	2016-12-05 02:50:58	2016-12-05 07:50:58
2016-12-06 07:24:24	854.977	2016-12-06 02:24:24	2016-12-06 00:54:24	2016-12-06 05:54:24
2016-12-07 07:28:54	911.569	2016-12-07 02:28:54	2016-12-07 00:58:54	2016-12-07 05:58:54
2016-12-07 11:42:15	918.381	2016-12-07 06:42:15	2016-12-07 05:12:15	2016-12-07 10:12:15
2016-12-08 19:23:22	930.391	2016-12-08 14:23:22	2016-12-08 12:53:22	2016-12-08 17:53:22
2016-12-09 07:26:28	915.414	2016-12-09 02:26:28	2016-12-09 00:56:28	2016-12-09 05:56:28
2016-12-10 07:32:41	931.187	2016-12-10 02:32:41	2016-12-10 01:02:41	2016-12-10 06:02:41
2016-12-11 07:32:20	917.388	2016-12-11 02:32:20	2016-12-11 01:02:20	2016-12-11 06:02:20
2016-12-12 07:31:07	918.097	2016-12-12 02:31:07	2016-12-12 01:01:07	2016-12-12 06:01:07

• DStrine triaged this task as High priority.Dec 15 2016, 10:17 PM

• DStrine moved this task from Sprint +1 to Triage on the Fundraising-Backlog board.

Please see the latest update from Engage today:

I am writing regarding a few errors we are encountering when processing in Civi. The movie and image attached are happening at random times during processing. The movie was recorded on Tuesday night (12/13) around 7:00 PM EST, it shows the screen in a constant loop unable to update transactions, in this situation we are not able enter any information. If we close the browser, reopen it, log back into Civi and attempt to update a record we received the same result.

The included image shows an error we are receiving when entering Benevity batches into Civi without a batch number. The error happened on and off yesterday, this picture was taken at 5:40 PM EST. The problem we are having is when the screen loads to enter a transaction in random cases there is not an area to enter a batch number, our operators are then forced to refresh the page until the batch number area appears (this can be anywhere from one refresh to twenty). We are then able to enter without a problem for a random amount of time before it appears again. If we try to enter the information when the batch number area is not visible we receive the attached error regarding batch number and reaching the server.

One additional item I would like to point out is that both computers used to access Civi experience these problems; however, they may not be at the same time. Meaning one operator may be enter without an issues while the other is experiencing the batch number/server issue. When the updating issue occurs it has effected both computers at the same time (though do to the time of day that it has happened only one operator is entering).

Could somebody text me the next time this happens? My cell is on the
collab wiki fundraising contacts page. Looking at database activity while
it's happening is often easier than trying to make sense of the past event
from logs, so I will try to take a look in realtime.

@Jgreen Definitely. I'll request that they notify me as soon as they experience any issues and then text you.

@LeanneS
FYI: In backlog grooming, we reviewed this ticket. We are going to leave it in the Analysis column for now. However, it happens again, we will pull it into the current sprint.

note slow query digging ticket is https://phabricator.wikimedia.org/T153134

Ejegg moved this task from Triage to Old Unscheduled (prior to 2019) on the Fundraising-Backlog board.Jan 3 2017, 10:19 PM

• DStrine lowered the priority of this task from High to Low.Jan 26 2017, 12:34 AM

Just checking back on this since there hasn't been an update, is it still a problem?

@Jgreen Thanks for checking. I haven't heard of any issues recently.