Checking for code bottlenecks to improve speed of imports
Closed, Resolved, Public

Description

Related to T376068 - I think it's worth checking if there is anything in the code we can alter to help push these imports through faster as we get higher volume files. We want to balance this with avoiding any DB deadlock issues as well.

@Elbar53 is going to try to monitor speeds when these imports run in the background. Thanks!

Event Timeline

@Eileenmcnaughton Hi Eileen, Ellen is still reporting slowness on the imports. She will track the length of time it takes to import, but it doesn't seem to be operating at the usual pace. @Elbar53 Can more be done to improve this? @dkozlowski @akanji for awareness.

AKanji-WMF moved this task from Next to Triage on the Fundraising-Backlog board.

@dkozlowski @Ejegg @Eileenmcnaughton Here is an update from @Elbar53 on the import timing:

I just finished the Engage imports. They all imported quickly except for one file. It's the individual file. The Endowment individual file went well, but the Engage individual took around 4 hours. There were 198 donations, and it took way longer to import than expected (8x longer).

Thanks @EMartin - that is really useful analysis - we must have a slow query somewhere in our handling - presumably around the soft credits

This particular file has no soft credits.

Can you share the file with me @Elbar53 - I'll do some testing with it

@Elbar53 Can you help Eileen out with the file requested above?

I also wanted to note that I am importing only 4 gifts in a file and it has been about an hour and a half and they are still queued.

@MDemosWMF so there is some control on the processing of the donations when the server is also processing queues - to reduce deadlocks on both sides, the imports don't get processed when high volume is coming in from the queues. I can see that is what is happening right now because I see this in the log:

Early return as queue is backed up. 1264 contributions in the last 5 minutes is greater than the threshold of 500
  • it might be worth making that more visible to you in the first instance
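For readers following along, the throttle described above can be sketched roughly as follows. This is a minimal illustration, not the actual import code: the function and class names are hypothetical, and only the 5-minute window, the threshold of 500, and the message wording are taken from the log line. The "greater than" comparison is read from that wording too.

```python
from dataclasses import dataclass

# Values taken from the log line quoted in this thread.
THRESHOLD = 500
WINDOW_MINUTES = 5


@dataclass
class ThrottleDecision:
    """Hypothetical result object: whether to skip this import run, and why."""
    defer: bool
    message: str


def check_import_throttle(
    recent_contributions: int,
    threshold: int = THRESHOLD,
    window_minutes: int = WINDOW_MINUTES,
) -> ThrottleDecision:
    """Decide whether an import batch should return early.

    recent_contributions is assumed to be the number of contributions
    created in the last `window_minutes` minutes.
    """
    if recent_contributions > threshold:
        return ThrottleDecision(
            defer=True,
            message=(
                "Early return as queue is backed up. "
                f"{recent_contributions} contributions in the last "
                f"{window_minutes} minutes is greater than the threshold "
                f"of {threshold}"
            ),
        )
    return ThrottleDecision(
        defer=False,
        message="Queue volume is within the threshold; import batch can run",
    )


if __name__ == "__main__":
    print(check_import_throttle(1264).message)
```

Returning the message alongside the decision is one way the reason for a queued import could be surfaced to the person running it, rather than appearing only in the log.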

@Eileenmcnaughton That would help so we know if it is an error or if it is just on hold due to large volume at that time

@Eileenmcnaughton Can you check if there was high volume in the queues on Friday? @Elbar53 mentioned she was doing an individual EFT import and it took quite a while with only ~25 lines of data.

@MDemosWMF I think we had the banners on full for 24 hours at the end of last week - so I certainly hope it was high volume!

@AKanji-WMF @dkozlowski. Could we treat as urgent? This is having high impact on the data entry team.

So this is about managing server load - in general we expect lags in getting donations in over Big English and high volume tests. I guess it would be good to understand how much delay in getting our banner donations into Civi is OK in exchange for giving more server resource to these imports

I have confirmed with the team that in prior years during 6ENC we didn't experience lags to the point where things took this long. This is definitely a new phenomenon with the new imports. Is there a difference between how our current import flow handles contention vs the prior flow?

Yes, there is a difference - the imports are set to lower priority compared to the incoming banner/email traffic. In previous years the imports (esp the larger ones) regularly hit deadlocks or caused deadlocks during high traffic - so someone doing a UI import could impact the data coming in from the banners

(However, we did use to tell people not to do imports when Big English was really ramped up)

Well, my team works 8-5 and we certainly never had to plan for low volume events to get the work done. If this is how it has to be it will mean a delay in revenue recognition as we certainly don't want to jeopardize banner activity.

This is on the agenda for us to discuss tomorrow with @MDemosWMF at civifortnightly - potentially re-assessing our current throttling guidelines.

There is an import running now that is interfering with the queues and with user experience of Civi - it may be that we need to tighten up the throttle again but I think the dedupe rule might be tweakable - I see queries like this - which may be something we can improve upon

Id: 2440100 | User: civicrm | Host: 10.64.40.115:48042 | db: civicrm | Command: Query | Time: 35 | State: Creating sort index
/* User : 199 */ /* User : 199 */ INSERT INTO civicrm_tmp_e_dedupe_d3b00f7b977280d5478739fbf7b70b8f (id1, weight) SELECT t1.id id1, 50 weight FROM civicrm_contact t1 WHERE t1.contact_type = 'Individual' AND t1.first_name = 'David' GROUP BY id1, weight ON DUPLICATE KEY UPDATE weight = weight + VALUES(weight)

Aha - the dedupe rule in use is

Email_OR_first_last_street_ref_Melanie_

  • that rule is actually gonna be slow cos of first_name being there without last name - this is the Engage individual import - I'll discuss that dedupe rule further with @MDemosWMF

Hang on - there IS last name in there - new theory - something went wrong with the dedupe patch we are carrying

OK - now I recall the scope of the dedupe patch was probably too narrow to catch this - it will take a bit of work to broaden it
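To illustrate why the query in the process list above is expensive: it matches on first_name alone, so every Individual named 'David' becomes a candidate before any other field narrows the set. The sketch below uses an in-memory SQLite table with made-up data to compare that against matching first and last name in a single query. The table, column values, and function names are hypothetical, and this is not the dedupe patch itself, only a picture of the general difference.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway contact table with many contacts sharing a first name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact ("
        "id INTEGER PRIMARY KEY, contact_type TEXT, "
        "first_name TEXT, last_name TEXT)"
    )
    # 1000 made-up Davids with distinct surnames, plus the one being imported.
    rows = [("Individual", "David", f"Surname{i}") for i in range(1000)]
    rows.append(("Individual", "David", "Example"))
    conn.executemany(
        "INSERT INTO contact (contact_type, first_name, last_name) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return conn


def candidates_first_name_only(conn: sqlite3.Connection, first: str) -> int:
    """Count rows a per-field query on first_name alone would have to handle."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM contact "
        "WHERE contact_type = 'Individual' AND first_name = ?",
        (first,),
    )
    return cur.fetchone()[0]


def candidates_combined(conn: sqlite3.Connection, first: str, last: str) -> int:
    """Count rows when first and last name are matched in the same query."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM contact "
        "WHERE contact_type = 'Individual' "
        "AND first_name = ? AND last_name = ?",
        (first, last),
    )
    return cur.fetchone()[0]


if __name__ == "__main__":
    conn = build_demo_db()
    print(candidates_first_name_only(conn, "David"))
    print(candidates_combined(conn, "David", "Example"))
```

On a production contact table the first-name-only candidate set would be far larger than in this toy data, which fits the 35 seconds spent in "Creating sort index" in the process list.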

Change #1126204 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Stock CiviCRM 6.1rc

https://gerrit.wikimedia.org/r/1126204

Change #1126204 merged by Eileen:

[wikimedia/fundraising/crm@master] Stock CiviCRM 6.1rc

https://gerrit.wikimedia.org/r/1126204

Change #1128579 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Do not enable legacydedupefinder

https://gerrit.wikimedia.org/r/1128579

Change #1128579 merged by Eileen:

[wikimedia/fundraising/crm@master] Do not enable legacydedupefinder

https://gerrit.wikimedia.org/r/1128579