Significant & Increasing delay on thank you emails sent via process control, starting @ ~ Mon May 27 13:35:19 UTC 2024.
Write-up of incident progression here (WIP)
Significant & Increasing delay on thank you emails sent via process control, starting @ ~ Mon May 27 13:35:19 UTC 2024.
Write-up of incident progression here (WIP)
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
Avoid extra calls to Smarty::getTemplateDir | wikimedia/fundraising/crm | master | +29 -18 |
Change #1036779 had a related patch set uploaded (by Ejegg; author: Ejegg):
[wikimedia/fundraising/crm@master] Avoid extra calls to getTemplateDir
Change #1036779 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Avoid extra calls to Smarty::getTemplateDir
I think all of us dug into this in some way today but I feel we got to the bottom of it. In the end I found 3 fixes & Elliott found 1 and I think any of them are OK for a short-term fix (we have merged Elliott's & can deploy it tomorrow).
The status is
More details
I'm treating the gitlab issue https://lab.civicrm.org/dev/core/-/issues/5243 as the primary source for technical details so I'm going to write up some technical detail on how this relates to WMF but leave the main technical discussion for that issue.
This issue specifically affects the code which switches between languages. Hence we might not see much slow down after an English language campaign but would see more if it was a mix of languages.
When the preferred_language changes CiviCRM fires change of locale which triggers, amount other things, the configuration hook to be called - this hook is where the extensions register their Smarty template directories.
In CiviCRM 5.74 which we deployed last week there was a change to the way templates register their directories that started calling a function combination that undermines Smarty's tracking of directories - causing it to constantly reload them. There are a couple of approaches to fixing this under discussion on gitlab / chat
However, it also turns out that the whole premise of calling the calling the configuration hook at this point is questionable, especially for a site that does not implement different languages on the drupal side. I got the best performance increased by by-passing that code entirely and I want to follow up whether we can do that.
Logging
We all tried different things - that was fun. The profiling didn't seem to work well because it seems to get the slow down you have to run it long enough that the profile output is too big to load
The silver lining
I think this might be a blessing in disguise - the end result *might* be that our thank yous run significantly faster - locally I run 5 times faster after fixing the 'forever bug' compared to just fixing this issue
I did some work on using different logging to output function timings. I think it is promising but I'm currently thinking it might be something for Jack & I to look into in Hamburg
Ah, just saw @Eileenmcnaughton already submitted a few upstream patches. I've closed the duplicate one.
Send delay is back to normal and holding:
https://frmon.wikimedia.org/d/R5m3iU1Wk/queue?orgId=1&from=1716788935000&to=now&viewPanel=21
Moving to the Done column for review during End of Sprint. Thank you to everyone who jumped on this!