ERROR Bus error
Closed, Resolved | Public

Description

The error started happening when we added an 'if' clause to the greeting template and disappeared when we removed it again.

It looked like it might be a memory leak.

It's worth investigating because it might point to performance issues that affect us more broadly with Smarty parsing (e.g. in Thank you mails).

Here is the reason why the 'if' punches above its weight:

Note that the single-use string turns off Smarty caching because the strings aren't cacheable - i.e. you wind up with {if 1}Dear Greg{else}Dear donor{/if} as the string

public static function parseOneOffStringThroughSmarty($templateString) {
  if (!CRM_Utils_String::stringContainsTokens($templateString)) {
    // Skip expensive smarty processing.
    return $templateString;
  }

i.e. with no 'if', Smarty doesn't load. We initially tested adding the 'if' and didn't see a big performance hit, but today's issue suggests more of a memory leak, which builds up over time rather than causing an instant hit.

Note that caching is turned off in that function because single-use strings aren't cacheable - i.e. at that point the string looks like

`{if 1}Dear Elliot{else}Dear donor{/if}`

Event Timeline

When we logged this, I think we felt that memory or disk might somehow be getting exhausted. We saw this as an intermittent error, but I haven't been able to find it in the logs on our logging server. From memory:

  • the server was under load when it occurred
  • it occurred during donations & recurring jobs
  • it started when we altered the greeting template & ended when we reverted the change

We had a theory that something about the change might have exhausted memory or disk, so I checked whether it leaked memory.

I tried running concurrent qperf-d on staging but didn't trigger it - although perhaps more processes for longer would have?

Things to determine

  • does it happen with the same or different contacts
  • how does disk & memory look at the time
  • what processes are running when it happens

Digging around, I have found that when creating 500 contacts memory usage increases by ~400 KB without the IF and ~600 KB with it - this doesn't feel significant.

It seems the Bus Error could in some cases be a segmentation fault. A fairly good internet explanation:
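A comparison like the one above can be sketched roughly as follows. This is a minimal illustration only: `renderGreeting()` is a hypothetical stand-in for whatever work is being measured (such as creating a contact), and `measureMemoryDelta()` is not an existing CiviCRM helper.

```php
<?php
// Hypothetical stand-in for the real work, e.g. creating one contact
// and rendering its greeting.
function renderGreeting(int $i): string {
  return 'Dear donor ' . $i;
}

// Run $work $iterations times and return the change in memory usage
// in bytes. Collecting cycles before each reading reduces noise from
// garbage that has not been freed yet.
function measureMemoryDelta(callable $work, int $iterations): int {
  gc_collect_cycles();
  $before = memory_get_usage();
  for ($i = 0; $i < $iterations; $i++) {
    $work($i);
  }
  gc_collect_cycles();
  return memory_get_usage() - $before;
}

$delta = measureMemoryDelta('renderGreeting', 500);
printf("Memory delta after 500 iterations: %d bytes\n", $delta);
```

Running the same loop once with the 'if' in the greeting and once without would give the two figures to compare. A leak that only shows under sustained load may need far more iterations than 500.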

'bus error' probably means that the program that is being invoked is trying to dereference a null pointer or some similarly invalid memory address. It usually comes from using an uninitialised value (dereferencing a null pointer), or from using a value that has been accidentally overwritten (e.g. when the stack is pushed with saved values, but lengths are miscalculated or the wrong data type used to extract the data).

I found some internet references to Bus Errors in PHP-ish contexts:
https://stackoverflow.com/questions/3789089/bus-error-in-cron-job
https://moodle.org/mod/forum/discuss.php?d=378981
https://bugs.php.net/bug.php?id=47596
https://bugs.php.net/bug.php?id=80435
https://github.com/stwa/google_addressbook/issues/40
https://github.com/kubernetes/kubernetes/issues/71233
https://trac.macports.org/ticket/58598?cversion=0&cnum_hist=9
https://segmentfault.com/a/1190000040175584/en

Takeaways

A few themes

The closest theory seems to come from this link
https://segmentfault.com/a/1190000040175584/en

  • i.e. one process is re-creating a file while another process is accessing it. Smarty disk-caches its compiled files, so my current working theory is that one process is recreating that file while another is accessing it.
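For illustration, the usual way to avoid readers seeing a half-written file is to write to a temporary file and then rename it over the target. The sketch below shows that general pattern only; `writeFileAtomically()` and the example path are hypothetical, and this makes no claim about how Smarty or CiviCRM actually write compiled templates.

```php
<?php
/**
 * Write $contents to $path so that a concurrent reader sees either the
 * old file or the new one, never a partially written file.
 * Hypothetical helper for illustration only.
 */
function writeFileAtomically(string $path, string $contents): bool {
  // Create the temp file in the target directory so that rename()
  // stays on a single filesystem.
  $tmp = tempnam(dirname($path), 'tmp_');
  if ($tmp === FALSE) {
    return FALSE;
  }
  if (file_put_contents($tmp, $contents) === FALSE) {
    unlink($tmp);
    return FALSE;
  }
  // On POSIX filesystems, rename() replaces the target in one step.
  if (!rename($tmp, $path)) {
    unlink($tmp);
    return FALSE;
  }
  return TRUE;
}

// Example path is an assumption, not the real compiled template location.
$target = sys_get_temp_dir() . '/greeting_example.tpl.txt';
$ok = writeFileAtomically($target, "Dear donor\n");
```

Truncating and rewriting a file in place, by contrast, leaves a window in which another process can read or map a file whose contents are changing underneath it.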

Just noting that I found the smarty file for the greeting - oddly it was last modified in the future if I read that right...

Current server time
Thu 15 Dec 2022 08:59:19 PM UTC

Size: 1056      	Blocks: 8          IO Block: 4096   regular file

Device: 903h/2307d Inode: 17826154 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 33/www-data) Gid: ( 33/www-data)
Access: 2022-12-15 18:58:00.510227574 +0000
Modify: 2022-12-15 18:58:00.510227574 +0000
Change: 2022-12-15 18:58:00.510227574 +0000
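As a cross-check on the timestamps, the two values can be compared directly. The sketch below hard-codes the server time and the Modify time shown above (with fractional seconds dropped) and assumes both are UTC, as the output indicates; the variable names are illustrative.

```php
<?php
$utc = new DateTimeZone('UTC');

// Values copied from the output above; 08:59:19 PM is 20:59:19.
$serverNow = new DateTimeImmutable('2022-12-15 20:59:19', $utc);
$modified = new DateTimeImmutable('2022-12-15 18:58:00', $utc);

// TRUE only if the file's modify time is later than the server time.
$isInFuture = $modified > $serverNow;

// Positive when the modify time is earlier than the server time.
$diffSeconds = $serverNow->getTimestamp() - $modified->getTimestamp();

printf("In future: %s, difference: %d seconds\n", $isInFuture ? 'yes' : 'no', $diffSeconds);
```

Taken at face value, these two values put the modify time roughly two hours before the server time rather than after it, so the "in the future" reading may be worth re-checking.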

I feel like we should determine whether this easy fix helps before digging deeper, as it seems to be 'around the same area of operations' and was on our long list anyway

https://phabricator.wikimedia.org/T227039

It would look like

  1. Adding the following to our civicrm.settings.php for prod (maybe not staging, since we test there, and definitely not dev environments):
/**
 * SMARTY Compile Check:
 *
 * This tells Smarty whether to check for recompiling or not. Recompiling
 * does not need to happen unless a template or config file is changed.
 * Typically you enable this during development, and disable for production.
 *
 * Related issue:
 * https://lab.civicrm.org/dev/core/issues/1073
 *
 */
//if (!defined('CIVICRM_TEMPLATE_COMPILE_CHECK')) {
//  define('CIVICRM_TEMPLATE_COMPILE_CHECK', FALSE);
//}
  2. Re-enabling the Smarty conditional at a time we can monitor (with a plan to disable it if the issue recurs; we might also be able to gather more data at that point)
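The settings block in step 1 is shown commented out. To take effect, it would presumably need to be uncommented, along these lines:

```php
<?php
// In civicrm.settings.php on prod only: skip Smarty's check for whether
// compiled templates need recompiling. Leave this unset on dev, where
// templates change often.
if (!defined('CIVICRM_TEMPLATE_COMPILE_CHECK')) {
  define('CIVICRM_TEMPLATE_COMPILE_CHECK', FALSE);
}
```

With the check disabled, a changed template will likely not be picked up until the compiled templates are cleared, so that step would need to be part of any deploy that touches templates.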

We have deployed the above and changed back to the conditional email greeting. If we need to revert, the greeting needs to be set to

Dear {contact.first_name}

at https://civicrm.wikimedia.org/civicrm/admin/options/email_greeting?reset=1

XenoRyet set Final Story Points to 4.