
Delete more deleted contacts?
Closed, Resolved · Public


We got rid of the pre-2017 deleted contacts - here are the remaining ones.

Now that we have done the technical side, we should do the policy side.

I would say 18 months would be the conservative end of how long to keep contacts for, & 6 months is probably the minimum we should consider.

We might go with, say, 6 months before deletion, & after a year clean them out of the log tables.


Event Timeline

Aklapper removed subscribers: DStrine, CCogdill_WMF.

Removing task assignee due to inactivity as this open task has been assigned for more than two years. See the email sent to the task assignee on August 22nd, 2022.
Please assign this task to yourself again if you still realistically plan to work on this task - it would be welcome!
If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".
Also see the tips on how to best manage your individual work in Phabricator. Thanks!

Change 949190 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Fix date handling in Query.Parse action

I got to the bottom of where the slowness is coming from (Redis, but for code reasons, not server reasons).

Change 949190 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Fix date handling in Query.Parse action

Change 949604 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Trial shorter Redis cache

Change 949604 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Trial shorter Redis cache

I just deployed a patch that reduced the Redis cache time from 6 hours to 1 hour.

Per the core issue: what is happening is that when a contact is deleted, the code enumerates the Redis keys for the prevnext cache, looking for & deleting that contact from within each of them - effectively removing the contact from cached search results.
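The shape of that cleanup can be sketched in Python. This is an illustrative simulation, not the actual CiviCRM code - the key layout and entry names here are assumptions based on the description above (each prevnext key carrying an /all set, a /sel set, and a /data hash):

```python
# Toy model of the prevnext cleanup: deleting one contact walks every
# prevnext cache entry and issues three removals per entry.
# Names are hypothetical; the real cache lives in Redis, not a dict.

def delete_contact_from_prevnext(cache: dict, contact_id: int) -> int:
    """Remove contact_id from every cached search result; return op count."""
    ops = 0
    for key, entry in cache.items():
        if not key.startswith("crm/prevnext/"):
            continue
        entry["all"].discard(contact_id)     # set of all matched ids
        entry["sel"].discard(contact_id)     # set of selected ids
        entry["data"].pop(contact_id, None)  # per-row display data
        ops += 3
    return ops

# With ~17k prevnext keys, one contact delete costs ~51k cache operations.
cache = {
    f"crm/prevnext/civicrm search _{i}": {
        "all": {1, 2, 3}, "sel": set(), "data": {1: "row"}
    }
    for i in range(17000)
}
print(delete_contact_from_prevnext(cache, 1))  # 51000
```

This is why the cost of a single delete scales with the total number of cached searches, not with the size of the contact being deleted.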

I discovered we have a LOT of keys - 34k as of right now (down from 47k when the server was busier) - and looking through them, more than 60% are prevnext keys.

It will take a few hours for the Redis cache to expire down to a smaller set, but I'm recording the current key count & I'll note down a couple of timings too - triggering delete-deleted-contacts now.

Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
> dbsize
(integer) 34756

Note this is the commit message from the Redis patch mentioned above (copying here for visibility)

Trial shorter Redis cache

This reduces the TTL on our Civi Redis cache from 6 hours to one hour.

At this stage the goal is to just test the impact.

The Redis cache is used for storing metadata and also search results from
UI searches. This latter is referred to as the prevnext cache and is
the more significant usage. Unlike the metadata it will not 'just
reload' as it reflects a specific user's actions through the UI.

The prevnext cache basically stores all the search results from a given
UI search, allowing paging & actions to be taken on it. It seems
fairly safe to assume that a user would NOT be working on the same
search results for longer than an hour. The only scenario Tim thought
of was when someone steps away from their browser for an hour.

When I checked recently there were 45k keys in the prevnext cache which
seems like a lot. I can't see this being a full fix for the cache issue
in the gitlab but I would like to deploy it & see

  1. the impact it has on the number of keys in the cache
  2. the impact it has on the speed of deleting contacts
  3. whether users notice - either positively (speed) or

negatively (weird stuff)

I just ran delete_deleted_contacts with the cache change deployed but not aged into meaningfulness

delete 500
start 2023-08-17 02:03:32
end 2023-08-17 02:14:11
Took around 11 mins
DBSize 34830

Recent dedupe runs (the prior ones did not run as they detected heavy load & returned early):

2023-08-17 00:50:01 - 2023-08-17 01:07:12 (17 mins) Merged 642. Skipped 575
2023-08-17 01:25:01 - 2023-08-17 02:10:50 (50 mins) Merged 671. Skipped 891

Will see what it looks like when that size reduces a little

OK, more info on the current Redis state (no meaningful key reduction yet) - output from:

redis-cli --bigkeys

-------- summary -------

Sampled 34465 keys in the keyspace!
Total key length in bytes is 3846355 (avg len 111.60)

Biggest hash found '"crm/prevnext/civicrm search CRMContactControllerSearchww21y4i2wq8ocgwcc8s8wwcgssg8wk0kscckkkwwsc0ow8ckg_9852/data"' has 500 fields
Biggest string found '"crm/metadata_5_64_beta1/CRM_Core_BAO_CustomFieldgetAllCustomFields1_en_US"' has 310329 bytes
Biggest zset found '"crm/prevnext/civicrm search CRMContactControllerSearch5nmx3y0fkv0gkg00okskcwkg84k0csc0480coos0ko8kg84oso_5905/all"' has 500 members

0 lists with 0 items (00.00% of keys, avg size 0.00)
15540 hashs with 241106 fields (45.09% of keys, avg size 15.52)
1384 strings with 8865806 bytes (04.02% of keys, avg size 6405.93)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
17541 zsets with 243186 members (50.90% of keys, avg size 13.86)

Some ways of getting the prevnext cache count (note that KEYS blocks the server while it walks the whole keyspace, so the SCAN form is safer on a busy instance):

redis-cli keys 'crm/prevnext*' | wc -l
redis-cli --scan --pattern 'crm/prevnext*' | wc -l
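The same count can be mirrored off-line in Python. The matching logic is kept pure so it runs without a live Redis - against a real server you would feed it the keys from a client's SCAN iteration instead of a list. A hypothetical sketch:

```python
# Count cache keys matching a glob pattern, Redis-style.
# fnmatch uses the same *-glob semantics as the redis-cli patterns above.
from fnmatch import fnmatch

def count_matching(keys, pattern="crm/prevnext*"):
    """Return how many keys match the glob pattern."""
    return sum(1 for k in keys if fnmatch(k, pattern))

keys = [
    "crm/prevnext/civicrm search _9852/data",
    "crm/prevnext/civicrm search _5905/all",
    "crm/metadata_5_64_beta1/CRM_Core_BAO_CustomField",
]
print(count_matching(keys))  # 2
```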

hmm - the expiry is not being set

> TTL 'crm/prevnext/civicrm search CRMContactControllerSearch3qg9w7j1ikowwscow0cs4kc48wkoco8g4co4ck04ooko8sswos_2925/all'
(integer) -1
> expire 'crm/prevnext/civicrm search CRMContactControllerSearch3qg9w7j1ikowwscow0cs4kc48wkoco8g4co4ck04ooko8sswos_2925/all' 10
(integer) 1
> TTL 'crm/prevnext/civicrm search CRMContactControllerSearch3qg9w7j1ikowwscow0cs4kc48wkoco8g4co4ck04ooko8sswos_2925/all'
(integer) 7
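A minimal sketch of what the eventual fix needs to do: attach an expiry when the entry is written, so keys stop sitting at TTL -1 forever. This is a toy in-memory model mimicking Redis TTL semantics (-2 for a missing key, -1 for no expiry), not the actual patch:

```python
import time

# In-memory cache with Redis-like TTL semantics, for illustration only.
class TtlCache:
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        """Write an entry; with ttl=None it never expires (the old bug)."""
        expires = time.monotonic() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def ttl(self, key):
        """Mimic Redis TTL: -2 = missing key, -1 = no expiry, else seconds left."""
        if key not in self._data:
            return -2
        _, expires = self._data[key]
        if expires is None:
            return -1
        return max(0, int(expires - time.monotonic()))

cache = TtlCache()
cache.set("crm/prevnext/search_1/all", {1, 2})           # old behaviour: TTL -1
cache.set("crm/prevnext/search_2/all", {3}, ttl=3600)    # fixed: one-hour TTL
print(cache.ttl("crm/prevnext/search_1/all"))  # -1
print(cache.ttl("crm/prevnext/search_2/all"))  # just under 3600
```

With the TTL set at write time, abandoned search results age out on their own instead of accumulating until a manual flush.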

Summary of situation (emailed to fr-tech & copied here)

Hi all,

Just an update on where I'm at with Redis + deleting contacts

Deleting deleted contacts script takes a loonnng time on prod & a very short time on staging. We are talking 11 minutes vs 15 seconds to permanently delete 500 soft-deleted contacts!

Why? When deleting a contact the code iterates through the Redis keys and does 3 delete actions per key, AND because the TTL is not being set correctly on those keys we have 35k+ of them on prod Redis.

What else is affected
As far as I know this also affects deduping and merging deleted contacts - adding a bit over 1 second to each one - but it will be interesting to see if we can spot any other impacts from cleaning it up.

I currently have 2 metrics:

  1. how long run-job -j delete_deleted_contacts takes - currently around 11 minutes, which is about 10.5 minutes longer than it takes on staging
  2. how fast our dedupe jobs run - this is way less consistent so I don't know if we will be able to determine the impact, but we did a run that took 17 mins & merged 642 contacts - if we can knock 650 seconds off that then that is 10 minutes!
But - I'm also curious as to whether we can spot the impact of cleaning this up anywhere else - so keen for anyone else to suggest and or monitor the impact of these changes
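The back-of-envelope behind that estimate, assuming roughly 1 second of prevnext cleanup per merged contact (the per-contact figure mentioned above):

```python
# Rough estimate of dedupe time saved by removing the Redis overhead.
# The 1s/merge overhead is an assumption taken from this thread.
merged = 642                  # contacts merged in the 17-minute run
overhead_per_merge_s = 1.0    # assumed Redis cleanup cost per merge
saved_s = merged * overhead_per_merge_s
print(f"expected saving: {saved_s:.0f}s (~{saved_s / 60:.1f} min)")  # 642s (~10.7 min)
```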

The plan

  1. deploy this patch - that will bring the number of deletes down from 30k to 10k so I'm expecting to see an improvement on our metrics with that
  2. once we've had a chance to check out the impact of that we can flush all the existing keys from Redis & see how things rattle around with no build up of cruft
  3. I'm still working on a patch to prevent the keys building up again by setting the TTL correctly - but I'd like to do that after we do some impact analysis on the above
  4. once the ttl is being set correctly do another Redis flush just to get to a nice baseline

Upstream gitlab

Note that I have cc'd Tim on this for his interest (or disinterest), as I have been discussing it with him.


OK - with the first patch deployed we are closer to 3-4 minutes for delete_deleted_contacts

The volume of dedupes at the moment is not enough to conclude anything

Code to delete all prevnext entries (calling deleteItem() with no arguments clears the whole prevnext cache):

cv php:eval "\Civi::service('prevnext')->deleteItem();"


OK - I cleared the cache - 35k keys gone - & it now takes around 12-15 seconds, i.e. roughly 50 times faster.

Change 951197 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Update redis caching to set expire

OK - it's going like the clappers now with all those cache keys cleared out.

Still to do

  1. fix it so it sets ttl in future
  2. merge ^^ upstream
  3. start the job again (the patch is committed on frpm - just need to get it OKd)
  4. do one more cache-clear once 1 is done as there will be a small handful of keys that have no ttl to flush out

Dedupe updated timing: 7 mins - which is pretty much the 10 mins faster I hoped for.

Merged 626. Skipped 796

Change 951197 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Update redis caching to set expire

Well, the job is running again - it will take a long time to catch up at 500 contacts every 16 minutes. We could increase the frequency / batch size of the job, or we could just figure it'll get there. Perhaps @Dwisehaupt we want to hustle it along a bit to allow you to reduce the db table size in the maintenance window? Otherwise it can just chug along.

SELECT YEAR(modified_date) AS year, COUNT(*) FROM civicrm_contact WHERE is_deleted = 1 GROUP BY YEAR(modified_date);
"crm/prevnext/civicrm search CRMContactControllerSearch3zqxtv0qvmyo0gg0cw4kk40084k4sgkwk0okog4cskcwk0ocgk_1981/sel"





9 rows in set (19.112 sec)

XenoRyet set Final Story Points to 4.