
Delete more deleted contacts?
Closed, Resolved · Public

Description

We got rid of the pre-2017 contacts - here are the remaining ones:

Now that we have done the technical side, we should do the policy side.

I would say 18 months is the conservative end of how long to keep contacts for, & 6 months is probably the minimum we should consider.

We might go with, say, 6 months before deleting, & clean them out of the log tables after a year.
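For reference, the proposed cut-off could be counted with a query along these lines - a sketch only, reusing the `civicrm_contact` columns from the query further down this task:

```sql
-- Hypothetical policy check: soft-deleted contacts untouched for 6+ months.
-- Column names match the year-count query later in this task.
SELECT COUNT(*)
FROM civicrm_contact
WHERE is_deleted = 1
  AND modified_date < DATE_SUB(NOW(), INTERVAL 6 MONTH);
```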

+------+----------+
| year | COUNT(*) |
+------+----------+
| NULL |     2300 |
| 2017 |  2841330 |
| 2018 |  3305146 |
| 2019 |  3598669 |
| 2020 |  1100416 |
| NULL | 10847861 |
| NULL | 34512654 |
+------+----------+

Event Timeline

Removing task assignee due to inactivity as this open task has been assigned for more than two years. See the email sent to the task assignee on August 22nd, 2022.
Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome!
If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… πŸ‘’ Change Status".
Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips how to best manage your individual work in Phabricator. Thanks!

Change 949190 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Fix date handling in Query.Parse action

https://gerrit.wikimedia.org/r/949190

I got to the bottom of where the slowness is coming from (Redis, but for code reasons, not server reasons) - https://lab.civicrm.org/dev/core/-/issues/4501

Change 949190 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Fix date handling in Query.Parse action

https://gerrit.wikimedia.org/r/949190

Change 949604 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Trial shorter Redis cache

https://gerrit.wikimedia.org/r/949604

Change 949604 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Trial shorter Redis cache

https://gerrit.wikimedia.org/r/949604

I just deployed a patch that reduced the Redis cache time from 6 hours to 1 hour

Per the core issue, what is happening is that when a contact is deleted, the code enumerates the Redis keys for the prevnext cache, looking for & deleting that contact from within each of them - effectively removing the contact from search results.

I discovered we have a LOT of keys - 34k as of right now (down from 47k when the server was busier) - and looking through them, more than 60% are prevnext keys.

It will take a few hours for the Redis cache to start to expire down to a smaller set, but I'm recording the current key count & I'll note down a couple of timings too - triggering delete-deleted-contacts now

redis-cli
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> dbsize
(integer) 34756
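The scale of the problem can be roughed out from those figures. This assumes every one of the ~34756 keys is enumerated per contact delete, with the 3 delete actions per key described in the upstream issue:

```shell
# Back-of-envelope: Redis operations per contact delete and per
# 500-contact batch, assuming all keys are scanned with 3 deletes each.
keys=34756
ops_per_key=3
batch=500
echo "ops per contact: $(( keys * ops_per_key ))"
echo "ops per batch:   $(( keys * ops_per_key * batch ))"
```

Roughly 100k Redis operations per contact, tens of millions per batch - consistent with an 11-minute batch on prod vs seconds on staging, where the keyspace is small.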

Note this is the commit message from the Redis patch mentioned above (copying here for visibility)

Trial shorter Redis cache

This reduces the TTL on our Civi Redis cache from 6 hours to one hour.

At this stage the goal is to just test the impact.

The Redis cache is used for storing metadata and also search results from
UI searches. This latter is referred to as the prevnext cache and is
the more significant usage. Unlike the metadata it will not 'just
reload' as it reflects a specific user's actions through the UI.

The prevnext cache basically stores all the search results from a given
UI search, allowing paging & actions to be taken on it. It seems
fairly safe to assume that a user would NOT be working on the same
search results for longer than an hour. The only scenario Tim thought
of was when someone steps away from their browser for an hour
https://lab.civicrm.org/dev/core/-/issues/4501#note_150102

When I checked recently there were 45k keys in the prevnext cache which
seems like a lot. I can't see this being a full fix for the cache issue
in the gitlab but I would like to deploy it & see

  1. the impact it has on the number of keys in the cache
  2. the impact it has on the speed of deleting contacts
  3. whether users notice - either positively (speed) or

negatively (weird stuff)

I just ran delete_deleted_contacts with the cache change deployed but not aged into meaningfulness

delete 500
start 2023-08-17 02:03:32
end 2023-08-17 02:14:11
Took around 11 mins
DBSize 34830

Recent dedupe runs (the ones prior did not run as they detected heavy load & returned early)

2023-08-17 00:50:01 - 2023-08-17 01:07:12 (17 mins) Merged 642. Skipped 575
2023-08-17 01:25:01 - 2023-08-17 02:10:50 (50 mins) Merged 671. Skipped 891

Will see what it looks like when that size reduces a little

OK - more info on the current Redis state (no meaningful key reduction yet) - output from

redis-cli --bigkeys

-------- summary -------

Sampled 34465 keys in the keyspace!
Total key length in bytes is 3846355 (avg len 111.60)

Biggest hash found '"crm/prevnext/civicrm search CRMContactControllerSearchww21y4i2wq8ocgwcc8s8wwcgssg8wk0kscckkkwwsc0ow8ckg_9852/data"' has 500 fields
Biggest string found '"crm/metadata_5_64_beta1/CRM_Core_BAO_CustomFieldgetAllCustomFields1_en_US"' has 310329 bytes
Biggest zset found '"crm/prevnext/civicrm search CRMContactControllerSearch5nmx3y0fkv0gkg00okskcwkg84k0csc0480coos0ko8kg84oso_5905/all"' has 500 members

0 lists with 0 items (00.00% of keys, avg size 0.00)
15540 hashs with 241106 fields (45.09% of keys, avg size 15.52)
1384 strings with 8865806 bytes (04.02% of keys, avg size 6405.93)
0 streams with 0 entries (00.00% of keys, avg size 0.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
17541 zsets with 243186 members (50.90% of keys, avg size 13.86)

Some ways of getting the prevnext cache count

redis-cli keys crm/prevnext*  | wc -l
33081
redis-cli --scan --pattern 'crm/prevnext*' |wc -l
33081
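Putting that count against the --bigkeys sample above gives the prevnext share of the keyspace (integer maths, so it rounds down):

```shell
# prevnext keys as a share of all sampled keys (33081 of 34465, per above).
prevnext=33081
total=34465
echo "prevnext share: $(( prevnext * 100 / total ))%"
```

So at this point it is ~95% prevnext, well above the "more than 60%" eyeballed earlier.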

hmm - the expiry (TTL) is not being set

127.0.0.1:6379> TTL 'crm/prevnext/civicrm search CRMContactControllerSearch3qg9w7j1ikowwscow0cs4kc48wkoco8g4co4ck04ooko8sswos_2925/all'
(integer) -1
127.0.0.1:6379> expire 'crm/prevnext/civicrm search CRMContactControllerSearch3qg9w7j1ikowwscow0cs4kc48wkoco8g4co4ck04ooko8sswos_2925/all' 10
(integer) 1
127.0.0.1:6379> TTL 'crm/prevnext/civicrm search CRMContactControllerSearch3qg9w7j1ikowwscow0cs4kc48wkoco8g4co4ck04ooko8sswos_2925/all'
(integer) 7
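Given that, a one-off cleanup could set a TTL on just the keys that never expire, rather than flushing everything. This is an untested sketch, assuming redis-cli access to the same instance:

```shell
# Give every prevnext key with no expiry (TTL == -1) a 1-hour TTL.
# Sketch only - try against a non-prod instance first.
redis-cli --scan --pattern 'crm/prevnext/*' | while read -r key; do
  if [ "$(redis-cli TTL "$key")" -eq -1 ]; then
    redis-cli EXPIRE "$key" 3600 > /dev/null
  fi
done
```

SCAN is used rather than KEYS so the server is not blocked while iterating.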

Summary of situation (emailed to fr-tech & copied here)

Hi all,

Just an update on where I'm at with Redis + deleting contacts

Symptom
Deleting deleted contacts script takes a loonnng time on prod & a very short time on staging. We are talking 11 minutes vs 15 seconds to permanently delete 500 soft-deleted contacts!

Why
Because when deleting a contact the code iterates through the Redis keys and does 3 delete actions per key, AND because the TTL is not being set correctly on the Redis keys, we have 35k+ keys on prod Redis.

What else is affected
As far as I know this also affects deduping and merging deleted contacts - adding a bit over 1 second to each - but it will be interesting to see if we can spot any other impacts from cleaning it up.

Metrics
I currently have 2 metrics:

  1. how long run-job -j delete_deleted_contacts takes to run - currently around 11 minutes. That is about 10.5 minutes longer than it takes on staging
  2. how fast our dedupe jobs run - this is way less consistent so I don't know if we will be able to determine the impact, but we did a run that took 17 mins & merged 642 contacts - if we can knock 650 seconds off that, then that is 10 minutes!

But I'm also curious as to whether we can spot the impact of cleaning this up anywhere else - so I'm keen for anyone else to suggest and/or monitor the impact of these changes.
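The hoped-for saving on the dedupe metric can be sanity-checked, assuming roughly 1 second of Redis overhead per merged contact as estimated above:

```shell
# ~1 s of Redis overhead per merge, 642 merges in the 17-minute run.
merges=642
echo "estimated saving: $(( merges / 60 )) min $(( merges % 60 )) s"
```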

The plan

  1. deploy this patch - that will bring the number of deletes down from 30k to 10k so I'm expecting to see an improvement on our metrics with that
  2. once we've had a chance to check out the impact of that we can flush all the existing keys from Redis & see how things rattle around with no build up of cruft
  3. I'm still working on a patch to prevent the keys building up again by setting the TTL correctly - but I'd like to do that after we do some impact analysis on the above
  4. once the ttl is being set correctly do another Redis flush just to get to a nice baseline

Refs
Upstream gitlab https://lab.civicrm.org/dev/core/-/issues/4501#note_150102
Phab https://phabricator.wikimedia.org/T247347

Note that I have cc'd Tim on this for his interest (or disinterest) as I have been discussing it with him

Eileen

OK - with the first patch deployed we are closer to 3-4 minutes for delete_deleted_contacts

The volume of dedupes at the moment is not enough to conclude anything

Code to delete prevnext entries

cv php:eval "\Civi::service('prevnext')->deleteItem();"

Ref https://github.com/civicrm/civicrm-core/pull/27113

OK - I cleared the cache - 35k keys gone & it now takes around 12-15 seconds - ie 60 times faster

Change 951197 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/crm@master] Update redis caching to set expire

https://gerrit.wikimedia.org/r/951197

OK - it's going like the clappers now with all those cache keys cleared out.

Still to do

  1. fix it so it sets ttl in future https://gerrit.wikimedia.org/r/951197
  2. merge ^^ upstream
  3. start the job again (the patch is committed on frpm - just need to get it OKd)
  4. do one more cache-clear once 1 is done as there will be a small handful of keys that have no ttl to flush out

Dedupe updated timing: 7 mins - which is pretty much the 10 mins faster I hoped for

Merged 626. Skipped 796

Change 951197 merged by jenkins-bot:

[wikimedia/fundraising/crm@master] Update redis caching to set expire

https://gerrit.wikimedia.org/r/951197

Well, the job is running again - it will take a long time to catch up at 500 contacts every 16 minutes. We could increase the frequency / batch size of the job, or we could just figure it'll get there. Perhaps @Dwisehaupt we want to hustle it along a bit to allow you to reduce the db table size in a maintenance window? Otherwise it can just chug along.
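For scale, a back-of-envelope on the catch-up time at the current rate, assuming roughly 7.4M soft-deleted contacts (the sum of the year counts below):

```shell
# 500 contacts per 16-minute run, ~7.4M contacts to clear (integer maths).
contacts=7400000
per_run=500
mins=16
runs=$(( (contacts + per_run - 1) / per_run ))
echo "runs: $runs"
echo "days: $(( runs * mins / 60 / 24 ))"
```

That is on the order of five months of chugging along, which is the case for hustling it a bit.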

select YEAR(modified_date) as year, COUNT(*) FROM civicrm_contact WHERE is_deleted = 1 GROUP BY YEAR(modified_date);

+------+----------+
| year | COUNT(*) |
+------+----------+
| NULL |     2211 |
| 2016 |        1 |
| 2017 |       10 |
| 2018 |       10 |
| 2019 |  2094346 |
| 2020 |  3916240 |
| 2021 |   634115 |
| 2022 |   523733 |
| 2023 |   230865 |
+------+----------+
9 rows in set (19.112 sec)

XenoRyet set Final Story Points to 4.