
Search for name and org name in a 'the-agnostic way' in Civi
Closed, Resolved, Public

Description

Search is absolutely killing our CiviCRM users. Searching for "Justice League" will not find an organization named "The Justice League".

Event Timeline

awight raised the priority of this task from to Needs Triage.
awight updated the task description.
awight set Security to None.

@DStrine we discussed this in Civi-fortnightly - potentially we could have an extension that saves organisation sort_names like 'Sloan Foundation, The' to make it searchable - this might want a priority bump

Eileenmcnaughton renamed this task from Fulltext search for name and org name in Civi to Search for name and org name in a 'the-agnostic way' in Civi. Mar 20 2019, 4:33 AM
Eileenmcnaughton updated the task description.

I've picked this up but have narrowed the scope - i.e. handling search better for contacts whose names lead with 'The' (which was previously just the example, not the issue itself). Adam won't mind - he left fr-tech ;-)

At a recent fortnightly the idea came up of simply altering what is saved in the sort_name field to remove the leading 'The'. The idea is that we save both

'The Justice League' AND 'Justice League' with a sort_name of 'Justice League'. Quicksearch uses the sort_name, so that should be fairly intuitive. It will affect the default sort order of the Benefactor report, which falls back on sort_name - not sure if that is good, bad or neither.
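To make the idea concrete, here is a minimal sketch in Python (not the extension's actual code). The function names are made up for illustration, and it assumes quicksearch does a case-insensitive prefix match on sort_name, which is consistent with the behaviour described in this task but not confirmed here.

```python
def org_sort_name(organization_name: str) -> str:
    """Return a sort_name with a leading 'The ' removed (case-insensitive).

    Hypothetical helper: illustrates the proposal, not the real extension.
    """
    name = organization_name.strip()
    if name.lower().startswith("the "):
        return name[4:].lstrip()
    return name


def quicksearch_matches(term: str, sort_name: str) -> bool:
    """Assumed quicksearch behaviour: case-insensitive prefix match on sort_name."""
    return sort_name.lower().startswith(term.strip().lower())


# With sort_name equal to the full organization name the search misses ...
missed = quicksearch_matches("Justice League", "The Justice League")
# ... and with the cleaned sort_name it hits.
found = quicksearch_matches("Justice League", org_sort_name("The Justice League"))
```

Names that merely begin with the letters 'The', such as 'Theatre Group', are left alone because the check requires the trailing space.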

In digging today I found that core doesn't actually let you set sort_name - but I think that is fixable - https://github.com/civicrm/civicrm-core/pull/13863

I also realised that if we standardise sort names we can do a dedupe rule on them - https://github.com/civicrm/civicrm-core/pull/13864 - there is a separate Phab task on that, which I'll hunt out

Both upstream PRs have test failures though :-(

I also think we might wind up rolling out these changes with the next Civi update rather than pulling in patches

Also, for the dedupe one, we might want to cull known suffixes from the sort name

@LeanneS @NNichols digging into this I see that if we strip 'The ' from the sort name it gives us an opportunity to also dedupe them - which makes me wonder if there are some other common strings to strip - maybe I should make them configurable rather than hard coded... hmm

@Eileenmcnaughton Thanks! I love that idea of removing 'The' from the sort name and using that to help dedupe. I have a feeling that has caused a fair amount of unseen dupes.
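As a rough illustration of why a cleaned sort name helps deduping, the sketch below groups organization names by that cleaned value. It is plain Python rather than a CiviCRM dedupe rule, and the function name is hypothetical.

```python
from collections import defaultdict


def find_duplicate_groups(org_names):
    """Group organization names that differ only by a leading 'The '.

    Hypothetical helper for illustration; a real dedupe rule would run
    inside CiviCRM against the stored sort_name values.
    """
    groups = defaultdict(list)
    for name in org_names:
        key = name.strip()
        if key.lower().startswith("the "):
            key = key[4:].lstrip()
        # casefold() so that differences in letter case do not split a group
        groups[key.casefold()].append(name)
    # Only groups with more than one member are candidate duplicates
    return [names for names in groups.values() if len(names) > 1]


candidates = find_duplicate_groups(
    ["The Justice League", "Justice League", "Sloan Foundation"]
)
```

Here 'The Justice League' and 'Justice League' land in the same group, while 'Sloan Foundation' has no partner and is not reported.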

Extension now at https://github.com/eileenmcnaughton/org.wikimedia.thethe

I made it so we can configure other strings to strip out if we want
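A sketch of what configurable stripping could look like follows. It assumes the configured strings are treated as leading prefixes. The extension may well handle them differently, and the names and defaults here are invented for the example.

```python
from typing import Iterable

# Hypothetical default; the extension's real setting name and values may differ.
DEFAULT_PREFIXES = ("The ",)


def strip_configured_prefixes(name: str, prefixes: Iterable[str] = DEFAULT_PREFIXES) -> str:
    """Remove the first configured prefix that matches, ignoring case."""
    cleaned = name.strip()
    for prefix in prefixes:
        if cleaned.lower().startswith(prefix.lower()):
            remainder = cleaned[len(prefix):].lstrip()
            # Never reduce a name to an empty string
            return remainder or cleaned
    return cleaned
```

With the default list, 'The Sloan Foundation' becomes 'Sloan Foundation'. Passing prefixes=("The ", "A ") would also turn 'A Better World Fund' into 'Better World Fund'.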

@DStrine FYI - this is not truly in 'doing' for me - I have patches merged upstream but I was thinking to wait for them to be deployed in the next Civi update rather than put them through internal review, to save the WMF team some work

Change 500851 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Add extension to cleanup sort name for orgs

https://gerrit.wikimedia.org/r/500851

Email sent out

Hi all,

We are looking at rolling out a change next week that will change searching for organizations that start with 'The'.

CiviCRM has 2 name fields that are a bit invisible

  • display_name
  • sort_name

For individuals display_name looks like 'Ms Eileen L McNaughton the Third' whereas sort_name looks like 'McNaughton, Eileen'. For organizations they both look like 'The Wikimedia Foundation'.

When you enter a string into the quick search you are searching the 'sort name' field.

The change we are looking at pushing out means that, for organizations (only), an organization called 'The Wikimedia Foundation' will be saved as

organization name - The Wikimedia Foundation
display_name - The Wikimedia Foundation
sort_name - Wikimedia Foundation

This is intended to help with long-standing issues regarding searching with & without 'the the' as well as make it possible to dedupe organisations that differ only in 'the the'.

However, there is a potential for it to be confusing - and if that turns out to be the case we can back out & revert the sort names to match the organization names.

Also note that the majority of our 'The' contacts are not organizations and won't be affected. Only 883 contacts will be affected; another 10k are individuals - some perhaps should be organizations but the majority seem to be things like 'the czar of rhythm, George'

If this affects you please subscribe to the phab - https://phabricator.wikimedia.org/T115536
Eileen

Change 500851 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Add extension to cleanup sort name for orgs

https://gerrit.wikimedia.org/r/500851

This is now deployed - if you want to search for one of our public benefactors 'The Montgomery Family Foundation' you should now just search for 'Montgomery Family Foundation'

The sort order in the benefactors report is also affected - this appears to better align it with the desired sorting

@LeanneS @NNichols - if you'd like to clean up dupes involving 'The' - they are now caught using this rule.

civicrm/contact/dedupefind?reset=1&action=update&rgid=16&limit=5000000

Note that you can safely use the buttons - 'Batch merge selected duplicates' and 'Batch merge all duplicates' before doing a manual run through

Screen Shot 2019-04-26 at 12.41.25 PM.png (65×432 px, 11 KB)

These 2 buttons do the same thing the automated script does. Sometimes there will be contacts that are now mergeable but which have not been tried by the script, or which had conflicts resolved after the script ran. There is another button, labelled 'force merge', that will override conflicts, so when you see that one you need to be careful

Eileenmcnaughton closed subtask Restricted Task as Resolved. Jan 20 2022, 1:56 AM