Page MenuHomePhabricator

Slow query on deduping 2 contacts
Closed, ResolvedPublic4 Estimated Story Points

Description

Digging into the merge hook I found that loading the merge screen with just 2 pre-selected contacts was slow.

It turns out the slow query (15-17 seconds) was new in 4.5 & it is

SELECT

table_name,
column_name

FROM information_schema.key_column_usage
WHERE

referenced_table_schema = database() AND
referenced_table_name = 'civicrm_contact' AND
referenced_column_name = 'id';

It was introduced in https://issues.civicrm.org/jira/browse/CRM-14500 in order to make the selection of which tables to merge dynamic.

I did some digging into it and found that

  1. the query will theoretically run faster with this mysql setting innodb_stats_on_metadata = 0; per https://www.percona.com/blog/2011/12/23/solving-information_schema-slowness/

I don't have permission to try that on staging but tested on other servers & locally - I got a semi-consistent 10% speed saving on this query so it would be worth trying on staging but it's not the main bottleneck.

  1. I did further testing on my local and on other servers I had access to & found the most significant factor in the speed is the total number of tables the mysql user has permission to see. Currently on staging the mysql user has permission to see 2 flavours of dev db - ie. probably double the number of tables the production user would have permission to - so depending how exponential the effect is this may not turn out to be an issue on production.

The query does not only access tables in the current database even if that is provided as a parameter to the query unless both schema & table are defined in the query (which rather defeats the purpose)

I will log this on civicrm.org too - for visibility - but we need to confirm how the query performs on prod before prioritising it.

In most situations the values derived from the query would only change on upgrade, on adding a custom table and on on module installation so it would make sense to cache this information in some way & only update on cache-flush

Event Timeline

Eileenmcnaughton raised the priority of this task from to Needs Triage.
Eileenmcnaughton updated the task description. (Show Details)
Eileenmcnaughton subscribed.
Eileenmcnaughton triaged this task as High priority.
Eileenmcnaughton set Security to None.

Change 255065 had a related patch set uploaded (by Eileen):
CRM-17454 fix slow dedupe query

https://gerrit.wikimedia.org/r/255065

Change 255065 merged by jenkins-bot:
CRM-17454 fix slow dedupe query

https://gerrit.wikimedia.org/r/255065

Change 285792 had a related patch set uploaded (by Eileen):
CRM-17454 fix slow dedupe query

https://gerrit.wikimedia.org/r/285792

Change 285792 merged by Eileen:
CRM-17454 fix slow dedupe query

https://gerrit.wikimedia.org/r/285792