Page MenuHomePhabricator

Make dedupe reversible step 1 of 5 test turning on logging on Staging to see how it performs.
Closed, ResolvedPublic4 Estimated Story Points

Description

  1. We need to test turning on logging on Staging to see how it performs.
    1. I need to provide a mysql file to Jeff to create the triggers & log tables
    2. Jeff to run sql, determining how much extra disc is used
    3. general click around by me

Event Timeline

DStrine raised the priority of this task from to Needs Triage.
DStrine updated the task description. (Show Details)
DStrine subscribed.
DStrine renamed this task from Make dedupe reversible step 1 of 3 test turning on logging on Staging to see how it performs. to Make dedupe reversible step 1 of 5 test turning on logging on Staging to see how it performs..Feb 16 2016, 11:48 PM
DStrine set Security to None.

@Jgreen I have uploaded the sql that would turn logging on on staging to my sandbox branch. This is not where I would expect that sql would stay but I expect to tweak it over time so I figured getting it into version control was as good a way as any to share the file.

http://git.wikimedia.org/tree/wikimedia%2Ffundraising%2Fcrm%2Fcivicrm.git/refs%2Fheads%2Fsandbox%2Feileen%2Fstaging

When you are available I'd like to try running that on staging. I am aware it will create a lot of extra data and that we need to understand how manageable that is. I don't know whether it will complete without problems or not but the section I ran locally was OK.

Could you check disk before & after running it?

Change 273179 had a related patch set uploaded (by Eileen):
Add rudimentary performance timing drush command

https://gerrit.wikimedia.org/r/273179

My expectation is that the default table type (archive) won't be acceptable. However, I'd like to turn it on with archive (using this sql http://git.wikimedia.org/tree/wikimedia%2Ffundraising%2Fcrm%2Fcivicrm.git/refs%2Fheads%2Fsandbox%2Feileen%2Fstaging )

& see

  1. how much disk space is taken
  2. how long the performance script (https://gerrit.wikimedia.org/r/273179) takes to run

(note - I could record disk by just doing df before & after and getting the difference or there might be a better way?)

Having done that I would try converting the tables (selectively) to INNODB & adding indexes on id, contact_id/entity_id, connection_id, user_id (possibly) and action_date (possibly also action) and do the same checks.

Other people have looked at TokuDB & INNODB compressed row format : https://issues.civicrm.org/jira/browse/CRM-18104

Ok I loaded up the trigger.mysql query batch, seems to have run smoothly. After that the new log_civicrm database consumes 4.3GB on disk.

Hi @Jgreen I can't see the triggers! I checked & the trigger data didn't seem to be there so I did SHOW TRIGGERS & didn't see anything in dev_civicrm

Yeah, you're right. Looking at trigger.mysql the create trigger syntax is
bad, which I guess caused it to silently fail on load. See:

https://dev.mysql.com/doc/refman/5.5/en/trigger-syntax.html

jg

OK the delimiters aren't in the sql - it should be more like

delimiter //

CREATE TRIGGER civicrm_contribution_after_insert after insert ON civicrm_contribution
FOR EACH ROW
BEGIN
IF ( @civicrm_disable_logging IS NULL OR @civicrm_disable_logging = 0 )
THEN
INSERT INTO log_civicrm_contribution (id, contact_id, financial_type_id, contribution_page_id, payment_instrument_id, receive_date, non_deductible_amount, total_amount, fee_amount, net_amount, trxn_id, invoice_id, currency, cancel_date, cancel_reason, receipt_date, thankyou_date, source, amount_level, contribution_recur_id, is_test, is_pay_later, contribution_status_id, note, address_id, check_number, campaign_id, tax_amount, creditnote_id, log_conn_id, log_user_id, log_action)
VALUES ( NEW.id, NEW.contact_id, NEW.financial_type_id, NEW.contribution_page_id, NEW.payment_instrument_id, NEW.receive_date, NEW.non_deductible_amount, NEW.total_amount, NEW.fee_amount, NEW.net_amount, NEW.trxn_id, NEW.invoice_id, NEW.currency, NEW.cancel_date, NEW.cancel_reason, NEW.receipt_date, NEW.thankyou_date, NEW.source, NEW.amount_level, NEW.contribution_recur_id, NEW.is_test, NEW.is_pay_later, NEW.contribution_status_id, NEW.note, NEW.address_id, NEW.check_number, NEW.campaign_id, NEW.tax_amount, NEW.creditnote_id,
CONNECTION_ID(), @civicrm_user_id, 'insert'
);
END IF;
END;//

delimiter ;

DStrine set the point value for this task to 4.Mar 2 2016, 11:52 PM

Progress status- this is on on staging & seems OK but my reason for not putting to done is that I believe that the wmf_ tables were skipped (didn't begin with civicrm_ )

Change 280157 had a related patch set uploaded (by Eileen):
CRM-18212 allow triggers to be managed offline.

https://gerrit.wikimedia.org/r/280157

I've brought this into review as there are some patches that need review - ie https://gerrit.wikimedia.org/r/#/c/280156/2

I'm trying to juggle the order a bit as there are some in the drupal repo & civi repos

However, I still have some things to do on it too

Change 280336 had a related patch set uploaded (by Eileen):
Enable setting to specify logging is handled offline

https://gerrit.wikimedia.org/r/280336

I'm moving this to done and setting it to resolved

Queue performance are turning on logging was not substantially worse -

Before tests:
2016-02-24 22:02:10 Starting donation queue performance profile for 20 donations.
Context is : Staging:
2016-02-24 22:02:14 Donation queue performance profile took 4 second(s) for 20 donations.
Context is : Staging:
Average performance is 300 per minute
2016-02-24 22:02:29 Starting donation queue performance profile for 200 donations.
Context is : Staging:
2016-02-24 22:03:05 Donation queue performance profile took 36 second(s) for 200 donations.
Context is : Staging:
Average performance is 333 per minute
2016-02-24 22:03:28 Starting donation queue performance profile for 2000 donations.
Context is : Staging:
2016-02-24 22:09:46 Donation queue performance profile took 378 second(s) for 2000 donations.
Context is : Staging:
Average performance is 317 per minute

After tests:

Context is : Staging:INNODB
2016-03-16 20:24:29 Donation queue performance profile took 397 second(s) for 2000 donations.
Context is : Staging:INNODB
Average performance is 302 per minute.
2016-03-16 20:26:45 Starting donation queue performance profile for 2000 donations.
Context is : Staging:INNODB
2016-03-16 20:33:15 Donation queue performance profile took 390 second(s) for 2000 donations.
Context is : Staging:INNODB
Average performance is 308 per minute.

Change 280336 merged by jenkins-bot:
Enable setting to specify logging is handled offline

https://gerrit.wikimedia.org/r/280336

I did some more tests using this on staging on 4.7.31 - I got times between 345 per second & 411 per second over 1000 records- with the main difference being that the first run is a bit slower - presumably as mysql caching helps on later runs

Recording results after mysql is 'warmed up'

With this patch https://github.com/civicrm/civicrm-core/pull/11615

WD wmf_civicrm: Successfully updated contribution_tracking for 52086106 [300.56 sec, 102.68 MB] [info]
Processed 2000 messages in 298 second(s)
Average performance is 403 per minute.

Command dispatch complete [300.57 sec, 95.11 MB] [notice]
Timer Cum (sec) Count Avg (msec)
page 300.523 1 300522.9

Without the patch
WD wmf_civicrm: Successfully updated contribution_tracking for 52088106 [297.6 sec, 118 MB] [info]
Processed 2000 messages in 295 second(s)
Average performance is 407 per minute.

Command dispatch complete [297.61 sec, 110.42 MB] [notice]
Timer Cum (sec) Count Avg (msec)
page 297.56 1 297560.32

The speed difference is well within the normal variation between runs but the memory usage difference was consistent & replicable.

Summary - no performance degradation with this & less memory use