Page MenuHomePhabricator

Q4 FY2019/20 investigate export and upload issues with the silverpop export
Closed, ResolvedPublic

Description

Lets think through what we want to change to address the following:

  • We have some new calculations that need to be added to the export: T252245
  • We are also running into upload file sizes and we can't compress the file. **

Approach**

We should switch from a full table refresh to incremental refreshes, using the civicrm_contact.modified_date and the log_date fields on the log tables to find changed contacts.

The goal would be the table would be always in a state where an update could be taken from it. This table is likely to be used for other purposes -searching, Erin's queries so may hold fields beyond just the export

Details

SubjectRepoBranchLines +/-
wikimedia/fundraising/toolsmaster+25 -14
wikimedia/fundraising/toolsmaster+10 -12
wikimedia/fundraising/toolsmaster+4 -0
wikimedia/fundraising/toolsmaster+17 -16
wikimedia/fundraising/toolsmaster+7 -0
wikimedia/fundraising/toolsmaster+70 -26
wikimedia/fundraising/toolsmaster+0 -10
wikimedia/fundraising/toolsmaster+73 -43
wikimedia/fundraising/toolsdeploy+1 -11
wikimedia/fundraising/toolsmaster+1 -11
wikimedia/fundraising/toolsmaster+11 -7
wikimedia/fundraising/toolsmaster+172 -39
wikimedia/fundraising/toolsmaster+57 -0
wikimedia/fundraising/toolsmaster+21 -21
wikimedia/fundraising/toolsmaster+1 -0
wikimedia/fundraising/crmmaster+4 -0
wikimedia/fundraising/toolsmaster+7 -3
wikimedia/fundraising/toolsmaster+1 -5
wikimedia/fundraising/toolsmaster+1 -1
wikimedia/fundraising/toolsmaster+7 -7
wikimedia/fundraising/toolsmaster+7 -2
wikimedia/fundraising/toolsmaster+11 -5
Show related patches Customize query in gerrit

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

MG email has been sent ~15 mins ago. Will start remapping at the top of the hour, 4pm UTC.

Hi @Eileenmcnaughton I'm mapping now and I'm not seeing the new fields in the file - nothing with an "all_funds". I also notice the two fields we want deleted are in the file as well (latest_usd_amount and timezone).

@CCogdill_WMF mentioned there could be more than one file in the queue - so I'll run this one and see if there's another file to map to afterward.

Hi all - the manual run for DatabaseUpdate-20200630055542.csv ran successfully, but it looks like Acoustic automatically began running the July 1st file without me initiating it. I just cancelled that job.

@Eileenmcnaughton (or maybe @Ejegg ?) Are the new fields present in the July 1 file? Would it be possible to push that file over again so I can remap with the new field additions since they weren't in the June 30th file?

Let me know if this can be done today. Otherwise I'll deactivate the recurring import again and try again tomorrow morning. Thanks!

Updating here and will also update in the relevant task T252245 that the field changes in the civi file did not 'take' and @Eileenmcnaughton is working on a solution now. I'll try mapping again my Thursday morning, around 1-2pm UTC.

Change 608515 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Silverpop refactor - further reduce updating of silverpop table

https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/ /608515

@KHaggard I have actually run the full upload 3 times today - which I think will see 3 files. All 3 should map to the same mapping but they have different ratios of placeholders to data. If the most recent one successfully uploads you should have data in all the new fields. I ran through all 3 fully as I am scared one or more will fail (the code differs between them) & figured it would be good to try all 3. That might take a bit of wrangling at your end. The last one has only just started

Ok, thanks @Eileenmcnaughton . I will do what I can to manually map all 3 but if the mapping is the same across all files I think Acoustic will do the same thing as yesterday and run all 3 at the same time after I map the first file in the queue. Will attempt now and let you know how it goes!

Alright, I mapped the first file in the queue and it ran successfully.

While going over the list I realized there was no TargetSmart gender field (I was looking for a "prospect_gender" based on how you labeled the rest of the TargetSmart fields) in the civi file. I see that you did send over a "gender" field, so I want to confirm: did you rename what used to be "z_gender" to "gender"? If so, I remapped that one wrong and will go in to fix that mapping in the next queued file.

For context, Acoustic has 2 gender fields, "gender" and "TS_gender". The first one isn't used very much and the second one is TargetSmart specific. Knowing that past civi files never sent over gender, I'm pretty sure that what I'm seeing is the TargetSmart gender field, but just checking here to be sure. If anyone in fr-tech can confirm this, that would be great!

Other than that, everything else looks good so far. Thanks!

Acoustic is now telling me the FTP is empty again but I only mapped and ran the file once. Sorry to ask, but could the file be re-uploaded again today? @Eileenmcnaughton

@KHaggard I've kicked off the most recent upload again - I'm not regenerating the files so it should not take more than 30 mins..
CiviCRM has only one gender field (which does seem to be gender rather than prospect_gender in what I deployed). The information inn this field mostly comes from targetsmart but at least in theory the odd one might be manually updated.

Change 609238 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Silverpop - build main table incrementally

https://gerrit.wikimedia.org/r/609238

Remapping went successfully and when I ran the job it completed successfully. I'll run some queries with new fields tomorrow and check the health of the recurring import in the morning. Will flag if anything unusual comes up. Thanks for your help on this project @Eileenmcnaughton !

Change 609557 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Fix column names & table name on prospect fields for silverpop

https://gerrit.wikimedia.org/r/609557

I'm looking at the error logs because we got another timeout notification for build_files. Thoughts:

  • I see a bunch of "unknown table" errors for the silverpop_export tables after it runs the INSERT INTO silverpop_endow, which from context is the INSERT INTO silverpop_endowment_highest query added here: https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/608501/5/silverpop_export/update_table.sql#198.
  • This is likely the timeout issue as the temporary views disappear.
  • It looks like we're trying to look at whether amounts match to determine the right contribution?
  • I'm not quite sure why we need the MAX statements if we're already finding and matching the highest contribution amount. It looks like in the code comments on staging it only took 58 seconds.
  • Maybe there was an outage on silverpop's side? It seems odd that it would suddenly take so long..

To address the final bullet point, I've never seen a timeout issue with the silverpop recurring import process but I will bring this up with our Trilogy rep. I think with outages, we would have received an email or notification of some kind, but I will check on that also.

Change 609880 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Prevent non-donors being exported to silverpop

https://gerrit.wikimedia.org/r/609880

Change 609880 abandoned by Eileen:
[wikimedia/fundraising/tools@master] Prevent non-donors being exported to silverpop

Reason:
logically if I look at 41438476 who is a a planned giving person created by Jason - the fact she was uploaded to silverpop must mean that we have been uploading non-donors unless they had some opt out
1:12 PM in which case I guess nothing has changed & I'm worrying about a non-thing

https://gerrit.wikimedia.org/r/609880

Change 609896 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Remove exid from export_stat table.

https://gerrit.wikimedia.org/r/609896

Change 610412 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Add all funds last donation date to export_staging table

https://gerrit.wikimedia.org/r/610412

Change 609896 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Remove exid from export_stat table.

https://gerrit.wikimedia.org/r/609896

Change 610479 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Ensure num_rows is defined

https://gerrit.wikimedia.org/r/610479

Change 610480 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Remove a bunch of fields from silverpop_export_staging that are no longer used

https://gerrit.wikimedia.org/r/610480

Change 610412 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Add all funds last donation date to export_staging table

https://gerrit.wikimedia.org/r/610412

Change 610495 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Alter master email table to use latest donor rather than highest email id

https://gerrit.wikimedia.org/r/610495

Change 609557 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Fix column names & table name on prospect fields for silverpop

https://gerrit.wikimedia.org/r/609557

Change 610664 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Iterate array of updates to run

https://gerrit.wikimedia.org/r/610664

Change 610479 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Ensure num_rows is defined

https://gerrit.wikimedia.org/r/610479

Change 610946 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Fix test to have modified date for all contacts

https://gerrit.wikimedia.org/r/610946

Change 610949 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Add rebuild file for full table rebuild

https://gerrit.wikimedia.org/r/610949

@KHaggard the job just crashed- I set it going again but it will probably be too late

The issue was a bit of an obscure timing issue - I think this should address in the short term

https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/611188

And in the longer term https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/609238

Ok, that makes sense, as I only saw one file processed this morning. Both unsubs file and DatabaseUpdate file updated successfully in Acoustic. Will monitor this task for when the new file you're sending over is ready.

Oh wait, I actually see there's a file ready to run right now. Starting that job now and will report back here on how it looks.

Fresh data job run looks ok. One thing I'll point out just in case is that the Master Suppression List total rows has increased by 12k since yesterday. I'm used to MSL increasing but since we haven't sent out emails for a while, it looks a little strange. Does that seem right to you, @Eileenmcnaughton ?

7.10.png (1×1 px, 195 KB)

Also attaching yesterday's run below for comparison:

7:9.png (1×1 px, 194 KB)

Just noting the job completed and took 3h 47 - which compares with 4.5 hours before we started the refactoring and before we started adding extra fields.

This is not the end goal but indicates improvement based on the changes deployed on Thurs/ Fri (the Fri one was just addressing the issue that hit on Thurs)

I suppose I have one theory about the 12k - which might be timing related. I think that we should push through the next fixes on rebuilding the staging table and then consider purging and rebuilding the suppression last since I think there may have been a slow leak of timing-related ones sneak in there & maybe a few more just recently. There is a separate ticket on that.

Change 610946 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Fix test to have modified date for all contacts

https://gerrit.wikimedia.org/r/610946

Change 610949 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Add rebuild file for full table rebuild

https://gerrit.wikimedia.org/r/610949

Change 609238 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Silverpop - build main table incrementally

https://gerrit.wikimedia.org/r/609238

Change 610664 abandoned by Eileen:
[wikimedia/fundraising/tools@master] Iterate array of updates to run

Reason:
merged into the other

https://gerrit.wikimedia.org/r/610664

Change 610480 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Remove a bunch of fields from silverpop_export_staging that are no longer used

https://gerrit.wikimedia.org/r/610480

Change 612661 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@deploy] Remove a bunch of fields from silverpop_export_staging that are no longer used

https://gerrit.wikimedia.org/r/612661

Change 612661 merged by jenkins-bot:
[wikimedia/fundraising/tools@deploy] Remove a bunch of fields from silverpop_export_staging that are no longer used

https://gerrit.wikimedia.org/r/612661

Change 612695 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Convert main stats table to incremental updates

https://gerrit.wikimedia.org/r/612695

Change 612699 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Remove reference to silverpop_export_address

https://gerrit.wikimedia.org/r/612699

Change 612700 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Fix the majority of tables to be incrementally updated

https://gerrit.wikimedia.org/r/612700

@KHaggard I deployed another change today - can you check the results look OK?

Export time: 3:21 (compares to 3h 47 before the patch and hours before we started the refactoring and before we started adding extra fields

Change 612695 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Convert main stats table to incremental updates

https://gerrit.wikimedia.org/r/612695

@KHaggard another change got deployed today - it switches us to building the statistics table incrementally rather than rebuilding from scratch each time. The change should get us under 3 hours - keep an eye on the import.

Hi @Eileenmcnaughton . Sorry for the delay, I was off work yesterday. Looking at the results of the import jobs, Two July 15th files were processed - once on July 15 and again today. The July 16th file processed this morning as expected and looks healthy.

I didn't notice any big issues here, but one thing I'll note: when Unsubscribes-20200715035503.csv processed again today (July 16), all totals matched yesterday's job as expected except for Total Duplicate Addresses, which increased by 5,293. Since all other aggregates matched perfectly, I found this increase worth flagging. Let me know what you think. Attaching a screenshot below.

7.15 comp.png (470×1 px, 182 KB)

Change 612699 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Remove reference to silverpop_export_address

https://gerrit.wikimedia.org/r/612699

Just coping this from the latest gerrit comment because it seems relevant more broadly

"jgleeson this is actually near the end - it switches from fully rebuilding the table to incrementally updated & adding new rows (the update is via delete + insert). We are not updating deleted rows since they are filtered out although there is a ticket for next sprint to consider sending up that data so it is more logical in silverpop.

There is one more significant piece after this which is to add a filter on the view such that modified date > 2 days ago. My expectation is that will cut another hour off the process. I chatted with David yesterday & I think I'm going to define that as 'done' for now since we are getting into diminishing returns but have another integrity round on unsubscribes against the other tickets coming in next sprint - if you wanted to take the lead on those tickets it might be good since I think in many ways switching the driver seat is the best form of review"

@KHaggard I don't totally understand those 2 screenshots but is it the number already uploaded ie

  1. Silverpop has 100 addresses on the list
  2. I run the daily & add another 10 - that number will be 100
  3. I run the next daily & add another 10 - that number will be 110
  4. I re-run the daily to do some tests and there are no new ones - that number will still increase to 120 because all 120 in my list are already uploaded

Oh ok, I'm understanding that there could be a 'delay' in what populates the duplicate addresses row. Sounds like no issue then. Today's import job looks great.

Change 612700 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Fix the majority of tables to be incrementally updated

https://gerrit.wikimedia.org/r/612700

@Eileenmcnaughton I only see one outstanding patch on this task: https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/610495. Should that be a smaller task and is this one still in review?

Change 614895 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Add delete based on contact_id being modified

https://gerrit.wikimedia.org/r/614895

Change 614895 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Add delete based on contact_id being modified

https://gerrit.wikimedia.org/r/614895

@KHaggard we deployed another update today - it brought the time taken to just under 2 hours - can you check again.

We have one more change in this effort to speed it up - which will mean that the full db is no longer updated & only those recently modified are. I don't quite know how you will be able to check that one since the comparisons won't work well at that point. I expect that to take the total time to under 1 hour and to speed it up at the silverpop end too

Awesome, thanks for the work on this! I can confirm that the nightly import data jobs are running smoothly with aggregate totals that make sense. Thanks for the heads up that we will soon change to processing recently modified contacts only - I will keep that in mind for my checks.

@mepps just noting that with then new table my intention was then to use that table in the silverpop script to delete those ids from silverpop staging in place of this query (in update_silverpop_staging.sql)

Remove any privacy deletes by getting rid of emails no longer in the email table.
This is a relatively long time to lock this table. It takes 1-2 minutes longer
if we use the log_civicrm_email table and I'm not sure it's any less bad to lock
that instead. A better way would be to temporarily record the emails to delete.
Query OK, 243 rows affected (1 min 0.45 sec)
https://phabricator.wikimedia.org/T257001
DELETE s FROM silverpop_export_staging s

LEFT JOIN civicrm.civicrm_email l
ON s.id = l.id
WHERE l.id IS NULL;

and also to check out the other tables to see if they hold records by email_id

As I'm looking at this, I'm wondering if we should check for other ways emails might get deleted? We could add a hook into forget for this..

Also I added the delete queries locally. I'm curious if we should take the ones referencing log_civicrm_emails out as well. My instinct is yes but as I mentioned above I'm worried there's other ways emails are getting deleted. I'm also deleting everything from the deleted email table.

Change 615794 had a related patch set uploaded (by Mepps; owner: Mepps):
[wikimedia/fundraising/tools@master] WIP start new delete queries for emails

https://gerrit.wikimedia.org/r/615794

@Eileenmcnaughton I put up a WIP task for the delete queries but I'm also curious how you get the timing in the comments. I assume from running on staging?

@mepps yes the timing is from staging - but actually you can look at the log on silverpop_build_emails job to get more accurate timing - re "We could add a hook into forget for this" - yes - I'm imagining that the table now being populated from your code is used instead of a generic table comparison

@Eileenmcnaughton sorry if my comment was confusing. I meant that I'm wondering too if we need to try to catch other deletes than just forgotten contacts.

@mepps I think we can get those from the log_civicrm_email - the ones that are a real problem are the forgotten ones as they are fully purged from the log too

Change 616948 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Fix tests to purge tables between tests

https://gerrit.wikimedia.org/r/616948

As I noted on the task, I tried running this query in dev_civicrm for some reason it took over 1 minute to run three queries. That seemed pretty high.

I wonder if we should move this final cleanup to a new task.

Change 616948 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Fix tests to purge tables between tests

https://gerrit.wikimedia.org/r/616948

Change 618407 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/tools@master] Add more tables to explainer

https://gerrit.wikimedia.org/r/618407

Change 618407 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Add more tables to explainer

https://gerrit.wikimedia.org/r/618407

Change 615794 merged by jenkins-bot:
[wikimedia/fundraising/tools@master] Use privacy delete email ids from the new table to ensure they are culled from export_staging

https://gerrit.wikimedia.org/r/615794

Hi there, looks like the import job failed this morning because there was no file to retrieve. I wonder if the civi server upgrade from last night could have cancelled the file export? If that's the case, maybe it'll start working normally again tomorrow. I know Eileen is out today, so tagging in @Ejegg or @jgleeson. Could one of you take a look if you have time? Thanks :)

Just updating on that last comment ^

David pinged me and confirmed the export finished successfully a few hours ago, and I manually ran the import job on my side successfully as well. Thanks!

I just deployed the fix that limits the size of the export file to recently modified. The job took 48 minutes and I'm expecting 1,345,204 rows to be in it. These should all be contact records that were in some way updated during the last week.

@KHaggard can you check last night's export went OK.

@Ejegg pointed out we are updating some deleted contacts - there seem to have been < 4000 over recent weeks. I did some analysis and these seem to be where the email is no longer in our DB (after a merge) - so they wind up on our suppression list and getting an update with their new totals. In some cases I think the email should have been preserved as a secondary address during dedudpe (identified i https://phabricator.wikimedia.org/T256316 & with a fix in review).

Overall I think the updated-deleted question is a variant of https://phabricator.wikimedia.org/T254304 & I'll look & see if I should add more notes over there

Hi @Eileenmcnaughton

Three file bundles processed last night, I'm listing the DatabaseUpdate files below. Thanks for letting me know about the change, confirming that total rows looks to be around the number you specified, 1.34k for the August 11 files. I didn't list the unsubscribes files below, as they look normal and the same as before, but let me know if you need them too.

First file:

Contact Source Name: _all_Wikimedia
File Name: DatabaseUpdate-20200810041011.csv
Job ID: 169657193
Job Type: Recurring Contact Source Import @ Tuesday, August 11, 2020 at 9:02:11 AM GMT
Job Status: Complete
Total Rows: 20808732
Total Valid Rows: 20796628
Total Invalid Addresses: 1346
Total Duplicate Addresses: 12
Total Disallowed Addresses: 10717
Total Bad Data: 41

Second file:

Contact Source Name: _all_Wikimedia
File Name: DatabaseUpdate-20200811012647.csv
Job ID: 169657197
Job Type: Recurring Contact Source Import @ Tuesday, August 11, 2020 at 9:02:15 AM GMT
Job Status: Complete
Total Rows: 1345396
Total Valid Rows: 1345275
Total Invalid Addresses: 41
Total Duplicate Addresses: 0
Total Disallowed Addresses: 80
Total Bad Data: 0

Third file:

Contact Source Name: _all_Wikimedia
File Name: DatabaseUpdate-20200811034144.csv
Job ID: 169657200
Job Type: Recurring Contact Source Import @ Tuesday, August 11, 2020 at 9:02:18 AM GMT
Job Status: Complete
Total Rows: 1343434
Total Valid Rows: 1343315
Total Invalid Addresses: 41
Total Duplicate Addresses: 0
Total Disallowed Addresses: 78
Total Bad Data: 0

Change 610495 abandoned by Eileen:
[wikimedia/fundraising/tools@master] Alter master email table to use latest donor rather than highest email id

Reason:

https://gerrit.wikimedia.org/r/610495