Page MenuHomePhabricator

FY2324 Q3: Need to map new Civi fields over to Acoustic
Closed, ResolvedPublic

Description

Starting a task so we don't forget to import new fields over to Acoustic, if they are ready.

Civi has these new fields:
donor_segment
donor_status
donor_segment_id
donor_status_id

If the values for those two fields are finalized and ready to use, let's aim for Q3 maintenance week to get them into Acoustic. @Eileenmcnaughton Can you please confirm field type for those - are they text fields?

We also need to map these over to Acoustic:
both_funds_usd_total_fy1819
both_funds_usd_total_fy1920
both_funds_usd_total_fy2021
both_funds_usd_total_fy2122
both_funds_usd_total_fy2223
both_funds_usd_total_fy2324
both_funds_usd_total_fy2425
both_funds_usd_total_fy2526

Which can be done the same time as the donor segment fields.

Event Timeline

@KHaggard the list of options for the fields is at https://civicrm.wikimedia.org/civicrm/wmf-segment (in the reports menu) - we can export the value (numeric) or the label (text) or both

Thanks @Eileenmcnaughton - I think we would actually like to do both, so 4 new custom fields actually. What do you currently have those numeric values labeled like in Civi?

Yes those @Eileenmcnaughton . What do you think makes sense as the export name for those? Something like:

donor_segment
donor_segment_value
donor_status
donor_status_value

@KHaggard that seems fine - although the database fields for the values are donor_segment_id and donor_status_id - so maybe matching those will be less confusing to someone down the line

Oh yeah thanks, that was what I was wondering. I like "id" better too. Ok so they will be:

donor_segment
donor_segment_id
donor_status
donor_status_id

I have a quick question. I was going to set numeric for the field type for the id fields. Acoustic formats numeric fields like this (e.g. 32.00) with two spaces after the decimal point. Does this work with you @Eileenmcnaughton or is there a reason we should format those numbers as a text field to retain the whole number as they present in Civi?

@KHaggard I think that's OK - the advantage of importing the numbers is that you might be able to use > & <

I don't know for sure if we have to pass them in 2 digit decimal format in our export but that is not hard to do if so

Got it, thanks. It might work regardless of how you export the whole numbers, but I just wanted to let you know that's how they present in Acoustic in case that matters. Thanks @Eileenmcnaughton .

Also, just as a refresher, when we do this, I'm going to replace these numeric fields as follows:

AF_usd_total_2018	replace with 	   both_funds_usd_total_fy1819
AF_usd_total_2019	replace with	   both_funds_usd_total_fy1920
AF_usd_total_2020	replace with	   both_funds_usd_total_fy2021
AF_usd_total_2021	replace with       both_funds_usd_total_fy2122
AF_usd_total_2022	replace with	   both_funds_usd_total_fy2223
AF_usd_total_2023       replace with       both_funds_usd_total_fy2324

I'm also going to add these new numeric fields:

both_funds_usd_total_fy2425
both_funds_usd_total_fy2526

And I'm going to delete / retire these old ones:

AF_usd_total_2014		
AF_usd_total_2015	
AF_usd_total_2016	
AF_usd_total_2017

If you can make the same adjustments in the export file that would be great. I'm closer to your time zone @Eileenmcnaughton right now since I'm in UTC-10.

@KHaggard that looks right - I'm not sure how much you want to sync the field names with the underlying Civi field names but the Civi field names follow this format for the fiscal year totals fields

all_funds_total_2025_2026 `

Bringing over this comment from the other task as it lists the steps

The existing patch is here https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/935182 - it just needs to be finalised with the final fields - which I think are per https://phabricator.wikimedia.org/T339296#9029958

Note that there are 4 aspects to deploying this. Step 4 can be done at any point but it is necessary to have interim view changes if deployed before step 2

there are changes required to the underlying fields in the silverpop database on staging. The new fields need to be added to silverpop_export_statetc per changes in this file

the underslying code to populate them needs to be rolled out these changes

the underlying data needs to be populated. This is pretty painful - sorry. Basically it involves temporarily altering the date range for the export & then re-running the whole thing. We might hold off on this for another phab though & do it once more data is populated in the segment fields

the view needs to be updated. The view changes are in [this patch at the bottom] but could be in their own patch to deploy before or after 2 (https://gerrit.wikimedia.org/r/c/wikimedia/fundraising/tools/+/935182/3/silverpop_export/update_table.sql#646) Note that we can alter the underlying data, and logic and as long as we don't change the fields included in silverpop_export_view we don't need to co-ordinate with Katie. That means we can, for example, drop the old fields and just update the view to export the value 0 as a place holder or add the new fields, but not include them as yet

On naming - I realise that in the final view we use the terminology 'both_funds' but elsewhere in the export we use 'all_funds' - which explains why both are referenced in this phab. We try to stick with the CiviCRM field names where possible but also be consistent with the view - so we will add all_funds_usd_total_2018_2019 to the table silverpop_export_stat but call it both_funds_usd_total_2018_2019 in the final view

From email - this are the deployment steps from Email team (Katie)'s POV

We usually try to time it like the following:

  1. On Friday, Katie deactivates the automatic import process
  2. On Monday, Katie manually runs the import jobs from over the weekend to clear out the FTP
  3. After that, Eileen exports the new files to be mapped to FTP
  4. Katie remaps all four recurring import jobs with the new files and runs them, checks the summary reports to ensure no bad data.
  5. Katie confirms successful job with fr-tech
  6. Then we send a full-file overwrite to the remapped database (takes a day or two)
  7. Katie runs QA to ensure the field types and values are correct (and fr-tech might run their own QA on the Civi side)
  8. Done

Ok. For now, I think I want to stick with how I wrote the fiscal year totals above.

If we change some of the fields to all_funds I would need to do it everywhere else too and that feels kind of like a different project to me. We would also need to change the "AF" for Annual Fund to something else so it's not confused for "all_funds", you know what I mean?

I think that in the future we should rethink the fund naming again, because back in 2020 when we reformatted names, I didn't know how annoying it was to have capital letters in field names. Someday soon it would be nice to make everything lowercase and standardized after agreeing with teams how we want to differentiate the "Annual Fund" fields from the "All Funds" fields.

Yeah I guess there are 2 approaches

  1. keep Acoustic in sync with CiviCRM or
  2. use a different naming scheme for Acoustic

From our point of view none of our systems know or care what the Acoustic field names are so it's fine for them to be differently named. We did try to bring them in sync in 2020 as you mentioned. In our underlying table, unfortunately annual_fund is denoted by the absence of a prefix (for historical reasons) - it would be a huge job to change that and I don't see us doing that on existing fields - but we should probably prefix any new annual_fund fields we add with the prefix.

update " none of our systems know or care" - actually we have a few lines of code that set the header names in the fields we export which we set to whatever is useful at your end

Here is the list of the underlying field names in CiviCRM

all_funds_change_2022_2023
all_funds_change_2023_2024
all_funds_change_2024_2025
all_funds_change_2025_2026
all_funds_first_donation_date
all_funds_largest_donation
all_funds_last_donation_date
all_funds_number_donations
all_funds_total_2018_2019
all_funds_total_2019_2020
all_funds_total_2020_2021
all_funds_total_2021_2022
all_funds_total_2022_2023
all_funds_total_2023_2024
all_funds_total_2024_2025
all_funds_total_2025_2026
change_2022_2023
change_2023_2024
change_2024_2025
change_2025_2026
date_of_largest_donation
donor_segment_id
donor_status_id
endowment_change_2022_2023
endowment_change_2023_2024
endowment_change_2024_2025
endowment_change_2025_2026
endowment_first_donation_date
endowment_largest_donation
endowment_last_donation_date
endowment_lifetime_usd_total
endowment_number_donations
endowment_total_2018_2019
endowment_total_2019_2020
endowment_total_2020_2021
endowment_total_2021_2022
endowment_total_2022_2023
endowment_total_2023
endowment_total_2023_2024
endowment_total_2024
endowment_total_2024_2025
endowment_total_2025
endowment_total_2025_2026
first_donation_date
first_donation_usd
largest_donation
last_donation_amount
last_donation_currency
last_donation_date
last_donation_usd
lifetime_including_endowment
lifetime_usd_total
number_donations
total_2017_2018
total_2018_2019
total_2019_2020
total_2020_2021
total_2021_2022
total_2022_2023
total_2023
total_2023_2024
total_2024
total_2024_2025
total_2025
total_2025_2026

@Eileenmcnaughton Understood. Yeah, let's pin this for a later time. For now, I think it would be fine to leave Civi as-is but just rename the export file / job with whatever fits with Acoustic (as we have been doing).

For instance:
both_funds_usd_total_fy1819
or
both_funds_usd_total_2018_2019

Is there a pragmatic reason to have the years split out like that? I'm fine with either.

@KHaggard - no - no practical reason for the 2018_2019 vs fy1819 - I think I generally try to avoid abbreviations cos they don't always mean the same thing to everyone but that is a general thing not specific to this

Ah ok got it. I just did a poll with the email team on which one makes more sense to them and they all unanimously voted for:

both_funds_usd_total_fy1819

Is it alright with you if we go with that name format for now for the usd total fields? @Eileenmcnaughton

Hi, so I've paused the data jobs in Acoustic now and I'm changing the existing fields to the new names and purging the old values. I'll update when finished and when I'm ready for the mapping file. @Eileenmcnaughton

Actually, let me clarify, you can send over the mapping file whenever you're ready, I just won't be able to map until closer to my end of day since reorganizing the fields is taking Acoustic a long time to process.

If you wouldn't mind making sure the regular export that would come in tomorrow is turned off for now that would be great, the next file that Acoustic grabs has to be the remapping files only. If it's anything else, then Acoustic will just re-add the fields I'm trying to change.

Thanks!

Change 935182 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/tools@master] Update Silverpop export to include new fields

https://gerrit.wikimedia.org/r/935182

Alright I'm all set now and ready to map whenever you are @Eileenmcnaughton Below is what I named everything and it's field type:

Acoustic field namefield type
both_funds_usd_total_fy1819numeric
both_funds_usd_total_fy1920numeric
both_funds_usd_total_fy2021numeric
both_funds_usd_total_fy2122numeric
both_funds_usd_total_fy2223numeric
both_funds_usd_total_fy2324numeric
both_funds_usd_total_fy2425numeric
both_funds_usd_total_fy2526numeric
donor_segment_idnumeric
donor_status_idnumeric
donor_segmenttext
donor_statustext

Change 935182 merged by jenkins-bot:

[wikimedia/fundraising/tools@master] Update Silverpop export to include new fields

https://gerrit.wikimedia.org/r/935182

@KHaggard I just kicked off the job from our end - the donor_segment & donor_status aren't populated right now but you should be able to map the empty field & I'll fix up the data from our end

Awesome, thanks @Eileenmcnaughton. I'm going to grab a smoothie really quick but when I get back I'll map it. Please let me know when it's up. :)

@KHaggard I tink it's there now - it might push up a second one in an hour or so

Right now it's telling me there's nothing on FTP. @Eileenmcnaughton Just an FYI.

@KHaggard The regularly scheduled job seems to be running now....

Oh ok. It is possible to cancel that? Acoustic is at a place now where if it attempts to run the "old regularly scheduled" file it will try to add back the fields I just deleted, which could take several hours to re-delete. I remember that happened in the past once.

Is it possible to have only the remapping file sitting in the folder by itself? @Eileenmcnaughton

@KHaggard - the re-run will send the new fields - we have deployed the update so the code will only generate the new version now, not the old one

I've found & fixed the issue with the status & segments - & it will be fixed in the next file we push up (but it's a data-only issue - it won't affect the mapping)

Ok cool. I'm hoping to get other members of the email team to watch me map the files so they know how to do it in the future. Since it's getting late my time, I'm thinking I'll push the mapping to tomorrow at a time where others can watch "live".

I have deactivated the data jobs in Acoustic, so any new generated files will just sit in the queue until I map whatever is "first" in the FTP folder. As long as they all have the new mapping fields in them, it should be okay.

Thanks @Eileenmcnaughton I'll update you again tomorrow!

OK @Eileenmcnaughton we were able to successfully map everything. Just running the files that were in queue now. I'm going to do a quick check that they all complete successfully and then I'll comment again here.

Thank you for your work on that export file!

Alright, everything looks fine to me @Eileenmcnaughton .

Should we leave this ticket open until we can do the full file push for everything?

Change 994348 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/tools@master] Fix fix again

https://gerrit.wikimedia.org/r/994348

Change 994348 merged by jenkins-bot:

[wikimedia/fundraising/tools@master] Fix fix again

https://gerrit.wikimedia.org/r/994348

Change 994349 had a related patch set uploaded (by Eileen; author: Eileen):

[wikimedia/fundraising/tools@master] Do some coalescing

https://gerrit.wikimedia.org/r/994349

Change 994349 merged by jenkins-bot:

[wikimedia/fundraising/tools@master] Do some coalescing

https://gerrit.wikimedia.org/r/994349

@KHaggard - it says it succeeded at uploading the full multi-gig file....

Thanks @Eileenmcnaughton - so, the databaseupdate csv processed fine to me, summary below:

Contact Source Name: _all_Wikimedia
File Name: DatabaseUpdate-20240130233414.csv
Job ID: 240550342
Job Description: Adding and updating contacts in existing contact source, _all_Wikimedia
Total Rows: 30311241
Total Valid Rows: 30295882
Total Invalid Addresses: 1848
Total Duplicate Addresses: 11
Total Disallowed Addresses: 13466
Total Bad Data: 45

However, something went wrong with the Unsubscribes-* csv. It ran twice, the Jan 30 file was blank and the Jan 31 file is only 143k rows when it should be 9 million.

Contact Source Name: MASTER SUPPRESSION LIST
File Name: Unsubscribes-20240130233414.csv
Job ID: 240550240
Job Description: Adding contacts to existing contact source, MASTER SUPPRESSION LIST
Total Rows: 0
Total Valid Rows: 0
Total Invalid Addresses: 0
Total Duplicate Addresses: 0
Total Disallowed Addresses: 0
Total Bad Data: 0

Contact Source Name: MASTER SUPPRESSION LIST
File Name: Unsubscribes-20240131025408.csv
Job ID: 240550242
Job Description: Adding contacts to existing contact source, MASTER SUPPRESSION LIST
Total Rows: 143280
Total Valid Rows: 143176
Total Invalid Addresses: 42
Total Duplicate Addresses: 1
Total Disallowed Addresses: 61
Total Bad Data: 1

Lastly, the Jan 31 checksum file is smaller than the Jan 30 one, but aren't they always full file and should be the same or increasing?

File Name: ChecksumEmails-20240130233414.csv
Job ID: 240550302
Job Description: Adding and updating contacts in existing contact source, _all_Wikimedia
Total Rows: 30311208
Total Valid Rows: 30295882

File Name: ChecksumEmails-20240131025408.csv
Job ID: 240550320
Job Description: Adding and updating contacts in existing contact source, _all_Wikimedia
Total Rows: 30311026
Total Valid Rows: 30295699

I think a small reduction in the checksum emails could be due to some deduping but I need to get my head around the unsubscribes one - I can't think why we were always sending the full file but I haven't re-read our docs.

I might ask @Dwisehaupt to set aside our backup of the silverpop database from a few days ago in case we need to check in it

Aha - I see it - it looks like it likely just needs rebuilding - but I can't recall why we export them all - running

INSERT INTO silverpop_excluded (email)
SELECT DISTINCT e.email
FROM civicrm.log_civicrm_email e
LEFT JOIN civicrm.civicrm_contact c ON c.id = e.contact_id
-- see comment block for long discussion of this WHERE
WHERE e.id <= (SELECT MAX(id) FROM silverpop_export_staging)
  #AND c.modified_date > DATE_SUB(NOW(), INTERVAL @offSetInDays DAY)
ON DUPLICATE KEY UPDATE email = silverpop_excluded.email;

ok @KHaggard - the next file to export - ie in about 18 hours time - should have a fuller Unsubscribe list - can you check in on it after that

I think we exported the full Unsubscribes list because it wasn't causing any slowdowns to do it and it helps us make sure there isn't errors like that one time in 2018 where we saw the MSL shrinking over time and we only were able to catch and fix it because of the full data send summary reports.

If you think it's better to have it be "last 7 days" similar to the DatabaseUpdate-* job, let me know because I will need to update my documentation and share with the rest of the team.

But, if it's no big deal to have it be full export daily, then I'm partial to that so we can have a better monitor on it.

@Eileenmcnaughton thanks for generating a full file so we can check it. Right now I just have the following but it might come through tomorrow:

Contact Source Name: MASTER SUPPRESSION LIST
File Name: Unsubscribes-20240201025553.csv

Job ID: 240571501
Job Description: Adding contacts to existing contact source, MASTER SUPPRESSION LIST
Job Type: Recurring Contact Source Import @ Thursday, February 1, 2024 at 9:00:04 AM GMT
Job Status: Complete
Total Rows: 147941
Total Valid Rows: 147834
Total Invalid Addresses: 43
Total Duplicate Addresses: 1
Total Disallowed Addresses: 62
Total Bad Data: 2

@KHaggard - there is no push to not do the full list on our part - I think I just need to check in on our documentation to ensure what you just said is captured

Ok, great. @Eileenmcnaughton the data jobs for today completed and all 4 of them look good to me. The full file Unsubscribes looks like what we'd expect too:

Contact Source Name: MASTER SUPPRESSION LIST
File Name: Unsubscribes-20240202031149.csv
Job ID: 240591180
Job Description: Adding contacts to existing contact source, MASTER SUPPRESSION LIST
Job Type: Recurring Contact Source Import @ Friday, February 2, 2024 at 9:00:21 AM GMT
Job Status: Complete
Total Rows: 9191413
Total Valid Rows: 9186566
Total Invalid Addresses: 2395
Total Duplicate Addresses: 121
Total Disallowed Addresses: 2448
Total Bad Data: 4

Thanks for this. I'll start poking around our _all_Wikimedia database to make sure all looks good. Would you be willing to ping me when the Civi side donor segmentation QA is complete?

Quick update: Did a spot check with some names in Acoustic and it looks good from my side :)

@KHaggard we have no outstanding tasks on this - all the things @JMando found should be addressed (last week)

XenoRyet set Final Story Points to 4.