Page MenuHomePhabricator

As an email stats updater, I need to find results for a send that is missing from Superset
Open, MediumPublic

Description

On Monday, December 7th, the email team sent out a batch of subject line tests for our Recent donor target group. The 5th segment send got stuck in a pending status and we caught it a few hours later. When @MNoorWMF scheduled the new send for this single segment, the process dislodged the first one from pending, resulting in two duplicate sending jobs. She canceled the first job and the second job went out to the full list successfully.

The stats we want to view is for the "second job" I mention above, which dispersed to the full, correct list. The mailing_name is R2-5b_20201207_UnitedStates(US)_English(en)_Email3_SL-Copy-DifferentCircumstances_Recent (2) and the mailing_identifier should be sp69308812. It's not showing up in Superset, which isn't surprising given the complexity of this send process, but there is a record of the send in Acoustic.

Would @EYener and team be willing to track down this segment in Superset so I can update the results in the stats sheet? Thanks!

Event Timeline

KHaggard created this task.Dec 9 2020, 11:16 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 9 2020, 11:16 PM
KHaggard renamed this task from As a email stats updater, I need to find results for a send that is missing from Superset to As an email stats updater, I need to find results for a send that is missing from Superset.Dec 9 2020, 11:17 PM
KHaggard updated the task description. (Show Details)Dec 9 2020, 11:22 PM

Hey FR-Tech team! I'm not seeing this in the database when I search for it, so I think it might be an import issue from Acoustic.

SELECT * FROM civicrm.civicrm_mailing_provider_data WHERE mailing_identifier = 'sp69308812' LIMIT 5;

KHaggard updated the task description. (Show Details)Dec 9 2020, 11:25 PM

@EYener Is this actively blocking any work? We're a bit full on tasks at the moment and I'm wondering what priority this should be.

That would be a better question for @KHaggard and team to prioritize in terms of email tasks. I'm not sure how large this send is or when the Email team would want this data back for analysis.

KHaggard added a comment.EditedDec 11 2020, 6:30 PM

Thanks for the tag! I would set this as a medium or low priority. I would very much like this to show up in Superset at some point, but it's not a blocker for our processes. I can pull the stats up the old fashioned way for now. The send size for this segment is 15k. Thanks all!

Sounds good, @KHaggard. What segment was it? We can put a note on any of our reports that this is still pending.

It's the 5th subject line segment from the "Recent" target group send on Dec 7th. It's just one single mailing_id (sp69308812) and the mailing name is R2-5b_20201207_UnitedStates(US)_English(en)_Email3_SL-Copy-DifferentCircumstances_Recent (2)

It might be worth checking with @bsisolak what's different about this mail - I've tried re-fetching the date range in question & it is still not downloaded

(a) what is Superset? (b) what is the API call you are making, do you have the XML?

@EYener may be able to elaborate more on those ^questions, but some quick context about Superset:

it's a data visualization and data exploration platform that we've started using to pull together all stats from our email sends into one place. I think some of the data gets fetched directly from Acoustic, and this one mailing_id that's visible in Acoustic isn't making it's way to Superset's data table.

Ah... ok. So let's look at the specific API call. Need the full XML. Thanks.

<Envelope>
<Body>

		<GetSentMailingsForOrg><DATE_START>12/01/2020 00:00:00</DATE_START><DATE_END>12/16/2020 22:28:20</DATE_END><SHARED/><EXCLUDE_TEST_MAILINGS/><EXCLUDE_ZERO_SENT/><SENT/><SENDING/></GetSentMailingsForOrg>

</Body>
</Envelope>

Ah yes, remove <SENT/> and that will return all types, included Canceled.

@bsisolak including cancelled seems confusing - but in this case that is the actual status? It's obviously an easy change to make I just want to check what the impacts will be & if anyone needs to be aware of them

Change 651006 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Do not exclude cancelled (& other?) statuses from mailing retrieval

https://gerrit.wikimedia.org/r/651006

I found an email from 2015 wher eI outlined this API call. I forgot there is an unpublsied elements called SENT_CANCELLED. I'll reforward the email.

@Eileenmcnaughton I don't think we cancel emails very often, so I don't see a blocker from my side. Is it possible to allow cancelled sends in Superset for this specific date range?

@EYener - Would adding cancelled impact your work in a negative way?

@KHaggard No issues here! Whatever @Eileenmcnaughton decides is best for ingesting this data, we can manage on our side. Thanks for checking!

I've put up this commit

https://github.com/eileenmcnaughton/omnimail-silverpop/commit/49ffc93c7550abe840cf9b7340f79c08f88ed226

I have added SENT_CANCELLED to all places where I see SENT and SENDING & am confirming with @bsisolak

Oddly in 2 places they are lower case & I'm wondering if those are actually ignored in those places

Change 651304 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Fix failure to retrieve cancelled_sent mailings

https://gerrit.wikimedia.org/r/651304

Change 651304 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Fix failure to retrieve cancelled_sent mailings

https://gerrit.wikimedia.org/r/651304

Change 651609 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm/vendor@master] Vendor update

https://gerrit.wikimedia.org/r/651609

Change 651609 merged by jenkins-bot:
[wikimedia/fundraising/crm/vendor@master] Vendor update

https://gerrit.wikimedia.org/r/651609

Change 651006 abandoned by Eileen:
[wikimedia/fundraising/crm@master] Do not exclude cancelled (& other?) statuses from mailing retrieval

Reason:

https://gerrit.wikimedia.org/r/651006

DStrine triaged this task as Medium priority.Jan 5 2021, 10:12 PM

Change 654739 had a related patch set uploaded (by Eileen; owner: Eileen):
[wikimedia/fundraising/crm@master] Update mrmarkfrench/silverpop-php-connector

https://gerrit.wikimedia.org/r/654739

Change 654739 merged by jenkins-bot:
[wikimedia/fundraising/crm@master] Update mrmarkfrench/silverpop-php-connector

https://gerrit.wikimedia.org/r/654739

I just checked & sp69308812 is now there!

I re-imported all the mailings I could - to 450 days ago - so it might fill in some other gaps. There have been other reports of missing mailings over time & it seems likely that they were the same issue 'cancelled during sending' - so this might be a case of 'mystery solved'

Awesome, thank you @Eileenmcnaughton !

Question for you now that I see this. Is data only pulled from Acoustic for the last 450 days (for all tables in civicrm mailing tables)? IE if a mailing was sent 460 days ago and someone opens it or clicks it, it sounds like that data is NOT passed back - is that correct?

@EYener my understanding is that is how long Silverpop keeps info for

We started pulling data into civi because acoustic drops it after a while. If there is missing data it might not be available. I don't exactly remember when we started this project so we only have data back to a certain date.

Yep the 450 day limit is how far Silverpop can go back to via the api (& I believe in it's own db too)

Yeah 450 days sounds correct to me too. Thanks so much for your work on this! @Eileenmcnaughton

Does this mean Superset is updated with this mailing or is there another step? Currently dropping in sp69308812 into the "mailing_identifier" section doesn't seem to be yielding results. @EYener

Really cool @Eileenmcnaughton & @DStrine . Sorry for belaboring the point....is this 450 day limit related to the created date? Or the date of the action taken? I can see it going either way and just want to make sure I understand. With the example that "if a mailing happened 460 days ago, but the open action happened yesterday" example, it seems like Acoustic would not store that because the storage limit is tied to the create date of 460 days ago - but I want to check.

This would be really helpful because it could put a hard (and easily understandable) cap on data cube refreshes and data updates; if there's no new data to look for past a create date of 450 days ago, there's no need to try!

@Eileenmcnaughton can you share the query for finding mailing sp69308812? This is returning an empty set:
SELECT * FROM civicrm.civicrm_mailing_provider_data WHERE mailing_identifier = 'sp69308812' LIMIT 5;

@KHaggard once we get it in the database (iE the query above returns the data from Acoustic) we will still need to wait for the next regular data cube run at either 1PM or 1AM UTC to populate Superset with the data.

@EYener - @bsisolak would be the best person to answer that question on the 450 days. On the data in civicrm_mailing_provider_data - I'll have to check that - I was looking at it being missing from civicrm_mailing - so that means a separate api call is either not returning it or it is otherwise missing for some reason

Your data retention policy is set to 450 days, you can see this under Organization Settings -> Archive Settings. I'll have to test that specific question. I would suggest you should cap all data at 90-120 days, anything after that is likely not real user activity, and has no chance of substantially changing the data. Frankly, I suspect if you had a cutoff of 60 days (or even shorter) that would capture 99.8% of all your data.

@bsisolak adding SENT_CANCELLED to the GetSentMailingsForOrg call worked.

However, I'm not seeing events from those mailings retrieved from RawRecipientDataExport

This was the outgoing xml (the scheduled date was 2020-12-07 19:20:00)

<?xml version="1.0"?>
<RawRecipientDataExport> <EVENT_DATE_START>12/06/2020 00:00:00</EVENT_DATE_START>
<EVENT_DATE_END>12/09/2020 00:00:00</EVENT_DATE_END>
<MOVE_TO_FTP>1</MOVE_TO_FTP>
<ALL_EVENT_TYPES>1</ALL_EVENT_TYPES>
<COLUMN>

		<NAME>ContactID</NAME>
		<VALUE>ContactID</VALUE>

</COLUMN>
</RawRecipientDataExport>

Response

<Envelope>
<Body>

		<RESULT>

<SUCCESS>TRUE</SUCCESS>
<JOB_ID>177925837</JOB_ID>
<JOB_STATUS>RUNNING</JOB_STATUS>
<JOB_DESCRIPTION>Export raw recipient data.</JOB_DESCRIPTION>
<PARAMETERS>
<PARAMETER>
<NAME>FILE_ENCODING</NAME>
<VALUE>utf-8</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>INCLUDE_SEEDS</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>EVENT_END_DATE</NAME>
<VALUE>12/09/2020 00:00:00</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>MOVE_TO_FTP</NAME>
<VALUE>true</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>INCLUDE_INBOX_MONITORING</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>REPORT_ID_ARRAY</NAME>
<VALUE/>
</PARAMETER>
<PARAMETER>
<NAME>RETURN_SUBJECT</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>ERROR_FILE_NAME</NAME>
<VALUE>177925837.err</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>EMAIL</NAME>
<VALUE/>
</PARAMETER>
<PARAMETER>
<NAME>RETURN_FROM_NAME</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>TIME_ZONE</NAME>
<VALUE>10</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>ALL_NON_EXPORTED</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>EXPORT_FORMAT</NAME>
<VALUE>0</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>EVENT_TYPES</NAME>
<VALUE>[16, 15, 0, 1, 7, 8, 6, 4, 3, 2, 9, 9, 5, 10, 11, 12, 13, 14]</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>FILENAME</NAME>
<VALUE>Raw Recipient Data Export Jan 18 2021 22-55-38 PM 659</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>INCLUDE_TEST_MAILINGS</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>INCLUDE_FORWARDS</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>MAILING_ID_ARRAY</NAME>
<VALUE/>
</PARAMETER>
<PARAMETER>
<NAME>CODED_TYPES</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>NUM_LISTS_REMAINING</NAME>
<VALUE>1</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>CAMPAIGN_ID</NAME>
<VALUE>0</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>SEND_START_DATE</NAME>
<VALUE>10/22/2020 00:00:00</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>SEND_END_DATE</NAME>
<VALUE>12/09/2020 00:00:00</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>EXCLUDE_DELETED</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>RETURN_PROGRAM_ID</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>DEBUG</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>HARD_BOUNCES</NAME>
<VALUE>true</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>LIST_COLUMNS</NAME>
<VALUE>[ContactID]</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>EVENT_START_DATE</NAME>
<VALUE>12/06/2020 00:00:00</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>RETURN_CRM_CAMPAIGN_ID</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>SOFT_BOUNCES</NAME>
<VALUE>true</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>LIST_ID</NAME>
<VALUE>0</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>INCLUDE_QUERIES</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>MAILING_TYPE_ARRAY</NAME>
<VALUE>[3, 9, 4, 6, 7, 12, 14, 13, 16]</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>FORWARDS_ONLY</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>ENCRYPT</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>TOTAL_NUM_LISTS</NAME>
<VALUE>2</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>RETURN_FROM_ADDRESS</NAME>
<VALUE>false</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>RESULTS_FILE_NAME</NAME>
<VALUE>177925837.res</VALUE>
</PARAMETER>
<PARAMETER>
<NAME>RETURN_MAILING_NAME</NAME>
<VALUE>false</VALUE>
</PARAMETER>
</PARAMETERS>
</RESULT>

</Body>
</Envelope>

SENT_CANCELLED does not do anything with GetSentMailingsForOrg. What is the mailing ID of the canceled mailing?

On another note, I stronglly recommend using
<NAME>ALL_NON_EXPORTED</NAME>
<VALUE>true</VALUE>

It makes everything much easier.

Eileenmcnaughton added a comment.EditedJan 19 2021, 5:06 AM

The mailing id is 69308812

I didn't use the all_non_exported because I wanted to be able to re-fetch them. Now that things are fairly reliable that might be less of an issue - but it does mean that we would then be tracking by what Silverpop thinks we have rather than what we actually have