Page MenuHomePhabricator

Add time zone data back into Silverpop export file
Closed, ResolvedPublic2 Estimated Story Points

Description

A year or more ago, we dropped time zone data from the Silverpop export job because apparently it was causing the job to lag and sometimes time out. Now that the US list is so big, however, we should really start segmenting the US email list by geographic location, and time zone would be the easiest way to do this. Is this field still super costly to provide? Is there a way we could add it back in for our US records only (assuming that might speed things up)?

Alternately, we could add state data to the Silverpop database, but this will be a lot less simple for me to work with.

Event Timeline

Change 316728 had a related patch set uploaded (by Ejegg):
WIP local geocoding for US addresses

https://gerrit.wikimedia.org/r/316728

Change 316730 had a related patch set uploaded (by Ejegg):
Add timezone column to export

https://gerrit.wikimedia.org/r/316730

Hey all, I think we can do without the time zone data. We have state data for a lot of our US list (missing it for about 15-20% which isn't terrible considering we haven't tested this before) and we can use that to work with for now. The silverpop job failed last night for the first time in months and I'm thinking it has something to do with this... Happy to close this task, and sorry for the extra work!

Hi @CCogdill_WMF , sorry, the failure last night was only tangentially related. @awight overhauled a bunch of the code so it now has unit tests (really good news for reliability!), and I deployed it along with the refund/history fix before I noticed two lines in the FTP upload code that needed to be transposed. I swapped those two around this morning and re-uploaded the export a half hour ago. The timezone data patches are waiting on review.

Oh, okay! Thanks for confirming @Ejegg. If we can get the data, that's
awesome.

Change 316728 merged by jenkins-bot:
Local geocoding for US addresses

https://gerrit.wikimedia.org/r/316728

Change 316730 merged by jenkins-bot:
Add timezone column to export

https://gerrit.wikimedia.org/r/316730

Change 319125 had a related patch set uploaded (by Ejegg):
Geocode existing US addresses

https://gerrit.wikimedia.org/r/319125

Change 319125 merged by jenkins-bot:
Geocode existing US addresses

https://gerrit.wikimedia.org/r/319125

@CCogdill_WMF this is ready to deploy. the new column will be called 'timezone' and will come right after the postal_code column. It should have accurate data for every US address with a zip code.

Should I wait to deploy till you have a chance to update the import mapping on the Silverpop side?

Awesome, thanks @Ejegg! Can you deploy midday tomorrow?

Change 319256 had a related patch set uploaded (by Ejegg):
Explicitly create table, break up updates

https://gerrit.wikimedia.org/r/319256

Change 319256 merged by jenkins-bot:
Explicitly create table, break up updates

https://gerrit.wikimedia.org/r/319256

Change 319443 had a related patch set uploaded (by Ejegg):
Geocoding: trim zip to 5 characters before lookup

https://gerrit.wikimedia.org/r/319443

OK @CCogdill_WMF , this is deployed. The new field ought to show up in tonight's export. It'll be in the format 'UTC-5', 'UTC-7', etc.

Okay sounds good. Thanks very much!

Change 319443 merged by jenkins-bot:
Geocoding: trim zip to 5 characters before lookup

https://gerrit.wikimedia.org/r/319443

We got the time zone data, but it looked like it was only applicable to
~300k rows in the US list. Is that right? Are we missing something?

Oh darn, I hoped we had backfilled all of them! We ran a long update that should have gotten them all, but I may have made an error. I'll check the addresses and re-check the SQL.

I see 4.4 million addresses in the main civi table that include timezone data. Reviewing the export changes now.

Change 320233 had a related patch set uploaded (by Ejegg):
Fix missing timezone / postal code bug

https://gerrit.wikimedia.org/r/320233

That last patch should make sure all the zip code and time zone date we have in the source tables makes it into the export.

However, after de-duping on email, fewer than 85% of the US donors have any data beyond country, so there will be always be a big chunk missing zip code and time zone.

Yeah, 85% about matches what I have. I'm okay with that number.

Thanks!

Change 320233 merged by jenkins-bot:
Fix missing timezone / postal code bug

https://gerrit.wikimedia.org/r/320233

Looks like these are populated now