
WMDEBanner* Event Platform Migration
Open, High, Public

Event Timeline

Milimetric added a project: Analytics-Kanban.
Milimetric moved this task from Incoming to Event Platform on the Analytics board.

Hi WMDE folks!

I have one question regarding the migration of the WMDEBanner* schemas:
With the old EventLogging system, the fields client_ip and geocoded_data were collected/prepared by default.
With the new Event Platform system, we only collect/prepare them if needed (privacy-by-design principle).

Do you need client_ip and/or geocoded_data for WMDEBanner* schemas?

If so, I will mark them to be collected/prepared when I perform the migration, and nothing is going to change.
Otherwise, I will perform a basic migration, and they won't be collected from then on.

Please let me know what you prefer!
Thanks a lot!

@mforns For the New Editors campaign analytics we do not need those two fields. Let's hear from the WMDE FUN team if they do.

Thanks @GoranSMilovanovic! Yes, will wait for their thoughts.

Sorry, long wait for a short answer: No, the Fundraising team does not need the two fields, either.

Change 698798 had a related patch set uploaded (by Mforns; author: Mforns):

[schemas/event/secondary@master] Add wmdebannerevents schema to analytics/legacy

https://gerrit.wikimedia.org/r/698798

Change 698798 merged by jenkins-bot:

[schemas/event/secondary@master] Add wmdebannerevents schema to analytics/legacy

https://gerrit.wikimedia.org/r/698798

Change 698802 had a related patch set uploaded (by Mforns; author: Mforns):

[schemas/event/secondary@master] Add wmdebannerimpressions schema to analytics/legacy

https://gerrit.wikimedia.org/r/698802

Change 698802 merged by jenkins-bot:

[schemas/event/secondary@master] Add wmdebannerimpressions schema to analytics/legacy

https://gerrit.wikimedia.org/r/698802

Change 698804 had a related patch set uploaded (by Mforns; author: Mforns):

[schemas/event/secondary@master] Add wmdebannersizeissue schema to analytics/legacy

https://gerrit.wikimedia.org/r/698804

Change 698804 merged by jenkins-bot:

[schemas/event/secondary@master] Add wmdebannersizeissue schema to analytics/legacy

https://gerrit.wikimedia.org/r/698804

Change 698811 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/mediawiki-config@master] Migrate WMDEBanner* schemas to EventPlatform on testwiki

https://gerrit.wikimedia.org/r/698811

Change 698811 merged by Ottomata:

[operations/mediawiki-config@master] Migrate WMDEBanner* schemas to EventPlatform on testwiki

https://gerrit.wikimedia.org/r/698811

Mentioned in SAL (#wikimedia-operations) [2021-06-09T13:59:33Z] <otto@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on testwiki - T282562 (duration: 01m 08s)

Change 698996 had a related patch set uploaded (by Mforns; author: Mforns):

[operations/mediawiki-config@master] Migrate WMDEBanner* schemas to EventPlatform on all wikis

https://gerrit.wikimedia.org/r/698996

Change 698996 merged by Ottomata:

[operations/mediawiki-config@master] Migrate WMDEBanner* schemas to EventPlatform on all wikis

https://gerrit.wikimedia.org/r/698996

Mentioned in SAL (#wikimedia-operations) [2021-06-09T14:37:00Z] <otto@deploy1002> Synchronized wmf-config/InitialiseSettings.php: Migrate WMDEBanner* schemas to EventPlatform on all wikis - T282562 (duration: 01m 06s)

Change 699002 had a related patch set uploaded (by Ottomata; author: Ottomata):

[operations/puppet@production] Finalize backend EP migration of 4 EL schemas

https://gerrit.wikimedia.org/r/699002

Mentioned in SAL (#wikimedia-analytics) [2021-06-10T14:07:20Z] <ottomata> altered event.wmdebannerevent event.eventRate field to change type from BIGINT to DOUBLE - T282562

FYI, WMDEBannerEvent had a badly interpreted field type in Hive from years ago. Its event.eventRate had been interpreted as an integer instead of a double (from a time long ago when we weren't able to use the JSONSchemas to import this data into Hive). As a result, any incoming data for this field was being cast to an integer.

We just noticed this as we migrated this schema to Event Platform and started emitting example canary events with decimal values for this field, which caused an alert to fire when ingesting the data into Hive.

I fixed this by manually altering the Hive table to change event.eventRate to a double.

alter table `event.wmdebannerevents` change `event` `event` STRUCT<`bannerAction`: STRING, `bannerName`: STRING, `eventRate`: DOUBLE, `finalSlide`: BIGINT, `slidesShown`: BIGINT>;

Ingesting is succeeding now.
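
To illustrate why the old column type mattered, here is a minimal Python sketch of what happens to decimal rates when they are stored as integers versus doubles. It is not the actual ingestion code: the sample values and the helper names (cast_like_bigint, cast_like_double) are made up for illustration, and it assumes the integer cast truncates toward zero.

```python
def cast_like_bigint(value: float) -> int:
    """Hypothetical stand-in for storing a value in an integer (BIGINT) column.

    Assumption: the cast truncates toward zero, as Python's int() does.
    """
    return int(value)


def cast_like_double(value: float) -> float:
    """Hypothetical stand-in for storing a value in a DOUBLE column."""
    return float(value)


# Made-up eventRate values; real values in the table may differ.
sample_rates = [0.01, 0.5, 1.0, 2.75]

as_bigint = [cast_like_bigint(rate) for rate in sample_rates]
as_double = [cast_like_double(rate) for rate in sample_rates]

print(as_bigint)  # fractional parts are lost: [0, 0, 1, 2]
print(as_double)  # values survive unchanged: [0.01, 0.5, 1.0, 2.75]
```

Under that assumption, any rate below 1 would have been stored as 0 in the old column, so historical values of event.eventRate should be read with care.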

Huh, I'm pretty sure that we saw non-integer values in there. Anyway, that probably also applies to the other two events.

Change 699002 merged by Ottomata:

[operations/puppet@production] Finalize backend EP migration of 4 EL schemas

https://gerrit.wikimedia.org/r/699002