
Resolve EventCapsule / MySQL / Hive schema discrepancies
Closed, Resolved | Public | 13 Story Points

Description

We should move special casing and transformation of EventLogging analytics data for insertion into MySQL into the MySQL consumer process itself, not upstream in the processor.

Currently, we do several things to make EventLogging analytics data work for MySQL.

  • Convert (varnish) timestamps to ints and then to MediaWiki format. T179540
  • Parse userAgent and convert to JSON string. T153207, T178440
  • Filter out unwanted bots. T67508

We should do these things only to the data as it is inserted into MySQL, not before it goes to Kafka.
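As an illustration of the first item, here is a minimal sketch of the int-to-MediaWiki conversion. It assumes the intermediate int is Unix epoch seconds in UTC; the function name is hypothetical and this is not the actual EventLogging code.

```python
from datetime import datetime, timezone


def to_mediawiki_timestamp(epoch_seconds):
    """Format Unix epoch seconds as a 14-digit MediaWiki timestamp (UTC).

    Assumption: the input is (or can be cast to) whole epoch seconds.
    """
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    # MediaWiki timestamps are YYYYMMDDHHMMSS with no separators.
    return moment.strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    print(to_mediawiki_timestamp(0))  # 19700101000000
```

Doing this only in the MySQL consumer would let the events in Kafka keep a more standard time representation.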

I propose:

  • Modify EventCapsule schema
    • Make timestamp optional number
    • Add optional dt field in ISO-8601 date-time format.
    • Make userAgent "type": ["object", "string"] rather than just "type": "string"
  • Modify eventlogging code to
    • Parse dt from raw client-side log format.
    • Parse userAgent, but leave it as a nested object, not a JSON string.
    • Add map:// reader/writer handlers.
    • Use map:// in eventlogging-consumer mysql to
      • add timestamp and remove dt for compatibility with existing tables
      • Filter out bots
      • Convert userAgent to JSON string for compatibility with existing tables

Event Timeline

Ottomata created this task. (Nov 2 2017, 10:34 PM)

Change 388255 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[eventlogging@master] Resolve EventCapsule / MySQL schema descrepencies

https://gerrit.wikimedia.org/r/388255

Ottomata renamed this task from Resolve EventCapsule / MySQL schema descrepencies to Resolve EventCapsule / MySQL / Hive schema discrepancies.
Ottomata updated the task description. (Nov 2 2017, 10:44 PM)
Ottomata updated the task description.
Ottomata added subscribers: Nuria, mforns.
Ottomata edited projects, added Analytics-Kanban; removed Analytics. (Nov 3 2017, 2:12 PM)
Ottomata moved this task from Next Up to In Progress on the Analytics-Kanban board.
Ottomata set the point value for this task to 8.
Ottomata changed the point value for this task from 8 to 13.
Ottomata updated the task description. (Nov 3 2017, 2:36 PM)
Ottomata updated the task description. (Nov 3 2017, 4:02 PM)
Ottomata updated the task description. (Nov 6 2017, 3:47 PM)

Change 389713 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Make navtiming support nested parsed UA objects, as well as json strings

https://gerrit.wikimedia.org/r/389713

@Krinkle, I just submitted https://gerrit.wikimedia.org/r/#/c/389713/, let me know what you think.

Also, do you use the timestamp field in any of your consumers? If not, we may get rid of it in favor of dt.
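For consumers that need to handle both forms during the transition, a helper along these lines may be enough. It is a sketch only: `get_user_agent` is a hypothetical name, not the navtiming code, and it assumes Python 3, where the string type is `str`.

```python
import json


def get_user_agent(event):
    """Return the event's userAgent as a dict, whichever form it arrived in.

    Accepts a nested object (new capsule) or a JSON string (old capsule).
    Returns an empty dict if the field is missing or cannot be parsed.
    """
    user_agent = event.get("userAgent")
    if isinstance(user_agent, dict):
        return user_agent
    if isinstance(user_agent, str):
        try:
            parsed = json.loads(user_agent)
        except ValueError:
            # Raw, unparsed user agent strings are not valid JSON.
            return {}
        return parsed if isinstance(parsed, dict) else {}
    return {}
```

Returning an empty dict on failure keeps one malformed event from stopping the consumer.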

Change 389722 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] [WIP] EventLogging analytics capsule discrepency fixes

https://gerrit.wikimedia.org/r/389722

Change 389713 merged by Ottomata:
[operations/puppet@production] webperf: Make navtiming support nested parsed UA objects as well

https://gerrit.wikimedia.org/r/389713

Ottomata updated the task description. (Nov 7 2017, 9:03 PM)
Ottomata updated the task description.

Change 389861 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Add exception guard for json parsing in eventlogging mysql filter

https://gerrit.wikimedia.org/r/389861

Krinkle removed a subscriber: Krinkle. (Nov 8 2017, 8:36 AM)

Change 388255 merged by Ottomata:
[eventlogging@master] Resolve EventCapsule / MySQL schema discrepancies

https://gerrit.wikimedia.org/r/388255

Mentioned in SAL (#wikimedia-operations) [2017-11-08T15:16:59Z] <otto@tin> Started deploy [eventlogging/analytics@02c5a6b]: EventCapsule update and fixes, this is no-op as is. T179625

Mentioned in SAL (#wikimedia-operations) [2017-11-08T15:17:07Z] <otto@tin> Finished deploy [eventlogging/analytics@02c5a6b]: EventCapsule update and fixes, this is no-op as is. T179625 (duration: 00m 04s)

Change 389861 merged by Ottomata:
[operations/puppet@production] Add exception guard for json parsing in eventlogging mysql filter

https://gerrit.wikimedia.org/r/389861

Change 391015 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[eventlogging@master] Update EventCapsule revision witih property-less userAgent

https://gerrit.wikimedia.org/r/391015

Change 391015 merged by Ottomata:
[eventlogging@master] Update EventCapsule revision witih property-less userAgent

https://gerrit.wikimedia.org/r/391015

Mentioned in SAL (#wikimedia-operations) [2017-11-13T15:06:05Z] <otto@tin> Started deploy [eventlogging/analytics@5796c27]: T179625

Mentioned in SAL (#wikimedia-operations) [2017-11-13T15:06:12Z] <otto@tin> Finished deploy [eventlogging/analytics@5796c27]: T179625 (duration: 00m 04s)

Change 391019 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[eventlogging@master] Parse userAgent if it is a (str, unicode)

https://gerrit.wikimedia.org/r/391019

Change 391019 merged by Ottomata:
[eventlogging@master] Parse userAgent if it is a string type

https://gerrit.wikimedia.org/r/391019

Mentioned in SAL (#wikimedia-operations) [2017-11-13T15:22:47Z] <otto@tin> Started deploy [eventlogging/analytics@e024af3]: T179625

Mentioned in SAL (#wikimedia-operations) [2017-11-13T15:22:53Z] <otto@tin> Finished deploy [eventlogging/analytics@e024af3]: T179625 (duration: 00m 02s)

Ottomata added a comment. (Edited, Nov 13 2017, 3:43 PM)

Alright, the no-op EL code changes are running fine in production. Here are the steps to apply these changes, taking into account the tbayer.popups data that is currently in use.

  • stop eventlogging Camus and refine jobs
  • Delete all raw and refined eventlogging HDFS data and Hive tables
  • convert already refined tbayer.popups data into event.popups table using something like https://gist.github.com/ottomata/f73461f7bca0e5da9368f5480b7fbe7b
  • deploy https://gerrit.wikimedia.org/r/#/c/389722/, restart EL, make sure topic and MySQL data looks good.
  • restart camus and refine jobs
  • tell tbayer to use event.popups with object userAgent and string dt fields instead of tbayer.popups
  • delete eventlogging_refine_test class from puppet.

Change 391023 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Temporarily disable EventLogging refine jobs

https://gerrit.wikimedia.org/r/391023

@Tbayer I'd really like to move on the above plan asap, since I will be out at the end of each of the next two weeks, and I'd much rather deploy near the beginning of a week.

This change is particularly relevant to us because it involves the experimental tbayer.popups table. I'd like to start fully productionizing (including wikitech docs and an announcement). When I deploy this change, you'll need to use event.popups instead of tbayer.popups. The timestamp field will be removed in favor of dt, and userAgent will be a nested object instead of a JSON string, so you can use fields like useragent.is_bot, etc. in your queries.

I'm not going to remove tbayer.popups, but new data will no longer be inserted into it. Any objections to me doing this today or tomorrow?

Could this be held off two more days, when the data collection for this one ends (T178500)? Having to join two tables with incompatible formats is likely to add a lot of unnecessary complexity to the analysis.

Ah great! Didn't realize it was ending so soon. That's fine, we can wait.

> Having to join two tables with incompatible formats is likely to add a lot of unnecessary complexity to the analysis.

You wouldn't have to join two tables; everything that is currently in tbayer.popups will also be in event.popups.

But even so, I'm fine with waiting until next week if that makes the experiment easier.

Thanks! This has stopped now (T178500), so feel free to go ahead.

Mentioned in SAL (#wikimedia-analytics) [2017-11-20T15:45:15Z] <ottomata> deploying fixes to EL EventCapsule discrepancies: https://phabricator.wikimedia.org/T179625#3755242

Change 391023 merged by Ottomata:
[operations/puppet@production] Temporarily disable EventLogging refine jobs

https://gerrit.wikimedia.org/r/391023

Change 389722 merged by Ottomata:
[operations/puppet@production] EventLogging analytics capsule discrepency fixes

https://gerrit.wikimedia.org/r/389722

Change 392468 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Fix coal to use EventCapsule dt instead of timestamp

https://gerrit.wikimedia.org/r/392468

Mentioned in SAL (#wikimedia-operations) [2017-11-20T19:47:05Z] <ottomata> restarted coal with fixes for eventcapsule changes in T179625

Change 392468 merged by Ottomata:
[operations/puppet@production] Fix coal to use EventCapsule dt instead of timestamp

https://gerrit.wikimedia.org/r/392468

Change 392473 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[operations/puppet@production] Remove eventlogging_refine_test temporary class

https://gerrit.wikimedia.org/r/392473

Change 392473 merged by Ottomata:
[operations/puppet@production] Remove eventlogging_refine_test temporary class

https://gerrit.wikimedia.org/r/392473

Alright! Capsule changes are all deployed and looking good. The JSON Refine jobs have been restarted and are moving along nicely.

Documentation on how to access the data in Hive/Spark has been updated here:

https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging#Hadoop_.26_Hive


Change 392898 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[eventlogging@master] Don't dump userAgent to json string in parse.py

https://gerrit.wikimedia.org/r/392898

Change 392898 merged by Ottomata:
[eventlogging@master] Don't dump userAgent to json string in parse.py

https://gerrit.wikimedia.org/r/392898

Mentioned in SAL (#wikimedia-operations) [2017-11-22T20:13:13Z] <otto@tin> Started deploy [eventlogging/analytics@57234e7]: no-op: removing now unneeded code that might accidentally serialize userAgent to json string: T179625

Mentioned in SAL (#wikimedia-operations) [2017-11-22T20:13:21Z] <otto@tin> Finished deploy [eventlogging/analytics@57234e7]: no-op: removing now unneeded code that might accidentally serialize userAgent to json string: T179625 (duration: 00m 04s)

Nuria closed this task as Resolved. (Nov 28 2017, 6:22 PM)