We should move the special casing and transformation that EventLogging analytics data needs for insertion into MySQL into the MySQL consumer process itself, rather than doing it upstream in the processor.
Currently, we do several things to make EventLogging analytics data work for MySQL.
- Convert (Varnish) timestamps to ints and then to MediaWiki format. T179540
- Parse `userAgent` and convert to JSON string. T153207, T178440
- Filter out unwanted bots. T67508
We should do these things only to the data as it is inserted into MySQL, not before it goes to Kafka.
- Modify EventCapsule schema
-- make `timestamp` optional (and revert it to `utc-millisec` format).
-- add optional `dt` field in ISO-8601 format.
- Modify eventlogging code to
-- Parse `dt` from raw client-side log format.
-- Parse userAgent, but leave it as a nested object, not a JSON string.
-- Support map/filter conversion in the MySQL consumer reader in order to:
--- filter out bots
--- convert userAgent to JSON string
--- replace `dt` field with integer `timestamp` field (jrm.py db code will continue to convert this to MediaWiki format)
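
As a rough illustration of the map/filter step in the MySQL consumer, here is a minimal Python sketch. It is not the eventlogging implementation: the function names (`is_bot`, `user_agent_to_json`, `dt_to_timestamp`, `prepare_for_mysql`) and the `is_bot` key inside the parsed `userAgent` object are hypothetical, and it assumes the integer `timestamp` is in seconds since the epoch.

```python
import json
from datetime import datetime, timezone


def is_bot(event):
    """Filter predicate. ASSUMPTION: the parsed userAgent object carries a
    boolean 'is_bot' key; the real field name may differ."""
    ua = event.get('userAgent')
    return isinstance(ua, dict) and bool(ua.get('is_bot'))


def user_agent_to_json(event):
    """Map: serialize the nested userAgent object to a JSON string."""
    ua = event.get('userAgent')
    if isinstance(ua, dict):
        event = dict(event, userAgent=json.dumps(ua, sort_keys=True))
    return event


def dt_to_timestamp(event):
    """Map: replace the ISO-8601 'dt' field with an integer 'timestamp'.
    ASSUMPTION: the integer is seconds since the Unix epoch."""
    if 'dt' in event:
        event = dict(event)  # copy so the caller's event is not mutated
        # Older Pythons do not accept a trailing 'Z' in fromisoformat().
        dt = datetime.fromisoformat(event.pop('dt').replace('Z', '+00:00'))
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)  # treat naive values as UTC
        event['timestamp'] = int(dt.timestamp())
    return event


def prepare_for_mysql(events):
    """Apply the filter and both maps to a stream of events."""
    for event in events:
        if is_bot(event):
            continue
        yield dt_to_timestamp(user_agent_to_json(event))


if __name__ == '__main__':
    sample = [
        {'dt': '2017-11-01T00:00:00Z',
         'userAgent': {'browser_family': 'Firefox', 'is_bot': False}},
        {'dt': '2017-11-01T00:00:01Z',
         'userAgent': {'browser_family': 'Other', 'is_bot': True}},
    ]
    for row in prepare_for_mysql(sample):
        print(row)
```

With something like this, events in Kafka keep the ISO-8601 `dt` and the nested `userAgent` object, and only the MySQL consumer sees the flattened, MySQL-specific form.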