We should move the special-casing and transformation of EventLogging analytics data for MySQL insertion into the MySQL consumer process itself, rather than doing it upstream in the processor.
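A minimal sketch of the idea, in Python since EventLogging is written in Python: the consumer wraps its input stream in a per-event function that either returns a transformed event or returns `None` to drop it. `map_events` and the sample lambda are hypothetical names for illustration. They are not part of the existing eventlogging code.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional

Event = Dict[str, Any]


def map_events(
    events: Iterable[Event],
    fn: Callable[[Event], Optional[Event]],
) -> Iterator[Event]:
    """Hypothetical helper: apply fn to each event, skipping events where fn returns None.

    Running this inside the MySQL consumer keeps MySQL-specific changes
    out of the events that the processor writes to Kafka.
    """
    for event in events:
        result = fn(event)
        if result is not None:
            yield result


if __name__ == "__main__":
    # Example: drop events flagged as unwanted and tag the rest.
    sample = [{"id": 1}, {"id": 2, "unwanted": True}]
    kept = list(map_events(sample, lambda e: None if e.get("unwanted") else dict(e, seen=True)))
    print(kept)  # [{'id': 1, 'seen': True}]
```

Because `map_events` is a generator, events are transformed lazily as the consumer reads them, so the wrapper adds no extra buffering.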
Currently, we do several things to make EventLogging analytics data work for MySQL:
- Convert (Varnish) timestamps to ints and then to MediaWiki format. T179540
- Parse `userAgent` and convert it to a JSON string. T153207, T178440
- Filter out unwanted bots. T67508
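The timestamp conversion in the first bullet can be sketched as follows. This assumes the Varnish timestamp arrives as an ISO-8601 UTC string such as `2017-11-02T12:34:56`; the exact raw format is an assumption here. A MediaWiki timestamp is the 14-digit `YYYYMMDDHHMMSS` form in UTC. Both function names are hypothetical.

```python
import calendar
from datetime import datetime, timezone


def varnish_ts_to_int(ts: str) -> int:
    """Hypothetical: parse an ISO-8601 UTC timestamp (assumed form
    2017-11-02T12:34:56, optional trailing Z) into integer epoch seconds."""
    parsed = datetime.strptime(ts.rstrip("Z"), "%Y-%m-%dT%H:%M:%S")
    # calendar.timegm interprets the time tuple as UTC, unlike time.mktime.
    return calendar.timegm(parsed.timetuple())


def int_to_mediawiki_ts(epoch: int) -> str:
    """Format epoch seconds as a MediaWiki timestamp: YYYYMMDDHHMMSS in UTC."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc).strftime("%Y%m%d%H%M%S")


print(int_to_mediawiki_ts(varnish_ts_to_int("2017-11-02T12:34:56")))  # 20171102123456
```

Only the MySQL insert path actually needs the MediaWiki form. Events in Kafka could carry the ISO-8601 value unchanged.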
We should do these things only to the data as it is inserted into MySQL, not before it goes to Kafka.
- Modify the EventCapsule schema:
-- Make `timestamp` an optional `number`.
-- Add an optional `dt` field in ISO-8601 format.
-- Make `userAgent` `"type": ["object", "string"]` rather than just `"type": "string"`.
- Modify the EventLogging code to:
-- Parse `dt` from the raw client-side log format.
-- Parse `userAgent`, but leave it as a nested object, not a JSON string.
-- Add map and filter reader/writer handlers, and use them:
--- map:// in eventlogging-processor, to add `timestamp`
--- map:// in the MySQL consumer reader process, in order to:
---- Filter out bots
---- Convert `userAgent` to a JSON string
-- Add `dt` to the list of `NO_DB_PROPERTIES` in `jrm.py`.
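A minimal sketch of the two map steps in the plan. It assumes events are plain dicts, that `dt` looks like `2017-11-02T12:34:56Z`, and that the parsed `userAgent` object carries an `is_bot` boolean (an assumption about the parser output). `add_timestamp` and `to_mysql_row` are hypothetical names. Returning `None` stands for "filter this event out".

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional

Event = Dict[str, Any]


def add_timestamp(event: Event) -> Event:
    """Processor-side map (hypothetical): derive integer `timestamp` from `dt`."""
    if "dt" not in event or "timestamp" in event:
        return event
    parsed = datetime.strptime(event["dt"].rstrip("Z"), "%Y-%m-%dT%H:%M:%S")
    epoch = int(parsed.replace(tzinfo=timezone.utc).timestamp())
    # Return a new dict rather than mutating the caller's event.
    return dict(event, timestamp=epoch)


def to_mysql_row(event: Event) -> Optional[Event]:
    """MySQL-consumer map (hypothetical): drop bots, then flatten userAgent.

    Assumes the nested userAgent object has an `is_bot` flag. A userAgent
    that is already a string (legacy events) is passed through unchanged.
    """
    ua = event.get("userAgent")
    if not isinstance(ua, dict):
        return event
    if ua.get("is_bot"):
        return None  # filtered out: never reaches MySQL
    return dict(event, userAgent=json.dumps(ua, sort_keys=True))


if __name__ == "__main__":
    incoming = [
        {"dt": "2017-11-02T12:34:56Z", "userAgent": {"browser_family": "Firefox", "is_bot": False}},
        {"dt": "2017-11-02T12:35:00Z", "userAgent": {"browser_family": "Googlebot", "is_bot": True}},
    ]
    processed = [add_timestamp(e) for e in incoming]  # in eventlogging-processor
    rows = [r for r in map(to_mysql_row, processed) if r is not None]  # in the MySQL consumer
    print(rows)
```

The design point is that only `to_mysql_row` produces the MySQL-specific shape, so events in Kafka keep the nested `userAgent` object and the `dt` field as the processor emitted them.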