We should move the special-casing and transformation that EventLogging analytics data needs for insertion into MySQL out of the upstream processor and into the MySQL consumer process itself.
Currently, we do several things to make EventLogging analytics data work for MySQL.
- Convert (varnish) timestamps to ints and then to MediaWiki format. T179540
- Parse userAgent and convert to JSON string. T153207, T178440
- Filter out unwanted bots. T67508
We should do these things only to the data as it is inserted into MySQL, not before it goes to Kafka.
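For illustration, the timestamp conversion in the first item could look roughly like the sketch below. The function name `to_mediawiki_timestamp` is hypothetical and not taken from the eventlogging codebase. It assumes the timestamp has already been reduced to integer seconds since the Unix epoch, and it targets MediaWiki's 14-digit `YYYYMMDDHHMMSS` UTC format.

```python
from datetime import datetime, timezone


def to_mediawiki_timestamp(epoch_seconds):
    """Format integer Unix epoch seconds as a MediaWiki timestamp (UTC).

    Hypothetical helper for illustration only.
    """
    moment = datetime.fromtimestamp(int(epoch_seconds), tz=timezone.utc)
    # MediaWiki timestamps are 14 digits: YYYYMMDDHHMMSS
    return moment.strftime('%Y%m%d%H%M%S')


print(to_mediawiki_timestamp(0))  # 19700101000000
```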
I propose:
- Modify EventCapsule schema
  - Make timestamp an optional number
  - Add optional dt field in ISO-8601 date-time format
  - Make userAgent "type": ["object", "string"] rather than just "type": "string"
- Modify eventlogging code to
  - Parse dt from raw client-side log format
  - Parse userAgent, but leave it as a nested object, not a JSON string
  - Add map:// reader/writer handlers to
- map:// in eventlogging-consumer mysql to add timestamp
  - Add timestamp and remove dt for compatibility with existing tables
  - Filter out bots
  - Convert userAgent to JSON string for compatibility with existing tables
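
The consumer-side steps above could be sketched as a single mapping function. This is a minimal sketch under stated assumptions, not the eventlogging implementation. The name `mysql_compat_map`, the `is_bot` key inside the parsed userAgent object, and the convention of returning `None` to drop an event are all hypothetical. It also assumes that dt values without a UTC offset are in UTC, and that timestamp is stored as integer epoch seconds.

```python
import json
from datetime import datetime, timezone


def mysql_compat_map(event):
    """Adapt one event for the existing MySQL tables.

    Returns a modified copy of the event, or None if the event should
    be dropped. Hypothetical sketch for illustration only.
    """
    event = dict(event)  # shallow copy so the caller's dict is untouched
    user_agent = event.get('userAgent')

    if isinstance(user_agent, dict):
        # ASSUMPTION: the parsed userAgent carries a boolean 'is_bot' flag.
        if user_agent.get('is_bot'):
            return None
        # Existing tables expect userAgent as a JSON string.
        event['userAgent'] = json.dumps(user_agent, sort_keys=True)

    if 'dt' in event:
        raw_dt = event.pop('dt')  # existing tables have no dt column
        # Older Python versions reject a trailing 'Z' in fromisoformat.
        moment = datetime.fromisoformat(raw_dt.replace('Z', '+00:00'))
        if moment.tzinfo is None:
            # ASSUMPTION: naive dt values are UTC.
            moment = moment.replace(tzinfo=timezone.utc)
        # Keep an existing timestamp if the event already has one.
        event.setdefault('timestamp', int(moment.timestamp()))

    return event
```

Keeping all three steps in one function means the events in Kafka stay unchanged and only the MySQL consumer sees the compatibility form.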