Background
Thursday, 28th July 2022
- https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/817225 was deployed
- During the deployment I confirmed that the mediawiki.web_ui.interactions stream config was being sent to the client and that events were being submitted to the stream by the MediaWiki JS Metrics Platform Client
Friday, 29th July 2022
- I confirmed that the mediawiki_web_ui_interactions table was available in Hive
- I confirmed that there were no EventGate validation errors for the mediawiki.web_ui.interactions stream
- I noticed, however, that the custom_data field was not being populated correctly:
select custom_data, count(*) as n
from mediawiki_web_ui_interactions
where year = 2022 and month = 7 and day = 28
group by custom_data
order by n desc
limit 10000;

custom_data                        n
{"data_type":null,"value":null}    339
The Issue
I reached out to @Ottomata and he pointed out that the schema for the custom_data field should be a map type but is currently a schemaless object (a struct). The schema for the field should look like:
custom_data:
  type: object
  propertyNames:
    pattern: ^[$a-z]+[a-z0-9_]*$  # "[P]roperties must be snake_case"
    minLength: 1
    maxLength: 255
  additionalProperties:
    type: object
    properties:
      data_type:
        type: string
        enum:
          - number
          - string
          - boolean
          - null
      value:
        type: string
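To make the map shape concrete, here is a minimal Python sketch that checks a custom_data value against the constraints in the fragment above. It is an illustration only, not the EventGate validator: the function name validate_custom_data and the constants are hypothetical, and it assumes the `null` enum member is meant as the string "null".

```python
import re

# Constraints copied from the proposed schema fragment above.
PROPERTY_NAME_PATTERN = re.compile(r"^[$a-z]+[a-z0-9_]*$")
MAX_NAME_LENGTH = 255
# Assumption: `null` in the enum is intended as the string "null".
ALLOWED_DATA_TYPES = ("number", "string", "boolean", "null")


def validate_custom_data(custom_data):
    """Return a list of problems; an empty list means the value fits the map shape."""
    if not isinstance(custom_data, dict):
        return ["custom_data must be an object"]

    problems = []
    for name, entry in custom_data.items():
        # propertyNames: length 1..255 and snake_case pattern.
        if not isinstance(name, str) or not 1 <= len(name) <= MAX_NAME_LENGTH:
            problems.append(f"{name!r}: name length must be 1..{MAX_NAME_LENGTH}")
        elif not PROPERTY_NAME_PATTERN.fullmatch(name):
            problems.append(f"{name!r}: name must be snake_case")

        # additionalProperties: every map value is an object.
        if not isinstance(entry, dict):
            problems.append(f"{name!r}: value must be an object")
            continue

        # data_type and value are optional, but typed when present.
        if "data_type" in entry:
            data_type = entry["data_type"]
            if not isinstance(data_type, str) or data_type not in ALLOWED_DATA_TYPES:
                problems.append(f"{name!r}: unsupported data_type {data_type!r}")
        if "value" in entry and not isinstance(entry["value"], str):
            problems.append(f"{name!r}: value must be a string")
    return problems


if __name__ == "__main__":
    # Map-shaped value: arbitrary snake_case keys, each with data_type and value.
    print(validate_custom_data({"action": {"data_type": "string", "value": "click"}}))
    # The struct-shaped row seen in Hive does not fit the map shape.
    print(validate_custom_data({"data_type": None, "value": None}))
```

Under this sketch the row observed in Hive, {"data_type":null,"value":null}, is reported as invalid, because in a map each top-level key has to point at its own {data_type, value} object.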
This changes the type of an existing field, which is strongly discouraged. Since there are no downstream consumers, however, we should be able to change the type of the field by:
- Disabling the instrument on testwiki
- Dropping the mediawiki_web_ui_interactions table
- Updating the schema
- Restarting EventGate
- Re-enabling the instrument on testwiki
TODO
- Disable the instrument on testwiki
- Drop the mediawiki_web_ui_interactions table
- Update the schema for the custom_data field
- Restart EventGate
- Re-enable the instrument on testwiki