We're trying to use the EventLogging schemas to create Hive tables. For this to work, the latest revision of each schema must be backwards compatible with its previous revisions. In practice, backwards compatibility means that the only change you are ever allowed to make is adding optional fields. I just updated https://wikitech.wikimedia.org/wiki/Analytics/Systems/EventLogging/Schema_Guidelines to make this restriction more visible.
I've run into several cases where people have added required fields to schemas. This prevents the Hive importer from using the latest schema to read old data.
If any producers of event data still use older schema revisions, we won't be able to import their data into Hive. To fix this, we need to make sure that the latest revision of each affected schema is compatible with all revisions that are currently being used to produce events. This means that we'll need to re-add removed fields and change any added required fields to non-required.
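As a rough illustration of the rule, here is a minimal Python sketch that compares two revisions of a schema and reports the kinds of changes that break backwards compatibility. It assumes each field carries its own boolean `required` flag under `properties`; the function name `find_incompatibilities` and the sample revisions are made up for this example and are not part of EventLogging or the Hive importer.

```python
def find_incompatibilities(old_schema, new_schema):
    """Return a list of problems; an empty list means the change is compatible."""
    old_props = old_schema.get("properties", {})
    new_props = new_schema.get("properties", {})
    problems = []

    for name, old_def in old_props.items():
        if name not in new_props:
            # Removing (or renaming) a field drops data that old events carry.
            problems.append(f"removed field: {name}")
        elif old_def.get("type") != new_props[name].get("type"):
            problems.append(f"type change: {name}")
        elif new_props[name].get("required", False) and not old_def.get("required", False):
            # Old events were allowed to omit this field.
            problems.append(f"optional field made required: {name}")

    for name, new_def in new_props.items():
        # New fields are fine only if they are optional.
        if name not in old_props and new_def.get("required", False):
            problems.append(f"added required field: {name}")

    return problems


# Hypothetical revisions, for illustration only.
old = {"properties": {"action": {"type": "string", "required": True}}}
new = {"properties": {
    "action": {"type": "string", "required": True},
    "variant": {"type": "string", "required": True},
}}

print(find_incompatibilities(old, new))
```

Note that a rename shows up here as a removed field plus an added required field, which is why renames are not compatible either.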
(This list might not be comprehensive. I just looked at the recent history of schemas that I encountered errors with and found these changes.)
- MediaViewer - added required variant
Renaming fields is not allowed. You are removing required fields, and then adding new required fields. This is not backwards compatible!
- TestSearchSatisfaction2 - added required isForced
- MobileWikiAppEdit - added required source
- MobileWikiAppSessions - added required is_anon, event_dt, etc.
Added required fields (ts, appInstallId, synced, etc.):
- MobileWikiAppLangSelect
- MobileWikiAppToCInteraction - Ah! The most egregious offense! Type changes are definitely not allowed! :p
- TestSearchSatisfaction2 - added required uniqueId
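
For schemas like the ones listed above, the fix described earlier (re-adding removed fields and making added required fields non-required) could be sketched roughly as follows. This is only an illustration: it assumes each field has its own boolean `required` flag under `properties`, the helper name `make_compatible` and the sample revisions are hypothetical, and it does not attempt to resolve type changes, which need to be fixed by hand.

```python
import copy


def make_compatible(latest, in_use_revisions):
    """Return a copy of `latest` relaxed so events from older revisions still fit."""
    fixed = copy.deepcopy(latest)
    props = fixed.setdefault("properties", {})

    for rev in in_use_revisions:
        rev_props = rev.get("properties", {})

        # Re-add fields that the latest revision removed. They come back as
        # optional (a choice made in this sketch), because events produced
        # against the current latest revision do not carry them.
        for name, definition in rev_props.items():
            if name not in props:
                restored = copy.deepcopy(definition)
                restored["required"] = False
                props[name] = restored

        # Fields this older revision does not know about cannot be required.
        for name, definition in props.items():
            if name not in rev_props:
                definition["required"] = False

    return fixed


# Hypothetical revisions, for illustration only.
old_rev = {"properties": {"action": {"type": "string", "required": True}}}
latest_rev = {"properties": {"variant": {"type": "string", "required": True}}}

print(make_compatible(latest_rev, [old_rev]))
```

The input schemas are left untouched; the result is a new dictionary that could be reviewed before being saved as a new revision.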