Altering Hive and Iceberg tables that use structs as array element or map value types raises complicated issues.
Context for these complications:
- T209453: Refine: Use Spark SQL instead of Hive JDBC
- T259924: HiveExtensions.convertToSchema does not properly convert arrays of structs
- T307040: Propagate field descriptions from event schemas to Hive event tables
Currently, jsonschema-tools allows backwards compatible changes to array element and map value struct types.
We should make a configurable change to jsonschema-tools so that its compatibility test fails if a user tries to change an array element or map value struct type.
Without this change, it is possible for Event Platform users to make schema changes that will cause our event ingestion pipeline (Refine) to fail.
Done is:
- Update Event_Platform/Schemas/Guidelines to recommend not using structs inside of maps or arrays, and also explain that if they do use them, they cannot change them later in a backwards compatible way: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Complex_array_element_and_map_value_evolution_is_not_well_supported
- jsonschema-tools compatibility test configurably fails if an array element or map value element struct is changed between non-major versions.
- Event schema repositories use the new jsonschema-tools version and configure this failure to happen during their CI checks.
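
To make the intended check concrete, here is a minimal sketch in Python of the comparison described above. It is not the jsonschema-tools implementation (which is a Node.js tool); the function names are hypothetical. It assumes that a struct is an object schema with `properties`, that array element types live under `items`, and that map value types live under `additionalProperties`.

```python
from typing import Any, Dict, List

Schema = Dict[str, Any]


def _is_struct(schema: Any) -> bool:
    """Assumption: a struct is an object schema that declares named properties."""
    return (
        isinstance(schema, dict)
        and schema.get("type") == "object"
        and "properties" in schema
    )


def find_element_struct_changes(old: Any, new: Any, path: str = "") -> List[str]:
    """Return paths where an array element or map value struct differs.

    Hypothetical helper: changes to ordinary (non-element) struct fields are
    not reported, only changes to structs nested under 'items' or
    'additionalProperties'.
    """
    if not isinstance(old, dict) or not isinstance(new, dict):
        return []
    changes: List[str] = []
    for keyword in ("items", "additionalProperties"):
        old_el, new_el = old.get(keyword), new.get(keyword)
        if _is_struct(old_el):
            # Any difference in an element struct is flagged, even an added field.
            if old_el != new_el:
                changes.append(f"{path}/{keyword}")
        elif isinstance(old_el, dict) and isinstance(new_el, dict):
            # Non-struct element (e.g. array of arrays): keep looking deeper.
            changes.extend(
                find_element_struct_changes(old_el, new_el, f"{path}/{keyword}")
            )
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    # Only fields present in both versions can contain a changed element struct.
    for name in sorted(old_props.keys() & new_props.keys()):
        changes.extend(
            find_element_struct_changes(
                old_props[name], new_props[name], f"{path}/properties/{name}"
            )
        )
    return changes


# Example: adding a field to a struct inside an array is flagged.
v1 = {
    "type": "object",
    "properties": {
        "tags": {
            "type": "array",
            "items": {"type": "object", "properties": {"name": {"type": "string"}}},
        }
    },
}
v2 = {
    "type": "object",
    "properties": {
        "tags": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "value": {"type": "integer"},
                },
            },
        }
    },
}
print(find_element_struct_changes(v1, v2))  # ['/properties/tags/items']
```

A CI check along these lines would fail whenever the returned list is non-empty for a non-major version bump, and would stay off unless the repository opts in through configuration.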