When we designed the revision-score schema, we made the scores field an array of objects. However, these objects vary too much, because the individual model scores can produce many different types of fields. In Hive, all nested fields with the same name have their subfields merged into one huge struct field.
For example, the following two events are not compatible:
{"scores": [{"model": "good_or_bad", "prediction": {"bad": true}}]}
{"scores": [{"model": "badness_level", "prediction": {"bad": 0.50}}]}
In Hive, the bad field in both of these events is the same field (scores[0].prediction.bad), but it has a different type in each: a boolean in the first event and a number in the second.
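To make the conflict concrete, here is a minimal Python sketch (not Hive's actual schema inference) that walks a set of events, records every array element under one shared path the way a Hive array<struct<...>> has a single element type, and reports any field path seen with more than one type. The function names are hypothetical.

```python
import json
from collections import defaultdict


def collect_leaf_types(value, path, out):
    """Record the Python type name of every leaf value under `path`.

    All array elements are recorded under one path ending in "[]",
    mirroring a Hive array<struct<...>>, whose elements share one type.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            collect_leaf_types(child, f"{path}.{key}" if path else key, out)
    elif isinstance(value, list):
        for element in value:
            collect_leaf_types(element, f"{path}[]", out)
    else:
        out[path].add(type(value).__name__)


def conflicting_fields(events):
    """Return {field path: sorted type names} for paths seen with more than one type."""
    types = defaultdict(set)
    for event in events:
        collect_leaf_types(event, "", types)
    return {path: sorted(names) for path, names in types.items() if len(names) > 1}


events = [
    json.loads('{"scores": [{"model": "good_or_bad", "prediction": {"bad": true}}]}'),
    json.loads('{"scores": [{"model": "badness_level", "prediction": {"bad": 0.50}}]}'),
]
print(conflicting_fields(events))
# → {'scores[].prediction.bad': ['bool', 'float']}
```

Either event on its own is fine; it is only when both land in the same table that scores[].prediction.bad needs two types at once.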
To avoid this, could we change the schema so that scores is an object keyed by model name instead of an array? I'm not sure whether we would also need to key by model_version. Does the schema of a score object change in any incompatible ways between versions?
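A minimal sketch of the proposed reshaping, assuming the keyed object ends up as a struct with one field per model name (if it were stored as a Hive map<string, ...> instead, the map values would again share a single type). The helper name scores_by_model is hypothetical. After the transform, the two bad fields sit under different model keys, so their types no longer collide.

```python
def scores_by_model(event):
    """Return a copy of `event` whose scores array is an object keyed by model name.

    The "model" key moves out of each score and becomes the key; all other
    fields of the score are kept unchanged. If two scores share a model name,
    the later one overwrites the earlier one.
    """
    keyed = {}
    for score in event["scores"]:
        fields = {k: v for k, v in score.items() if k != "model"}
        keyed[score["model"]] = fields
    return {**event, "scores": keyed}


first = scores_by_model({"scores": [{"model": "good_or_bad", "prediction": {"bad": True}}]})
second = scores_by_model({"scores": [{"model": "badness_level", "prediction": {"bad": 0.50}}]})
print(first)   # {'scores': {'good_or_bad': {'prediction': {'bad': True}}}}
print(second)  # {'scores': {'badness_level': {'prediction': {'bad': 0.5}}}}
```

The overwrite on a repeated model name is where the model_version question matters: if one event could carry scores from two versions of the same model, or if versions change the score shape incompatibly, the key would need to include the version too.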
I'll make a patch for us to consider.