In https://phabricator.wikimedia.org/T248987#6545103, @Krinkle noted that using nested objects in JSON data that gets imported into logstash isn't great. Best practice for logstash is to use flat data structures. The reasoning is to avoid type conflicts: for example, one datum might have a field as a string, while another has a field with the same name as a nested object. If everything is flat, type conflicts are rarer.
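To make the type conflict concrete, here is a minimal Python sketch that scans a batch of documents bound for the same index and reports any field path seen with more than one JSON type. The field name `context` and the helper names are made up for illustration. The check is also a simplification of how an index mapping really behaves.

```python
from collections import defaultdict


def json_type(value):
    """Return a coarse JSON type name for a Python value."""
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        return "array"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if value is None:
        return "null"
    return "string"


def field_types(doc, prefix=""):
    """Yield (dotted_path, type) for every field, descending into objects."""
    for key, value in doc.items():
        path = f"{prefix}{key}"
        yield path, json_type(value)
        if isinstance(value, dict):
            yield from field_types(value, prefix=f"{path}.")


def find_type_conflicts(docs):
    """Map each field path to its set of types, keeping only conflicting paths."""
    seen = defaultdict(set)
    for doc in docs:
        for path, kind in field_types(doc):
            seen[path].add(kind)
    return {path: kinds for path, kinds in seen.items() if len(kinds) > 1}


# Hypothetical documents: `context` is a string in one and an object in the other.
docs = [
    {"message": "boom", "context": "checkout page"},
    {"message": "boom", "context": {"page": "checkout"}},
]
conflicts = find_type_conflicts(docs)
```

Here `conflicts` contains only the `context` path, because `message` is a string in both documents.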
However, logstash works fine with nested types, as long as no field name has conflicting types within the same Elasticsearch index. Since events are strictly schemaed, it is sometimes difficult to add new concrete fields to schemas when all that is desired is to capture some arbitrary context data. In Event Platform schemas, we use 'map types' for this. In JSON, map types look just like JSON objects, but in JSONSchema we can differentiate between a regular nested object, whose values could be of any type, and a map, whose values must all be of one specific type.
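As a rough illustration of that distinction, the sketch below writes both kinds of schema as Python dicts. It assumes the map type is expressed with the JSONSchema `additionalProperties` keyword. The helper `conforms_to_map` is hypothetical and handles only string-valued maps; it is not a real JSONSchema validator.

```python
# A regular nested object: any keys, values of any type.
FREEFORM_OBJECT = {"type": "object"}

# A map type: any keys, but every value must be a string.
STRING_MAP = {
    "type": "object",
    "additionalProperties": {"type": "string"},
}


def conforms_to_map(value, schema):
    """Check a value against an object schema.

    Only `additionalProperties: {"type": "string"}` is understood; this is a
    tiny subset of JSONSchema, written for illustration only.
    """
    if not isinstance(value, dict):
        return False
    value_schema = schema.get("additionalProperties")
    if not isinstance(value_schema, dict):
        return True  # no constraint on value types
    if value_schema.get("type") == "string":
        return all(isinstance(v, str) for v in value.values())
    raise NotImplementedError("only string-valued maps are handled here")


# Hypothetical context data attached to an event.
uniform = {"wiki": "enwiki", "skin": "vector"}
mixed = {"wiki": "enwiki", "retries": 3}
```

Both `uniform` and `mixed` are acceptable as a free-form object, but only `uniform` is acceptable as a string map. Because every value in a map has the same type, the nested field types stay predictable.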
mediawiki/client/error events are ingested into logstash. Currently, these events have a tags field, which IIUC conflicts with the tags field added by logstash itself. We're ok with renaming this field to avoid the conflict, but we'd like to continue using nested map types in this data (so we don't have to add new context-specific top-level fields to a generic event schema).
If event data that is ingested into logstash had its own index, we could more safely reason about the types of its fields and avoid type conflicts.
Can we add a new Elasticsearch index for schemaed event data?