Mutate mmkubernetes k8s fields into ECS fields
Open, High, Public

Description

Currently, rsyslog's mmkubernetes data is in the kubernetes top-level key. For ECS messages, this top-level key is invalid and is dropped.

Create a logstash filter that moves the mmkubernetes data into the appropriate ECS fields. Define fields in the ECS schema as needed.

Event Timeline

I think the general schema created by mmkubernetes is actually a valid starting point for expanding the schema. What is currently offered by ECS for the container and orchestrator namespaces doesn't cover our needs.

Based on the mmkubernetes documentation, here is the list of fields, the corresponding ECS equivalents, and, where needed, my proposed extensions to the default ECS orchestrator schema.

| mmkubernetes | ECS equivalent |
| --- | --- |
| kubernetes.namespace_name | orchestrator.namespace |
| kubernetes.pod_name | orchestrator.resource.name |
| kubernetes.container_name | container.name |
| docker.id | container.id |
| kubernetes.master_url | orchestrator.cluster.url |
| kubernetes.namespace_id | - |
| kubernetes.pod_id | orchestrator.resource.id |
| kubernetes.creation_timestamp | - |
| kubernetes.host | host.hostname |
| kubernetes.labels | orchestrator.metadata.labels |
| kubernetes.annotations | orchestrator.metadata.annotations |
| kubernetes.namespace_labels | orchestrator.metadata.namespace_labels |
| kubernetes.namespace_annotations | orchestrator.metadata.namespace_annotations |

In essence, I'm proposing to add a metadata extension under orchestrator that would include four objects:

  • labels
  • annotations
  • namespace_labels
  • namespace_annotations

I'm also proposing to add an orchestrator.resource.id field, as suggested by Janis.
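To make the proposed mapping concrete, here is a minimal Python sketch of the rename logic. It is not the Logstash filter itself, only an illustration of what such a filter would need to do. The names FIELD_MAP and to_ecs are hypothetical, and the orchestrator.metadata.* and orchestrator.resource.id targets are the extensions proposed in this thread, not fields confirmed to exist in the ECS schema in use. Dropping whatever remains under kubernetes follows the task description; removing an emptied docker key is an assumption.

```python
import copy
from typing import Any, Dict, Optional

# mmkubernetes path -> proposed ECS path, copied from the table in this task.
# None marks fields with no ECS equivalent in the proposal ("-" in the table).
FIELD_MAP: Dict[str, Optional[str]] = {
    "kubernetes.namespace_name": "orchestrator.namespace",
    "kubernetes.pod_name": "orchestrator.resource.name",
    "kubernetes.container_name": "container.name",
    "docker.id": "container.id",
    "kubernetes.master_url": "orchestrator.cluster.url",
    "kubernetes.namespace_id": None,
    "kubernetes.pod_id": "orchestrator.resource.id",
    "kubernetes.creation_timestamp": None,
    "kubernetes.host": "host.hostname",
    "kubernetes.labels": "orchestrator.metadata.labels",
    "kubernetes.annotations": "orchestrator.metadata.annotations",
    "kubernetes.namespace_labels": "orchestrator.metadata.namespace_labels",
    "kubernetes.namespace_annotations": "orchestrator.metadata.namespace_annotations",
}


def _pop_path(event: Dict[str, Any], path: str) -> Any:
    """Remove and return the value at a dotted path, or None if it is absent."""
    keys = path.split(".")
    node: Any = event
    for key in keys[:-1]:
        node = node.get(key)
        if not isinstance(node, dict):
            return None
    return node.pop(keys[-1], None)


def _set_path(event: Dict[str, Any], path: str, value: Any) -> None:
    """Set a value at a dotted path, creating intermediate objects.

    Assumes any existing intermediate value (e.g. "host") is already an object.
    """
    keys = path.split(".")
    node = event
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value


def to_ecs(event: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the event with mmkubernetes fields moved to ECS paths."""
    out = copy.deepcopy(event)
    for src, dst in FIELD_MAP.items():
        value = _pop_path(out, src)
        if value is not None and dst is not None:
            _set_path(out, dst, value)
    # The kubernetes top-level key is invalid for ECS messages, so drop the rest.
    out.pop("kubernetes", None)
    # Assumption: remove the docker key once it has been emptied.
    if not out.get("docker"):
        out.pop("docker", None)
    return out
```

Keeping the mapping in a single table-like structure means a later schema change (for example, moving the pod ID to a different field) only touches one entry.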

I would second your proposal, apart from what I think is a typo: kubernetes.namespace_name should be orchestrator.namespace. Also, we might want to think about carrying kubernetes.pod_id around. With our current workload the field is not of much use (as kubernetes.pod_name is unique as well), but some workloads (e.g. StatefulSet) generate non-unique pod names, and without the ID we would be unable to distinguish different incarnations.

Yes, makes sense; I've integrated your suggestions/corrections.

Unfortunately, orchestrator.resource.id is not defined in https://www.elastic.co/guide/en/ecs/current/ecs-orchestrator.html, so we would need to add that to orchestrator.metadata, I guess.

Yeah I just forgot the italics.

Joe triaged this task as High priority. Oct 28 2021, 1:38 PM

Setting priority to "high" as this would be a blocker for T288851 given we've adopted an ECS schema there.

Change 930597 had a related patch set uploaded (by Cwhite; author: Cwhite):

[operations/software/ecs@master] backport orchestrator fields from ECS 8.8

https://gerrit.wikimedia.org/r/930597