Enable audit logging for kube-apiserver
Open, Low, Public

Description

We should evaluate and enable audit logging for the kube-apiserver. This would help to keep track of accidental manipulation of cluster objects or harmful operations.

We have to evaluate which resources and actions we want to log in order to keep the volume low. I would assume we don't want to log get and watch actions. Furthermore, we have to think about how to access the audit log (logstash/elasticsearch or local only?).
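To make the volume question more concrete, here is a minimal Python sketch of how an audit policy decides what gets logged. It assumes the documented behaviour that rules are checked in order and the first matching rule sets the level. The policy content itself (dropping get and watch, logging everything else at Metadata level) is only an illustration of the assumption above, not a proposed or deployed policy, and `audit_level` is a hypothetical helper that only looks at verbs.

```python
# Hypothetical policy, written as the Python equivalent of a Policy document.
# Rules are evaluated top to bottom; the first match decides the audit level.
POLICY = {
    "apiVersion": "audit.k8s.io/v1",
    "kind": "Policy",
    "rules": [
        # Assumption from the description: read-only get/watch are not logged.
        {"level": "None", "verbs": ["get", "watch"]},
        # Everything else is logged with request metadata only.
        {"level": "Metadata"},
    ],
}


def audit_level(policy, verb):
    """Return the level of the first rule matching the verb (simplified model)."""
    for rule in policy["rules"]:
        verbs = rule.get("verbs")
        # A rule without a verbs list matches every verb.
        if not verbs or verb in verbs:
            return rule["level"]
    # No rule matched: treated here as "not logged".
    return "None"


for verb in ("get", "watch", "create", "delete"):
    print(verb, "->", audit_level(POLICY, verb))
```

A real policy can also match on users, namespaces and resources. The sketch ignores those and only shows why rule order matters when excluding noisy verbs.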

More information:
https://kubernetes.io/docs/tasks/debug-application-cluster/audit/

This topic came up in https://phabricator.wikimedia.org/T251305 because with helm3 we may lose some reliability for audit capabilities.

As part of T273507 (PodSecurityPolicies will be deprecated with Kubernetes 1.21), audit-logging support has been added to the puppet codebase, along with a very simple logging policy that only logs actions modifying pod objects (because that's what was needed in that context). For a more generic approach, we could look at the config GCE generates: https://github.com/kubernetes/kubernetes/blob/master/cluster/gce/gci/configure-helper.sh#L1113

Currently, the kube-apiservers write audit logs (if enabled) to /var/log/kubernetes/audit.log, rotating after reaching 100MB.

Event Timeline

Jelto triaged this task as Low priority. Aug 30 2021, 4:39 PM
JMeybohm renamed this task from Evaluate and enable audit logging for kubeapi-server to Evaluate and enable audit logging for kube-apiserver. Thu, Mar 28, 3:59 PM

Change #1015354 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s/apiserver: Add option to configure audit logging

https://gerrit.wikimedia.org/r/1015354

Change #1015354 merged by JMeybohm:

[operations/puppet@production] k8s/apiserver: Add option to configure audit logging

https://gerrit.wikimedia.org/r/1015354

Change #1016721 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s/apiserver: Fix parameter syntax for --audit-log-maxsize

https://gerrit.wikimedia.org/r/1016721

Change #1016721 merged by JMeybohm:

[operations/puppet@production] k8s/apiserver: Fix parameter syntax for --audit-log-maxsize

https://gerrit.wikimedia.org/r/1016721

Change #1016753 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s: Enable audit logging in staging-eqiad

https://gerrit.wikimedia.org/r/1016753

Change #1016753 merged by JMeybohm:

[operations/puppet@production] k8s: Enable audit logging in staging-eqiad

https://gerrit.wikimedia.org/r/1016753

Observability-Logging, could you maybe advise on if/how/where we could potentially store these audit logs to make them more accessible?
They come as JSON lines with the format specified in https://kubernetes.io/docs/reference/config-api/apiserver-audit.v1/#audit-k8s-io-v1-Event
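For anyone who wants to look at the file locally first, here is a rough Python sketch of reading one such JSON line. The field names (`auditID`, `stage`, `verb`, `user.username`, `objectRef`, `requestReceivedTimestamp`) come from the audit.k8s.io/v1 Event format linked above. The sample values and the `summarize` helper are made up for illustration.

```python
import json

# Made-up sample event; only the field names follow the audit.k8s.io/v1 Event schema.
SAMPLE_LINE = json.dumps({
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "level": "Metadata",
    "auditID": "00000000-0000-0000-0000-000000000000",
    "stage": "ResponseComplete",
    "verb": "delete",
    "user": {"username": "example-user"},
    "objectRef": {"resource": "pods", "namespace": "example-ns", "name": "example-pod"},
    "requestReceivedTimestamp": "2024-04-09T08:15:30.123456Z",
})


def summarize(line):
    """Pick out the fields that answer 'who did what to which object, and when'."""
    event = json.loads(line)
    ref = event.get("objectRef") or {}   # objectRef is optional in the schema
    user = event.get("user") or {}
    return {
        "auditID": event.get("auditID"),
        "stage": event.get("stage"),
        "verb": event.get("verb"),
        "user": user.get("username"),
        "resource": ref.get("resource"),
        "namespace": ref.get("namespace"),
        "name": ref.get("name"),
        "received": event.get("requestReceivedTimestamp"),
    }


print(summarize(SAMPLE_LINE))
```

Running something like this over each line of the audit log would give a quick per-request summary without needing any index.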

Change #1019049 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] kubernetes::master: Forward audit logs to kafka

https://gerrit.wikimedia.org/r/1019049

Per conversation, the team would like to explore the option of a custom index for k8s audit logs.

The process largely follows the pattern carved by w3creportingapi:

  1. Get these logs into Kafka (patch above)
  2. An index pattern template based on the event schema - example
  3. An OpenSearch output with a specific guard condition - example
  4. A Curator definition enforcing our retention policy - example
  5. An index pattern - to be configured in Dashboards UI when data in the new indexes has appeared in OpenSearch

Change #1019049 merged by JMeybohm:

[operations/puppet@production] kubernetes::master: Forward audit logs to kafka

https://gerrit.wikimedia.org/r/1019049

I've merged the attached patch and the logs are ingested into the logstash-k8s- index (https://logstash.wikimedia.org/app/discover#/view/7f276c90-f8a0-11ee-be54-8fd74c07934f). Unfortunately, the event dates are off, as the date of ingestion is used instead of a timestamp from the actual data. I suppose this is something that will be fixed by using a dedicated index.
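To illustrate what "a timestamp from the actual data" could mean, here is a small Python sketch that takes the event time from the audit event itself. It assumes `requestReceivedTimestamp` (falling back to `stageTimestamp`) is the field we would want, and that it is an RFC 3339 UTC timestamp with microseconds as described in the Event format. `event_time` is a hypothetical helper, not something the pipeline does today.

```python
import json
from datetime import datetime, timezone


def event_time(line):
    """Return the event's own timestamp as an aware datetime, or None if absent."""
    event = json.loads(line)
    # Assumption: prefer the time the apiserver received the request.
    raw = event.get("requestReceivedTimestamp") or event.get("stageTimestamp")
    if raw is None:
        return None
    # Replace the trailing "Z" so older Python versions can parse the value too.
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


# Made-up example line carrying only the timestamp field.
line = json.dumps({"requestReceivedTimestamp": "2024-04-09T08:15:30.123456Z"})
print(event_time(line).isoformat())
```

Whatever ends up doing the ingestion would need an equivalent mapping from this field to the document timestamp.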

Elastic Integrations aren't available to us in an OpenSearch world. However, the mapping data from that link would be useful if we choose to transform these logs to ECS and not use a dedicated index.

Ok, I see.
I cannot really grasp how much work it is to go with a dedicated index. The steps you outlined look pretty frightening, tbh. Are those things you (as in Observability-Logging) can generate/provide?

JMeybohm renamed this task from Evaluate and enable audit logging for kube-apiserver to Enable audit logging for kube-apiserver. Fri, Apr 12, 7:39 PM

Change #1020186 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/puppet@production] k8s: Enable audit logging for all clusters

https://gerrit.wikimedia.org/r/1020186

Change #1020186 merged by JMeybohm:

[operations/puppet@production] k8s: Enable audit logging for all clusters

https://gerrit.wikimedia.org/r/1020186