Create logstash dashboard(s) for Lift Wing
Closed, Resolved, Public

Description

The kubernetes container logs are shipped automatically by logstash, for example:

https://logstash.wikimedia.org/app/discover#/doc/logstash-*/logstash-k8s-1-7.0.0-1-2022.09.06?id=Jz3fEoMBzGlbejpUpSv1

We can filter on kubernetes.container_name: kserve-container, but at the moment we log very little about each HTTP request:

[I 220906 12:56:28 web:2243] 200 POST /v1/models/enwiki-articlequality:predict (127.0.0.1) 155.53ms
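
To make clear what can be recovered from this line as it stands, here is a minimal Python sketch that parses it into fields. The regular expression is written against the single sample above, and the names `ACCESS_LINE` and `parse_access_line` are made up for illustration; other log lines may not match.

```python
import re
from datetime import datetime

# Pattern derived from the one sample line in this task; it is an
# assumption that other access-log lines follow the same layout.
ACCESS_LINE = re.compile(
    r"\[(?P<level>[A-Z]) (?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<module>\w+):(?P<lineno>\d+)\] "
    r"(?P<status>\d{3}) (?P<method>[A-Z]+) (?P<path>\S+) "
    r"\((?P<remote_ip>[^)]+)\) (?P<latency_ms>\d+(?:\.\d+)?)ms"
)


def parse_access_line(line):
    """Return a dict of fields for one access-log line, or None if it does not match."""
    match = ACCESS_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # The timestamp in the sample is yymmdd followed by hh:mm:ss.
    stamp = fields.pop("date") + " " + fields.pop("time")
    fields["timestamp"] = datetime.strptime(stamp, "%y%m%d %H:%M:%S")
    fields["status"] = int(fields["status"])
    fields["lineno"] = int(fields["lineno"])
    fields["latency_ms"] = float(fields["latency_ms"])
    # The model name sits between "/v1/models/" and the ":predict" verb.
    model = re.match(r"/v1/models/([^:/]+)", fields["path"])
    fields["model"] = model.group(1) if model else None
    return fields


sample = (
    "[I 220906 12:56:28 web:2243] 200 POST "
    "/v1/models/enwiki-articlequality:predict (127.0.0.1) 155.53ms"
)
parsed = parse_access_line(sample)
```

Status, method, path, client address and latency are all present, but there is nothing about request size, user agent or request ID.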

We should review the logging and adopt a more informative format in KServe. We should also figure out whether we need more dashboards, for example for Knative, Istio, etc.

Event Timeline

Secondary goal: based on https://kserve.github.io/website/developer/debug/#debug-kserve-request-flow we should be able to inspect logs at various levels when needed (for example, when a bug or unexpected behavior is reported).

Very nice commit: https://github.com/kserve/kserve/commit/ff7014b0c1a79672978d5b0a23af6c5ae1158b3b

It adds Prometheus metrics for the latency of the various steps, like preprocess/process/etc.

Changing Tornado's log format seems a little hard, since we would probably need to create an ad-hoc handler in KServe.
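
For reference, a rough sketch of what such an ad-hoc handler could look like, using only the Python standard library. Tornado writes its access log to the `tornado.access` logger; the names `JsonAccessFormatter` and `install_json_access_log` are hypothetical, and whether KServe would leave a handler like this in place at startup is an assumption that has not been verified.

```python
import json
import logging


class JsonAccessFormatter(logging.Formatter):
    """Render each log record as a single JSON object (hypothetical format)."""

    def format(self, record):
        payload = {
            "logger": record.name,
            "level": record.levelname,
            # getMessage() applies the %-style arguments to the message.
            "message": record.getMessage(),
        }
        return json.dumps(payload, sort_keys=True)


def install_json_access_log(stream=None):
    """Attach a JSON-formatting handler to Tornado's access logger."""
    logger = logging.getLogger("tornado.access")
    handler = logging.StreamHandler(stream)  # defaults to stderr
    handler.setFormatter(JsonAccessFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    # Avoid emitting the same line again through the root logger.
    logger.propagate = False
    return logger
```

This only changes how the existing message is wrapped. Adding new fields (request ID, payload size, etc.) would still require changes where the request is logged, which is the hard part mentioned above.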

I think that an Istio Gateway dashboard is enough, see https://logstash.wikimedia.org/app/dashboards#/view/138271f0-40ce-11ed-bb3e-0bc9ce387d88 (shared with Service Ops).

Created https://logstash.wikimedia.org/app/dashboards#/view/fa21f5e0-42ef-11ed-ae81-bb78ac0690d3 for KServe access logs. It is easy to select/filter a cluster and a single Inference Service. We can improve/add widgets as we go, but for the moment it seems sufficient.

Also created https://logstash.wikimedia.org/app/dashboards#/view/fedf64a0-42f4-11ed-ae81-bb78ac0690d3 for Knative Serving. Again very basic, but it should do for now; we can always add more widgets in the future.

elukey claimed this task.