Decide how to do DAG logging on dse-k8s
Closed, Resolved · Public

Description

As an Airflow user, I need the ability to look back at my DAG logs (example screenshot). In our current Airflow deployment, these logs are saved to /srv/airflow-${INSTANCE}/logs/. We won't be able to use the same approach in Kubernetes.
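
For context, the path above comes from Airflow's standard file-based task handler, which is configured through airflow.cfg. A minimal sketch of the relevant section (an assumption about how our instances are configured; in Airflow 2 this setting lives under [logging], having moved from [core]):

    [logging]
    # Each task attempt writes its log under this directory; in our current
    # deployment this resolves to the per-instance path shown above.
    base_log_folder = /srv/airflow-${INSTANCE}/logs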

There are a number of ways to tackle this. For example, we could use:

  • Persistent volume claims
  • Writing to Elasticsearch
  • Adapting approaches currently in use by other WMF Kubernetes applications.

Creating this ticket to:

  • Gather requirements from stakeholders.
  • Decide on our approach.

We can start a separate ticket for implementation once that is decided.

Event Timeline

This is an extremely pertinent question. Thanks @bking for creating the ticket. It's got me thinking.

There are also some useful docs here: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/logging-tasks.html

I'm going to assign it to myself, while I do a bit of research, if that's OK.

I've been working in this document (GDocs): Airflow on Kubernetes - Test and migration plan, and I have made a recommendation there regarding logging.

Having reviewed the Logging and Monitoring architecture page in the Airflow documentation, I suggest that we look to enable both an Elasticsearch-based backend and an S3-based central file store.

For the S3-based system we can use our Ceph cluster, as it has advantages for this use case over both the core Swift media servers and the new APUs Ceph cluster managed by Data Persistence.
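
As a concrete illustration of what the S3 half would involve: Airflow's remote task logging is enabled in airflow.cfg and pointed at a bucket via a named connection. A minimal sketch, assuming the amazon provider is installed; the bucket name and connection id below are hypothetical placeholders:

    [logging]
    remote_logging = True
    # Hypothetical bucket on the Ceph radosgw. Tasks still write their logs
    # locally first; the handler uploads them to S3 when the task finishes.
    remote_base_log_folder = s3://airflow-task-logs/dse-k8s
    remote_log_conn_id = ceph_s3_logs

The ceph_s3_logs connection would then carry the radosgw credentials and, in its extras, an endpoint URL pointing at our Ceph cluster rather than at AWS (the exact extra field name varies between versions of the amazon provider).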

For the elasticsearch hosting, we could:

  • Run a small but dedicated elasticsearch/opensearch cluster under Kubernetes
  • Use our new mutualised (and virtualised) opensearch cluster
  • Request that we use an existing opensearch cluster maintained by Observability

I would be tempted to go with option 1 to begin with, at least during this period of testing.

Getting the S3-based logging to work will first require us to complete two other tasks.

We can already start work on the elasticsearch/opensearch part of it now.

> For the elasticsearch hosting

@BTullis why not just use existing logstash infra? Or am I misunderstanding something?

Yes, we could do. It's an option discussed in the doc.

I just thought that for this period of early testing it would be as easy to spin up an empty, disposable instance. If it looks like the logstash infra will be better in the long term for the production instances, then we can use that.

I hadn't looked into whether we can create our own arbitrary indices on the logstash infra, but I can do this too.

Hm! Do you need this? Would the ECS index be sufficient?

I don't know yet, but I think we should be able to find out with a bit of experimentation.
I've been looking at the configuration options described here: https://airflow.apache.org/docs/apache-airflow-providers-elasticsearch/stable/logging/index.html

They only seem to mention a host:port for elasticsearch; then there is an option to enable JSON logging for tasks. I'm just not sure how easy it would be to integrate the elasticsearch logging provider with our kafka/rsyslog-based logstash pipeline.
Anyway, I'll keep an open mind to it and see what we discover during testing.
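
For reference, the options on that page amount to roughly the following cfg. A minimal sketch with a hypothetical host; note that the Elasticsearch task handler only reads logs back from the index, so something external (filebeat, or conceivably our kafka/rsyslog pipeline) still has to ship the JSON lines from stdout into elasticsearch:

    [logging]
    remote_logging = True

    [elasticsearch]
    # Hypothetical address; Airflow reads task logs back from this cluster
    host = opensearch.svc.example.org:9200
    # Emit task logs as JSON on stdout so an external shipper can collect them
    json_format = True
    write_stdout = True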

I've created two sub-tasks to track the implementation of both types of logging system.
We need to finish work on getting radosgw working before we can start writing S3 logs, but we could start on the elasticsearch logging now.

I'll mark this ticket as resolved, since the initial decision-making part has been completed. We can work on implementation separately.

I have decided to ask a question on the Airflow Slack about whether or not there is any benefit to running both S3-based and Elasticsearch-based logging.

[Screenshot: the question as posted to the Airflow Slack]

I have received one response to my question so far, from an Airflow contributor.

[Screenshot: the response from the Airflow contributor]

So there may be no use case for implementing both S3-based and Elasticsearch-based logging after all.