Integrate Event Platform and ECS logs
Open, Medium, Public

Description

T234565: Standardize the logging format aims to standardize the software logging format on the Elastic Common Schema (ECS). If we can integrate ECS logs with Event Platform, we can automate ingestion of those log events with the same tooling that Event Platform already provides.

This would be particularly useful if the MediaWiki logging format is successfully migrated to ECS, because MediaWiki software logs could then be joined with other MediaWiki data in Hive.


I just met with Observability folks in their office hours to discuss this idea. To accomplish this, we'd need:

Event Timeline

Restricted Application added a subscriber: Aklapper.

@colewhite, in https://phabricator.wikimedia.org/T288851#7456931 you said:

topics prefixed by rsyslog- will be automatically picked up by Logstash.

We've found using topic naming conventions for ingestion jobs to be brittle. We're moving towards using EventStreamConfig to automate configuring things like this. See: https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#consumers_and_producers

Example:

curl 'https://meta.wikimedia.org/w/api.php?action=streamconfigs&all_settings=1&streams=mediawiki.api-request' | jq .
{
  "streams": {
    "mediawiki.api-request": {
      "topics": [
        "eqiad.mediawiki.api-request",
        "codfw.mediawiki.api-request"
      ],
      "stream": "mediawiki.api-request",
      "consumers": {
        "analytics_hadoop_ingestion": {
          "enabled": true,
          "job_name": "event_default"
        }
      },
      "canary_events_enabled": true,
      "topic_prefixes": [
        "eqiad.",
        "codfw."
      ],
      "destination_event_service": "eventgate-analytics",
      "schema_title": "mediawiki/api/request"
    }
  }
}

Here, we are declaring a consumer called 'analytics_hadoop_ingestion'. The settings for that consumer are arbitrary and specific to the consumer job. When that job runs, it requests all streams that have consumers.analytics_hadoop_ingestion declared, and uses those settings to import the data.
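
To make the selection step concrete, here is a minimal Python sketch of how a consumer job might pick out its streams from a streamconfigs response. It is not the actual ingestion job's code: the function name streams_for_consumer, the second sample stream, and the choice to treat a missing "enabled" key as enabled are all assumptions made for illustration. The sample data is a trimmed copy of the response above, so the sketch runs without network access.

```python
# Trimmed copy of the streamconfigs response shown above, plus one
# hypothetical stream that declares no consumers (for contrast).
SAMPLE_RESPONSE = {
    "streams": {
        "mediawiki.api-request": {
            "topics": [
                "eqiad.mediawiki.api-request",
                "codfw.mediawiki.api-request",
            ],
            "consumers": {
                "analytics_hadoop_ingestion": {
                    "enabled": True,
                    "job_name": "event_default",
                }
            },
        },
        # Hypothetical stream name, not taken from the real configuration.
        "example.unconsumed-stream": {
            "topics": ["eqiad.example.unconsumed-stream"],
        },
    }
}


def streams_for_consumer(response, consumer_name):
    """Return {stream_name: consumer_settings} for streams declaring the consumer."""
    selected = {}
    for stream_name, settings in response.get("streams", {}).items():
        consumer = settings.get("consumers", {}).get(consumer_name)
        if consumer is None:
            continue  # this stream does not declare the consumer at all
        # Assumption: a consumer without an "enabled" key counts as enabled.
        if consumer.get("enabled", True):
            selected[stream_name] = consumer
    return selected


if __name__ == "__main__":
    matches = streams_for_consumer(SAMPLE_RESPONSE, "analytics_hadoop_ingestion")
    for stream_name, consumer in matches.items():
        print(stream_name, consumer["job_name"])
```

A Logstash-side job could call the same kind of filter with its own consumer name, provided the logging streams declare that consumer in EventStreamConfig.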

Logstash ingestion could probably do something similar, if the logging streams to import were declared in EventStreamConfig.