Page MenuHomePhabricator

The api-feature-usage log channel should use log context instead of parsing a string
Closed, ResolvedPublic

Description

Currently we log a formatted string from ApiBase, and then parse it in logstash's config. I suspect it'd be more straightforward to have ApiBase use the log context to send that data in a structured format to logstash.

  • 1. ??? (Any preliminary changes needed?)
  • 2. Have ApiBase start including log context with the message.
  • 3. ??? (Something in the logstash config?)

Event Timeline

@bd808: Any idea what goes in the "???" in the checklist? I ask you because you made the mistake of doing the logging a few years ago ;)

Maybe something like this?

  1. Have ApiBase start including log context with the message
    1. feature
    2. username (url decoded)
    3. ip
    4. referer
    5. agent
  2. Update the if [channel] == "api-feature-usage" {...} config in filter-mediawiki.conf to use the structured data from the context rather than grok-parsing the human readable message string
  1. Update the if [channel] == "api-feature-usage" {...} config in filter-mediawiki.conf to use the structured data from the context rather than grok-parsing the human readable message string

Is there an example of some other channel already doing that that I can look at?

Is there an example of some other channel already doing that that I can look at?

I'm not seeing an obvious analog channel. I think a diff similar to this might be the right approach:

modules/profile/files/logstash/filter-mediawiki.conf.diff
diff --git i/modules/profile/files/logstash/filter-mediawiki.conf w/modules/profile/files/logstash/filter-mediawiki.conf
index 31b3e703c0..9bdaaa340f 100644
--- i/modules/profile/files/logstash/filter-mediawiki.conf
+++ w/modules/profile/files/logstash/filter-mediawiki.conf
@@ -60,45 +60,9 @@ filter {
     } # end [channel] == "exception-json"

     if [channel] == "api-feature-usage" {
-      grok {
-        match => [
-          "message",
-          "^(?m)%{QS:feature} %{QS:username} %{QS:ip} %{QS:referer} %{QS:agent}$"
-        ]
-        named_captures_only => true
-      }
-
-      if !("_grokparsefailure" in [tags]) {
-        # Unquote ('"foo \"bar\""' to 'foo "bar"')
-        mutate {
-          # Strip outer quotes
-          gsub => [
-              "feature",  '^"|"$', "",
-              "username", '^"|"$', "",
-              "ip",       '^"|"$', "",
-              "referer",  '^"|"$', "",
-              "agent",    '^"|"$', ""
-          ]
-        }
-        mutate {
-          # Strip backslash escape characters
-          gsub => [
-              "feature",  '\\(.)', '\1',
-              "username", '\\(.)', '\1',
-              "ip",       '\\(.)', '\1',
-              "referer",  '\\(.)', '\1',
-              "agent",    '\\(.)', '\1'
-          ]
-        }
-
         mutate {
           replace => [ "message", "%{feature}" ]
         }
-
-        urldecode {
-          field => "username"
-        }
-
         useragent {
           source => "agent"
           prefix => "ua_"

This diff is removing all of the raw message parsing and post parsing string manipulations. It assumes that the incoming structured log event includes the feature, username, ip, referer, and agent fields that were previously being extracted from the human readable message field.

I was hoping it would be that simple, that the variables set in the context would just automatically be present without any magic on the logstash end. Thanks!

Change 493322 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] API: Use log context for api-feature-usage log

https://gerrit.wikimedia.org/r/493322

Change 493323 had a related patch set uploaded (by Anomie; owner: Anomie):
[operations/puppet@production] Logstash: Use log context for the api-feature-usage channel

https://gerrit.wikimedia.org/r/493323

Change 493322 merged by jenkins-bot:
[mediawiki/core@master] API: Use log context for api-feature-usage log

https://gerrit.wikimedia.org/r/493322

@Anomie Who do we need to sync up with to get review for this? https://gerrit.wikimedia.org/r/493323 has a +1 but who can +2 it?

fgiunchedi subscribed.

@Anomie Who do we need to sync up with to get review for this? https://gerrit.wikimedia.org/r/493323 has a +1 but who can +2 it?

That'd be me for example (one of the folks working on Wikimedia-Logstash / observability) and the review completely fell off the radar! I'll bring it up today at the observability meeting.

Change 493323 had a related patch set uploaded (by Filippo Giunchedi; owner: Anomie):
[operations/puppet@production] Logstash: Use log context for the api-feature-usage channel

https://gerrit.wikimedia.org/r/493323

Change 493323 merged by Filippo Giunchedi:
[operations/puppet@production] Logstash: Use log context for the api-feature-usage channel

https://gerrit.wikimedia.org/r/493323

Anomie claimed this task.