Stream config: Refine value-based curation config syntax
Closed, Resolved, Public

Description

Our proposed stream configuration syntax for value-based sampling expresses each filter as a two-element array: an operator followed by its operand.

Supported filters include:

  • ['==', x]
  • ['!=', x]
  • ['<', x]
  • ['>', x]
  • ['<=', x]
  • ['>=', x]
  • ['in', [x, y, z]]
  • ['not_in', [x, y, z]]
  • ['contains', x]
  • ['contains_all', [x, y, z]]
  • ['contains_any', [x, y, z]]

Example config:

"very.cool.stream": {
  "producer": {
    "metrics_platform_client": {
      [...]
      "filter": {
        // Only events matching these filters get sent
        "user_is_logged_in": [ "==", true ],
        "mediawiki_skin"   : [ "in", [ "Vector", "MinervaNeue" ] ]
      }
    }
  }
}

This two-element array format, whose members can be of arbitrary types, is fine for a dynamically typed language like JavaScript but is cumbersome for statically typed languages like Java and Swift. This task is to decide on a modified syntax that is friendlier to those languages.
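To make the pain point concrete, here is a rough Java sketch of what evaluating the array form might involve. The class and method names (`ArrayFilterSketch`, `matches`) are hypothetical and not taken from any client library, and only the `==` and `in` operators are covered.

```java
import java.util.List;

// Hypothetical sketch: evaluating the two-element array form in Java.
final class ArrayFilterSketch {
    // A filter such as ["==", true] or ["in", ["Vector", "MinervaNeue"]] can
    // only be typed as List<Object>, so every access needs a runtime check.
    static boolean matches(List<Object> filter, Object value) {
        if (filter.size() != 2 || !(filter.get(0) instanceof String)) {
            throw new IllegalArgumentException("malformed filter: " + filter);
        }
        String op = (String) filter.get(0);
        Object operand = filter.get(1);
        switch (op) {
            case "==":
                return operand.equals(value);
            case "in":
                // The operand type depends on the operator, so it must be
                // checked and cast again here.
                if (!(operand instanceof List)) {
                    throw new IllegalArgumentException("'in' needs a list operand");
                }
                return value != null && ((List<?>) operand).contains(value);
            default:
                throw new UnsupportedOperationException("operator not sketched: " + op);
        }
    }

    public static void main(String[] args) {
        List<Object> loggedIn = List.of("==", true);
        List<Object> skin = List.of("in", List.of("Vector", "MinervaNeue"));
        System.out.println(matches(loggedIn, true));
        System.out.println(matches(skin, "Timeless"));
    }
}
```

Nothing in the type system ties the operator to the shape of its operand, which is the kind of introspection a friendlier syntax could avoid.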

Event Timeline

Mholloway moved this task from Backlog to Work in Progress on the Metrics-Platform board.
Mholloway moved this task from Inbox to Doing on the Product-Data-Infrastructure board.

I believe these minor adjustments would make life much easier in Java and Swift.

Before                          After
['==', x]                       equals: x
['!=', x]                       not_equals: x
['<', x]                        less_than: x
['>', x]                        greater_than: x
['<=', x]                       less_than_or_equals: x
['>=', x]                       greater_than_or_equals: x
['in', [x, y, z]]               in: [x, y, z]
['not_in', [x, y, z]]           not_in: [x, y, z]
['contains', x]                 contains: x
['contains_all', [x, y, z]]     contains_all: [x, y, z]
['contains_any', [x, y, z]]     contains_any: [x, y, z]

Modeling equals, not_equals, etc., as properties of the value-based sampling configuration object will allow us to model them as properties of the corresponding POJO in the Java client, and to do similarly in Swift by way of our prospective usage of the SerializedSwift library.
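As a minimal sketch of that idea (not the code from the eventual patch), a rules POJO might look roughly like the following. The class name `CurationRulesSketch`, the camelCase field names, and the `apply` method are assumptions made for illustration; mapping JSON keys such as `equals` onto these fields is left to whichever JSON library is used. Only a subset of the operators is shown, and treating all configured rules as a conjunction is also an assumption.

```java
import java.util.List;

// Hypothetical sketch: one nullable field per operator. Null means
// "this rule is not configured".
final class CurationRulesSketch<T extends Comparable<T>> {
    T equalTo;       // JSON key "equals"
    T notEqualTo;    // JSON key "not_equals"
    T lessThan;      // JSON key "less_than"
    T greaterThan;   // JSON key "greater_than"
    List<T> in;      // JSON key "in"
    List<T> notIn;   // JSON key "not_in"

    // Returns true only if the value satisfies every configured rule.
    // Sketch choice: a missing (null) value never matches.
    boolean apply(T value) {
        if (value == null) return false;
        if (equalTo != null && !equalTo.equals(value)) return false;
        if (notEqualTo != null && notEqualTo.equals(value)) return false;
        if (lessThan != null && value.compareTo(lessThan) >= 0) return false;
        if (greaterThan != null && value.compareTo(greaterThan) <= 0) return false;
        if (in != null && !in.contains(value)) return false;
        if (notIn != null && notIn.contains(value)) return false;
        return true;
    }

    public static void main(String[] args) {
        CurationRulesSketch<String> skin = new CurationRulesSketch<>();
        skin.in = List.of("Vector", "MinervaNeue");
        System.out.println(skin.apply("Vector"));

        // A made-up numeric property, used only to exercise the comparisons.
        CurationRulesSketch<Integer> count = new CurationRulesSketch<>();
        count.greaterThan = 10;
        count.lessThan = 20;
        System.out.println(count.apply(15));
    }
}
```

Because each operator has a fixed field with a fixed type, the operand's shape is known at compile time and no casts are needed.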

I'll code this up in Java to aid discussion.

Mholloway renamed this task from Stream config: Refine value-based sampling syntax to Stream config: Refine value-based curation config syntax. Aug 3 2021, 3:16 PM

WIP patch coming soon. The config structure I'm zeroing in on is something like:

"very.cool.stream": {
  "producer": {
    "metrics_platform_client": {
      [...]
      "curation": [
        {
          "property": "user_is_logged_in",
          "rules": [
            { "equals": true }
          ]
        },
        {
          "property": "mediawiki_skin",
          "rules": [
            { "in": [ "Vector", "MinervaNeue" ] }
          ]
        }
      ]
    }
  }
}

This config structure should be at least as easy to handle in JS as the current structure, but provides predictable field names for the config object in languages like Java, eliminating the need for introspection.
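For discussion, here is a hedged Java sketch of how a client might walk a `curation` list of this shape once it has been deserialized. All names (`StreamCurationSketch`, `Rule`, `Curation`, `shouldSend`) are hypothetical, JSON parsing is omitted, only `equals` and `in` are modeled, and requiring every rule of every property to match is an assumption about the intended semantics.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: applying a deserialized "curation" list to an event.
final class StreamCurationSketch {
    // One entry of a "rules" array; a null field means the operator is unset.
    static final class Rule {
        final Object equalTo;  // JSON key "equals"
        final List<?> in;      // JSON key "in"

        Rule(Object equalTo, List<?> in) {
            this.equalTo = equalTo;
            this.in = in;
        }

        boolean matches(Object value) {
            if (value == null) return false;  // sketch choice: missing never matches
            if (equalTo != null && !equalTo.equals(value)) return false;
            if (in != null && !in.contains(value)) return false;
            return true;
        }
    }

    // One entry of the "curation" array: a property name plus its rules.
    static final class Curation {
        final String property;
        final List<Rule> rules;

        Curation(String property, List<Rule> rules) {
            this.property = property;
            this.rules = rules;
        }
    }

    // Assumed semantics: an event is sent only if all rules match.
    static boolean shouldSend(List<Curation> curation, Map<String, Object> event) {
        for (Curation c : curation) {
            Object value = event.get(c.property);
            for (Rule rule : c.rules) {
                if (!rule.matches(value)) return false;
            }
        }
        return true;
    }

    // Mirrors the example config for "very.cool.stream".
    static List<Curation> exampleConfig() {
        return List.of(
            new Curation("user_is_logged_in", List.of(new Rule(true, null))),
            new Curation("mediawiki_skin",
                List.of(new Rule(null, List.of("Vector", "MinervaNeue")))));
    }

    public static void main(String[] args) {
        Map<String, Object> event =
            Map.of("user_is_logged_in", true, "mediawiki_skin", "Vector");
        System.out.println(shouldSend(exampleConfig(), event));
    }
}
```

The field names are fixed ahead of time, so the traversal needs no inspection of operator strings or operand types at runtime.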

Change 709820 had a related patch set uploaded (by Mholloway; author: Michael Holloway):

[mediawiki/libs/metrics-platform@master] [WIP] [Java] Support curating stream data sets based on event properties

https://gerrit.wikimedia.org/r/709820

@jlinehan Not all of the tests are written up yet, and stream config deserialization from JSON specifically needs testing, but that should be enough to illustrate the approach.

Patch is ready for review. I'll introduce tests for JSON serialization and deserialization separately.

Change 716057 had a related patch set uploaded (by Mholloway; author: Michael Holloway):

[mediawiki/libs/metrics-platform@master] [VERY WIP] [Swift] Support curating stream data sets based on event properties

https://gerrit.wikimedia.org/r/716057

Change 709820 merged by jenkins-bot:

[mediawiki/libs/metrics-platform@master] [Java] Support curating stream data sets based on event properties

https://gerrit.wikimedia.org/r/709820

Change 716057 merged by jenkins-bot:

[mediawiki/libs/metrics-platform@master] [Swift] Support curating stream data sets based on event properties

https://gerrit.wikimedia.org/r/716057

Calling this resolved as we just merged a lot of code relying on the new config syntax.