
EventGate idea: use presence of schema properties in http.(request|response)_headers to automatically set header values in event data
Open, Needs Triage · Public

Description

Currently, eventgate-wikimedia has custom code to set a few defaults in event data from HTTP request headers. Instead of hard-coding which headers to set, we could explicitly declare in the schema the properties that eventgate should set when they are not already present in the event data. Example:

http:
  type: object
  description: Information about the HTTP request that generated an event.
  properties:
    request_headers:
      type: object
      description: Request headers sent by the client.
      additionalProperties:
        type: string
      properties:
        user-agent:
          type: string
        referer:
          type: string
        x-client-ip:
          type: string

Here, request_headers is still a 'map type', as it has additionalProperties with a specific type. However, some specific keys are defined. eventgate-wikimedia could use this fact to automatically set each declared key to the value of the corresponding HTTP header. Something like:

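// Sketch: `schema`, `req` and `event` are assumed to be in scope.
// Node.js lowercases incoming header names, so look them up by lowercased key.
const headerProperties = schema.properties.http.properties.request_headers.properties;
for (const property of Object.keys(headerProperties)) {
  const headerValue = req.headers[property.toLowerCase()];
  if (headerValue && !event.http.request_headers[property]) {
    event.http.request_headers[property] = headerValue;
  }
}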

This would allow us to get rid of header-specific custom code in eventgate-wikimedia, while giving schema owners control over which headers are automatically set, without forcing them to have their producer code set the header in the event data (and ultimately send the same data twice over the same HTTP request).
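
The loop above can be fleshed out into a self-contained sketch. Everything here is illustrative: setHeaderDefaults, exampleSchema, exampleHeaders and exampleEvent are hypothetical names, not part of eventgate-wikimedia, and the schema is passed in as a plain object rather than looked up the way EventGate would.

```javascript
// Hypothetical helper: copy HTTP header values into event.http.request_headers
// for every header named in the schema's http.request_headers.properties.
function setHeaderDefaults(schema, event, headers) {
  const http = (schema.properties || {}).http || {};
  const requestHeaders = (http.properties || {}).request_headers || {};
  const declared = Object.keys(requestHeaders.properties || {});
  if (declared.length === 0) {
    return event; // Schema declares no header properties: nothing to do.
  }
  event.http = event.http || {};
  event.http.request_headers = event.http.request_headers || {};
  for (const name of declared) {
    const value = headers[name.toLowerCase()];
    // Only fill in values the producer did not already provide.
    if (value !== undefined && event.http.request_headers[name] === undefined) {
      event.http.request_headers[name] = value;
    }
  }
  return event;
}

// Example schema fragment (assumed shape, mirroring the YAML above).
const exampleSchema = {
  properties: {
    http: {
      type: 'object',
      properties: {
        request_headers: {
          type: 'object',
          additionalProperties: { type: 'string' },
          properties: {
            'user-agent': { type: 'string' },
            referer: { type: 'string' },
          },
        },
      },
    },
  },
};

// Header names are lowercased, as Node.js does for incoming requests.
const exampleHeaders = { 'user-agent': 'curl/7.68.0', accept: '*/*' };
const exampleEvent = { http: { request_headers: { referer: 'https://example.org/' } } };

setHeaderDefaults(exampleSchema, exampleEvent, exampleHeaders);
console.log(exampleEvent.http.request_headers);
```

In this sketch the user-agent value is filled in from the request, the producer-supplied referer is left untouched, and accept is not copied because the schema does not declare it.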

We'd need to make sure any JSONSchema converter code knows to still interpret objects with additionalProperties and type: string as a map even if properties are also defined.
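
As a rough illustration of that rule, a converter could classify an object schema as a map whenever additionalProperties is itself a typed schema, whether or not properties are also declared. The sketch below is written in JavaScript only for consistency with the other snippets here; isMapSchema is a hypothetical name, and this is not the code of any existing converter.

```javascript
// Hypothetical classifier: an object schema counts as a map when
// additionalProperties is itself a schema (an object with a string `type`),
// regardless of whether explicit properties are also declared.
function isMapSchema(schema) {
  const additional = schema.additionalProperties;
  return schema.type === 'object' &&
    typeof additional === 'object' &&
    additional !== null &&
    typeof additional.type === 'string';
}

// A map with some explicitly declared keys is still treated as a map.
const mapWithDeclaredKeys = {
  type: 'object',
  additionalProperties: { type: 'string' },
  properties: { 'user-agent': { type: 'string' } },
};

// A plain struct: declared properties only, no additionalProperties schema.
const plainStruct = {
  type: 'object',
  properties: { id: { type: 'string' } },
};

console.log(isMapSchema(mapWithDeclaredKeys), isMapSchema(plainStruct));
```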

Event Timeline

Ottomata created this task. Sep 21 2020, 2:49 PM
Restricted Application added a subscriber: Aklapper. Sep 21 2020, 2:49 PM
Ottomata renamed this task from EventGate idea: use presence of schema properties in http.(request|response)_headers to automatically set headers. to EventGate idea: use presence of schema properties in http.(request|response)_headers to automatically set headers values in event data. Sep 21 2020, 2:50 PM
Ottomata updated the task description.
Ottomata added subscribers: fdans, mforns, Zbyszko and 2 others.
Ottomata updated the task description. Sep 21 2020, 2:52 PM
Ottomata updated the task description.
Ottomata renamed this task from EventGate idea: use presence of schema properties in http.(request|response)_headers to automatically set headers values in event data to EventGate idea: use presence of schema properties in http.(request|response)_headers to automatically set header values in event data. Sep 21 2020, 3:51 PM

Change 629406 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[analytics/refinery/source@master] Spark JsonSchemaConverter - additionalProperties with schema is always a MapType

https://gerrit.wikimedia.org/r/629406

Change 629448 had a related patch set uploaded (by Ottomata; owner: Ottomata):
[eventgate-wikimedia@master] [WIP] Choose HTTP header defaults to set based on schema properties

https://gerrit.wikimedia.org/r/629448

CDanis added a subscriber: CDanis. Sep 23 2020, 10:52 PM
razzi moved this task from Incoming to Event Platform on the Analytics board. Oct 15 2020, 3:46 PM

Hm, another possible idea: Instead of using defined properties, we could make use of ajv-keywords' dynamicDefaults. I can enable this feature in EventGate, and we can pass in custom default-setting functions in eventgate-wikimedia, like:

// Sketch: `req` (the current HTTP request) is assumed to be in scope here.
function getHeaderValue(args) {
  return req.headers[args.header_name];
}

const ajvCustomDynamicDefaults = {
    'http_header_value': getHeaderValue
};
const eventValidator = new EventValidator({..., ajvCustomDynamicDefaults, });

Then, a schema that wants to set a default header value can do:


request_headers:
  type: object
  additionalProperties:
    type: string
  # This must be set to at least an empty object to have any sub-properties have dynamicDefaults set
  default: {}
  dynamicDefaults:
    # Defaults the 'user-agent' property to the return value of getHeaderValue({header_name: 'user-agent'})
    'user-agent': { func: http_header_value, header_name: 'user-agent' }

This isn't that much more elegant than the defined properties idea, but at least it makes use of something already supported by AJV, and would DRY up and automate some of the existing code from eventgate-wikimedia's makeSetWikimediaDefaults.

This would have some eventgate-wikimedia customization creep into schemas though, i.e. the names of the registered functions, like http_header_value. Not that it isn't already there now; it's just a bit more implicit. Current code does:

  • if schema has field and header has value, set default value

This idea would be:

  • if eventgate is configured with dynamic defaults and schema has matching dynamicDefaults defined, set default value.
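
To make that second rule concrete, here is a hand-rolled sketch of how a registry of named default functions might be applied to a request_headers object. It deliberately does not use ajv-keywords, so it says nothing about how AJV wires dynamicDefaults internally. makeDynamicDefaults, applyDynamicDefaults, fakeReq and requestHeadersSchema are hypothetical names, and the request is closed over explicitly because the default function needs access to the current headers.

```javascript
// Hypothetical registry of named default functions, keyed by the names that
// schemas refer to. The request is captured in a closure.
function makeDynamicDefaults(req) {
  return {
    http_header_value: (args) => req.headers[args.header_name],
  };
}

// Apply a dynamicDefaults mapping to one object, filling only missing properties.
function applyDynamicDefaults(target, dynamicDefaults, registry) {
  for (const [property, spec] of Object.entries(dynamicDefaults)) {
    // Everything other than `func` is passed to the function as its arguments.
    const { func, ...args } = spec;
    const fn = registry[func];
    if (fn === undefined) {
      throw new Error(`Unknown dynamic default function: ${func}`);
    }
    if (target[property] === undefined) {
      const value = fn(args);
      if (value !== undefined) {
        target[property] = value;
      }
    }
  }
  return target;
}

// Stand-in for an incoming request (header names lowercased, as in Node.js).
const fakeReq = { headers: { 'user-agent': 'Mozilla/5.0' } };

// Same shape as the YAML schema fragment above.
const requestHeadersSchema = {
  type: 'object',
  additionalProperties: { type: 'string' },
  default: {},
  dynamicDefaults: {
    'user-agent': { func: 'http_header_value', header_name: 'user-agent' },
  },
};

const filledHeaders = applyDynamicDefaults(
  {},
  requestHeadersSchema.dynamicDefaults,
  makeDynamicDefaults(fakeReq)
);
console.log(filledHeaders);
```

The schema only names the function (http_header_value), and the service decides what that name means. That is the coupling between schemas and eventgate-wikimedia described above.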

the dynamicDefaults idea SGTM as well!