
Instance-level EventGate configuration to enable/disable functionality
Open, Needs Triage · Public

Description

EventGate should be modified/refactored/rearchitected (or at least have some standardized procedure written down somewhere) for instance-level configuration to enable/disable functionality. Currently we only do this on a per-stream basis, and it's a very messy pile of if statements.

This can be done together with T406747 or separately from it.

Details

Related Changes in GitLab:
Title: Draft: Refactor makeSetWikimediaDefaults into configurable transform functions
Reference: repos/data-engineering/eventgate-wikimedia!32
Author: tchin
Source Branch: configurable-transforms
Dest Branch: master

Event Timeline

Restricted Application added a subscriber: Aklapper.
tchin updated the task description.

I chatted with @Ottomata about this a little bit. Here's what I'm going to attempt:

  1. Refactor setWikimediaDefaults to extract out most of the different event transformation functionality into separate transform functions. These functions have to be coded defensively since some of them depend on others.
  2. Figure out how to indicate that these functions are configurable. What I'm thinking right now is to create a function that wraps these new transform functions so that we can specify what configuration key enables/disables this function. Something like:
function makeConfigurableFunction(configLocation, defaultEnabled = true, transformFunction)

Then in the config, for example:

transform:
  hoist_request_id: true
  edge_uniques: true
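
A minimal sketch of that wrapper, to make the toggle idea concrete. The `getConfigValue` helper, the dotted `configLocation` format, and passing `config` as a fourth argument are all assumptions for illustration; none of them exist in eventgate-wikimedia as far as this task states.

```javascript
// Hypothetical helper: read a nested config value by dotted path,
// e.g. 'transform.edge_uniques'. Returns undefined if any segment is missing.
function getConfigValue(config, configLocation) {
    return configLocation.split('.').reduce(
        (node, key) => (node !== null && typeof node === 'object' ? node[key] : undefined),
        config
    );
}

// Wraps transformFunction so it only runs when the key at configLocation is on.
// The first three parameters follow the proposed signature; `config` is an
// assumption added so the sketch is self-contained.
function makeConfigurableFunction(configLocation, defaultEnabled = true, transformFunction, config = {}) {
    const configured = getConfigValue(config, configLocation);
    const enabled = configured === undefined ? defaultEnabled : Boolean(configured);
    // A disabled transform becomes a pass-through, so callers need no if statements.
    return enabled ? transformFunction : (event) => event;
}

// Usage with placeholder transform bodies (not the real eventgate logic).
const config = { transform: { hoist_request_id: true, edge_uniques: false } };
const hoistRequestId = makeConfigurableFunction(
    'transform.hoist_request_id', true, (event) => ({ ...event, hoisted: true }), config
);
const edgeUniques = makeConfigurableFunction(
    'transform.edge_uniques', true, (event) => ({ ...event, uniques: true }), config
);
```

Here `hoistRequestId` runs its transform while `edgeUniques` returns the event untouched, but the order in which the two are called still has to be written out in code.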

If you do this, though, you have to hardcode the ordering. Maybe it would be better to have a function that defines other functions as transform functions, and then specify an ordered list of transform functions in the config? Then you lose the ability to enable multiple transforms through one toggle, although maybe that functionality isn't needed anyway.

function defineTransformFunction(transformFunction)

transform:
  - x_request_id
  - edge_uniques

I'll think about it a bit more. Anyway, the last step would be:

  3. Have eventgate-wikimedia register the enabled functions (on startup if possible) so that all setWikimediaDefaults has to do is call an ordered list of registered transform functions.
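
One way this last step could look, as a sketch only. `makeTransformPipeline` is a hypothetical name, not an existing eventgate-wikimedia function, and the two transforms in the usage are placeholders.

```javascript
// Hypothetical: builds the per-event function once at startup from an
// ordered list of already-enabled transforms (sync or async).
function makeTransformPipeline(transforms) {
    // Copy the list so later mutation of the caller's array has no effect.
    const pipeline = [...transforms];
    return async function runTransforms(event, context) {
        let current = event;
        for (const transform of pipeline) {
            // await handles both plain return values and promises.
            current = await transform(current, context);
        }
        return current;
    };
}

// Hypothetical startup wiring; in practice the list would come from config.
const setWikimediaDefaults = makeTransformPipeline([
    (event) => ({ ...event, step_one: true }),
    async (event) => ({ ...event, step_two: true }),
]);
```

All enable/disable decisions happen while the list is built, so the per-event path is a plain loop with no config checks.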

(We're going to put all this into eventgate-wikimedia for now. There's still the intention to merge repos in the future but it's low priority)

Nice!

> an ordered list of transform functions?

However you do this, an ordered list of functions is probably wise :)

BTW, many of the transform steps require more context than just the event to transform. E.g. enrich_fields_from_http_headers will require the stream config settings to know what to do. That is why makeSetWikimediaDefaults returns a function to call, rather than being a setWikimediaDefaults function itself. So your interface/convention for folks writing transform functions will probably follow the same pattern. E.g.

function makeEnrichWithHttpHeaderTransformFunction(options, logger) {
    const streamConfigs = options.streamConfigs || makeStreamConfigs(options, logger);

    return async (event, context) => {
        // use streamConfigs to hoist configured context.req.headers into the event.
        return event;
    };
}

It would be nice and cleaner if you could do each function-maker more minimally and not have to pass all options to every one, but I'm not sure if it is worth it in this refactor. YMMV though! (Also, feel free to come up with a better naming convention than make___Function. It has always read a little awkward!)
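
As a sketch of the more minimal variant, each maker could take only the dependencies it needs, with one startup function resolving the shared ones. Everything here is assumed for illustration: the names `makeEnrichWithHttpHeaders` and `makeTransforms`, and in particular the stream config shape (`headers` mapping a header name to an event field), which is not the real stream config format.

```javascript
// Hypothetical maker that receives only streamConfigs instead of all options.
// Assumed stream config shape: { [streamName]: { headers: { headerName: eventField } } }
function makeEnrichWithHttpHeaders({ streamConfigs }) {
    return async (event, context) => {
        const streamName = event.meta && event.meta.stream;
        const headerToField = (streamConfigs[streamName] || {}).headers || {};
        const enriched = { ...event };
        for (const [header, field] of Object.entries(headerToField)) {
            const value = context.req.headers[header];
            // Coded defensively: never overwrite a field that is already set.
            if (value !== undefined && enriched[field] === undefined) {
                enriched[field] = value;
            }
        }
        return enriched;
    };
}

// Hypothetical startup wiring: resolve shared dependencies once, then hand
// each maker only its slice. Real code would fall back to building the
// stream configs from options rather than to an empty object.
function makeTransforms(options) {
    const streamConfigs = options.streamConfigs || {};
    return [makeEnrichWithHttpHeaders({ streamConfigs })];
}
```

The trade-off is that the wiring function has to know which dependencies each maker wants, which is extra bookkeeping compared with passing `options` everywhere.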