Page MenuHomePhabricator

Generalize request templating for service requests
Closed, ResolvedPublic

Description

This is currently a todo at

https://github.com/wikimedia/restbase/blob/a9b13b658bc392f3e3c2e1c27026d65be68d7d7c/lib/router.js#L108

(note that this is on the 'modules' branch).

We should be able to use the existing request handler code for the templating.

An example for the 'service' stanza is at https://github.com/wikimedia/restbase/blob/a9b13b658bc392f3e3c2e1c27026d65be68d7d7c/interfaces/mediawiki/v1/content.yaml#L47. It is intended to support full request templating the way the request handler does as well.

Example template:

{
  method: 'post',
  uri: '/{domain}/...', 
  headers: {
    'cache-control': '{cache-control}',
    'foobar': 'Some value'
  },
  body: {
    a: '{a}' // equivalent to $.request.body.a
  }
}

Event Timeline

GWicke assigned this task to mobrovac.
GWicke reassigned this task from mobrovac to Hardikj.
GWicke raised the priority of this task from to High.
GWicke updated the task description. (Show Details)
GWicke set Security to None.
GWicke removed a subscriber: Aklapper.

A basic version of this has now landed.

Since this is now going to be used in a hot path we should optimize and clean up request templating a bit. @Hardikj, lets use this ticket to track that follow-up work as well?

GWicke renamed this task from Implement request templating for service requests to Improve request templating for service requests.Mar 15 2015, 4:25 PM
GWicke lowered the priority of this task from High to Medium.

Let me try to provide some background for this task.

We often have a need to template a backend or internal request based on an incoming request. Currently we already have support to do this on the URL, but more general request templating support is still missing.

The main requirements are:

  • ability to template the URL (already implemented)
  • ability to construct a new set of headers, some of which might be templated
  • ability to forward and possibly template a new JSON body

For the syntax, we have been toying with $var and {var}. The latter is what we use in URI templates already, so it would provide some degree of consistency to keep using it in templates.

While full string templating (with a mix of literals & variables) in headers etc would be desirable at some point, we can start by only supporting references to complete values:

{
  method: 'post',
  uri: '/{domain}/...', 
  headers: {
    'cache-control': '{cache-control}',
    'foobar': 'Some value'
  },
  body: {
    a: '{a}' // equivalent to req.body.a
  }
}

In this scheme, name resolution could be context-dependent. For a template parameter {domain} in the uri component, we reference req.params.domain first (should perhaps be req.uri.domain for consistency, later). If that fails, we switch to the global namespace, which should for now contain req, a reference to the original request. Similarly, a template string in headers would default to the namespace req.headers.* first.

References of the form '{a}' can potentially reference a structure without flattening it to a string. This is especially important in the body. Flattening should only happen if a) a mixed template with string literals is supplied (later!), or b) at the end of templating, to coerce the result to the type expected for the context (ex: headers are expected to be strings).

These templates are used in very hot code paths. This means that they need to perform well. An approach to make them fast is to compile them to native code. The challenge with doing so is keeping this secure. We might be able to steal some ideas & code (especially around name resolution / namespace control) from https://github.com/gwicke/tassembly, a general templating backend primarily aimed at web templating. We could even consider using it for general string templating later, but can't directly use it for references like '{a}' as tassembly currently only supports building up strings.

See also:

GWicke renamed this task from Improve request templating for service requests to Generalize request templating for service requests.Jul 20 2015, 4:07 PM

LGTM, except I'd point out that we want the templates to mean if a value is present, use that, otherwise skip it. E.g. for

headers:
  cache-control: '{cache-control}'

we want the sub-request to contain a cache-control header iff the original request contains one, i.e. if {cache-control} resolves to something other than null, undefined or an empty string. Naturally, all of the other headers not in the template should be dropped.

As for the body, this seems to be a much trickier topic. E.g. for:

body:
  a: '{a}'

Obviously, this tells us we expect a JSON body from the client, which is nice as it means we can error out right away if it won't parse. But, what if {a} does not evaluate to a truth-y value? Do we send an empty body? Do we error out? Do we do action Z? Each action could be mapped to different cases. As a side note, note that the same case could be made for nested structures, something like:

body:
  a:
    one: '{a.c}'
    two: 'blah blah'

References of the form '{a}' can potentially reference a structure without flattening it to a string. This is especially important in the body. Flattening should only happen if a) a mixed template with string literals is supplied (later!), or b) at the end of templating, to coerce the result to the type expected for the context (ex: headers are expected to be strings).

What about saying that a: '\'{a}\'' makes that coercion explicit? That would allow us to implement your option (a) more easily.

I can't see (currently) a case where body intervention is really needed, so I'm inclined to say we should save the body talk for later, as the primary concern / goal here is really header filtering / selection. Do you envision concrete usages of body templating, @GWicke?

LGTM, except I'd point out that we want the templates to mean if a value is present, use that, otherwise skip it. E.g. for

headers:
  cache-control: '{cache-control}'

we want the sub-request to contain a cache-control header iff the original request contains one, i.e. if {cache-control} resolves to something other than null, undefined or an empty string.

I definitely agree with leaving out the header when encountering undefined. The value null would likely be a bug in some backend code and can't be passed in by a client. It might make sense to throw an error in that case. Empty string values might be rare, but they are part of the HTTP spec: https://bugzilla.mozilla.org/show_bug.cgi?id=474845. In practical terms I don't see a strong reason for omitting headers if the value is empty. Most cases where that would be useful should already be covered by undefined.

Naturally, all of the other headers not in the template should be dropped.

Agreed.

As for the body, this seems to be a much trickier topic. E.g. for:

body:
  a: '{a}'

Obviously, this tells us we expect a JSON body from the client, which is nice as it means we can error out right away if it won't parse. But, what if {a} does not evaluate to a truth-y value? Do we send an empty body? Do we error out? Do we do action Z? Each action could be mapped to different cases.

If it resolves to undefined, then I'd say we omit a: (this happens implicitly with JSON.stringify). All other values can be transmitted as-is. In the future, we could consider adding default value or assert-present functionality by allowing some pre-defined function expressions inside the braces.

References of the form '{a}' can potentially reference a structure without flattening it to a string. This is especially important in the body. Flattening should only happen if a) a mixed template with string literals is supplied (later!), or b) at the end of templating, to coerce the result to the type expected for the context (ex: headers are expected to be strings).

What about saying that a: '\'{a}\'' makes that coercion explicit? That would allow us to implement your option (a) more easily.

I'd rather add function expressions like {json_serialize(a)} later to enforce this. That way we could even support multiple serialization formats.

I can't see (currently) a case where body intervention is really needed, so I'm inclined to say we should save the body talk for later, as the primary concern / goal here is really header filtering / selection. Do you envision concrete usages of body templating, @GWicke?

One example would be templating of the backend request in T105975: RFC: Generalize content-addressable POST request storage. As a first step just referencing the entire body should be enough, but I could well imagine a need to send only a part of the original post (plus perhaps some other data from a GET) to a service later.

@Pchelolo and I had a chat about it, and we concluded that:

  1. We also need to support request query templating (will be needed by some *oids). If query is not specified in the template, it should be dropped from the original request, just like the other fields.
  2. Searching in req if a field cannot be found is not really useful. req can only ever contain uri, params, query, headers and body, so searching for req.a makes no sense. On the other hand, searching for a in any of the fields of req hurts performance, it being a hot code path. Moreover, it can cause security problems as it introduces ambiguous fields. An example of this is putting body: headers.cache-control, where we mean use the cache-control request header, but if an attacker specifies {"headers": {...}} as the request body, we might get unexpected results.
  3. Because of the example in (2), there should be a field priority: if the template specifies {req.params} then we take it to mean the original request's params field. In the unlikely event that the request's body contains a req field, and we want to address that one, {req.body.req} should be used.

@Pchelolo and I had a chat about it, and we concluded that:

  1. We also need to support request query templating (will be needed by some *oids). If query is not specified in the template, it should be dropped from the original request, just like the other fields.

Agreed. The new request is defined exactly by the fields specified in the template.

  1. Searching in req if a field cannot be found is not really useful. req can only ever contain uri, params, query, headers and body, so searching for req.a makes no sense. On the other hand, searching for a in any of the fields of req hurts performance, it being a hot code path.

That's not what I proposed. Sorry if I was being unclear. The idea is basically to try local variables first (namespace defined by the part of the req being templated), and then fall back to the global namespace. The global namespace for now primarily contains the key req, plus perhaps the key post_request in some situations. There is no magic looking inside of req or any other object in the global namespace beyond the local scope dereferencing.

Moreover, it can cause security problems as it introduces ambiguous fields. An example of this is putting body: headers.cache-control, where we mean use the cache-control request header, but if an attacker specifies {"headers": {...}} as the request body, we might get unexpected results.

This would only ever reference a 'headers' field in the original body. But, you are right that it could be an issue for req and other globals.

How about adding the rule that all dotted paths are references to the global namespace? So, req.headers would always reference the global namespace, while just a would be resolved locally first.

The alternative would be some other syntactic distinction, like a reserved 'global' object ( $.req.headers.content-type). Another option would be using slashes instead of dots, as in /req/headers/content-type, which naturally provides a syntactic prefix with the right intuitive semantics.

  1. Because of the example in (2), there should be a field priority: if the template specifies {req.params} then we take it to mean the original request's params field. In the unlikely event that the request's body contains a req field, and we want to address that one, {req.body.req} should be used.

Yes, that's basically the idea. We already use short context-dependent references in URI templates, and I think it makes sense for ergonomics and consistency to continue doing so.

We just discussed the global vs. local namespacing issue. The general consensus is leaning towards this solution:

  • explicitly prefix all globals with $., and don't perform any global lookup otherwise
  • scope locals to the template part only, so that a anywhere in body: refers to $.req.body.a

We just discussed the global vs. local namespacing issue. The general consensus is leaning towards this solution:

  • explicitly prefix all globals with $., and don't perform any global lookup otherwise
  • scope locals to the template part only, so that a anywhere in body: refers to $.req.body.a

+1 LGTM

Thinking about it some more, we actually avoided a bad clash by choosing $. over /path/: In URI templates, {/var} already has the meaning 'optional path component', which would clash with path-based variable references.

@mobrovac & I just discussed naming consistency in T105975. We realized that pretty much everything else uses _request instead of req, so for consistency it would be nice to actually use $.request for the incoming request, to fit with $.post_request and potentially others in the future.

The implementation was done by several PRs:
https://github.com/wikimedia/restbase/pull/295
https://github.com/wikimedia/restbase/pull/283

Now they are merged, so this task can be closed.