Page MenuHomePhabricator

Reconsider REST router path parameter handling
Open, In Progress, MediumPublic

Description

tl;dr: the current REST router path parameter validation and path matching make certain endpoint combinations confusing or impossible. Decide how to resolve this.

Update: we are strongly leaning toward Possibility #3 below: prohibit non-required path parameters in the route, but allow them at the Handler level, thereby resolving the path conflict issue, while still allowing the implementation convenience of letting handlers serve multiple routes

Current Status

Consider the following three REST endpoints:

  1. /myextension/foo
  2. /myextension/foo/
  3. /myextension/foo/{id}

The REST PathMatcher will not allow all three of these to exist. It will detect a conflict between #2 and #3

Considering this to be a conflict is necessary because path parameters can be marked as non-required (and therefore omitted from the request). If presented with a request for /foo/, the PathMatcher would not be able to determine whether it should invoke #2 or #3.

It is worth mentioning that in an informal poll of MediaWiki developers familiar with APIs but unfamiliar with the details of the REST PathMatcher, opinions were divided on whether #1 and #2 would conflict, or whether #2 and #3 would conflict. This indicates the current behavior is unclear to developers, which is inherently undesirable if it can be avoided.

Motivating Issue

This is undesirable, because there are real circumstances where we wish to have both /foo/ and /foo/{id}. Specifically, in reimplementing the Reading Lists API in REST as part of RESTBase deprecation (see T336693: Re-implement reading lists REST interface outside RESTbase), we are obligated to match the existing GET /lists/ path currently provided by RESTBase and in use by callers. But we also wish to provide a GET /lists/{id} endpoint, to maintain feature parity with the existing Action API Reading List endpoints (and hopefully retire the Action API endpoints, per T348494: Reading List REST Interface: remove Action API endpoints, leaving us only one API to maintain for Reading Lists).

This is not the only existing API where this will come up. For example /page/ from Page Content has the same challenge. Reading Lists just happens to be the one we noticed first.

Possible Solutions

There are several possible resolutions, including at least:

  1. Do nothing. For the case of Reading Lists, implement only /lists and /lists/{id}, and redirect /lists/ to /lists at the API Gateway level. In this specific case, {id} is a required parameter, so there would be no ambiguity in a gateway-level redirect. And then hope this situation doesn't come up again in other contexts.
  2. Disallow non-required path parameters. This would allow the PathMatcher to (theoretically) distinguish between /foo/ and /foo/{id}, because the {id} portion would always be present. It is unconfirmed but seems likely that an efficient modification could be made to PathMatcher to distinguish these cases.
  3. Disallow non-required path parameters at the PathMatcher level, but continue to allow them at the Handler level, in getParamSettings(). This would allow handlers to still serve multiple routes, some of which may not supply all possible path parameters, while still resolving the motivating issue.
  4. Allow templates to indicate to PathMatcher whether a path parameter is required or not. PathMatcher could then use this additional information to determine if a conflict truly exists. Downsides include additional complexity in the PathMatcher implementation, additional and perhaps non-obvious consideration for developers creating endpoints, and the possibility that the actual parameter definitions in the REST handlers do not match the endpoint templates. (In other words, a parameter might be indicated as required in the template, but as not required in the Handler code. Automatic detection might be possible at the expense of more complex Handler code, and an unknown increase in execution time.)
  5. Do something hand-wavey with redirects
  6. Something else I didn't think of

Some of these possibilities allow us to cleanly move to others in the future if desired. For example, #2 would resolve the current situation and would not prohibit implementing #4 in the future. Also note note that the implementation of non-required parameters is currently incomplete: default values assigned in the handler's parameter definition are ignored, and non-required parameters are always received as empty strings. However, there are some endpoints with non-required path parameters in production, so maybe this is a non-starter:

Additional Considerations

If we make any choice that continues to support non-required path parameters, we should also make their default values function correctly.

Event Timeline

Summary of related Slack discussion that occurred before this task was created. @daniel and @pmiazga , please correct anything that I misrepresented from your Slack comments.

  • A path like /myextension/foo/ seemed odd to some people, to the point that trailing slashes should be disallowed by the router/matcher
  • Alternatively, a trailing slash could indicate that you're working with a collection rather than a single object. So /pages/ would give a list of page. POSTing to create a page would be to /page and not /pages/, because you're not interacting with the collection as a whole
  • RFC 3986 discussed trailing slashes in terms of merging a base path with a relative path references. In that context, anything after the final slash is dropped during the merge. This doesn't seem (to me at least) to apply here, but if there were a situation where the REST url could be considered a base path to which a relative path were merged, this would argue for trailing slashes
  • swagger.io says (regarding path parameters): "remember to add required: true, because path parameters are always required". A quick google suggests that non-required path parameters are problematic for some swaggers spec tooling. So continuing to allow non-required path parameters might cause issues with expanding our use of swagger specs.
BPirkle triaged this task as Medium priority.Mar 11 2024, 3:59 PM
BPirkle moved this task from Incoming (Needs Triage) to In Progress on the MW-Interfaces-Team board.

To elaborate a bit on the OpenAPI/Swagger issue this section of the swagger.io params documentation page reinforces that path parameters cannot be non-required. Specifically, this text: "path parameters must have required: true, because they are always required."

Default parameters are allowed, but only "for an optional parameter", which excludes path parameters. This is further clarified by the text "if a value is required, the client must always send it, and the default value is never used"

There are also allowEmptyValue and nullable options, described in this section on swagger.io. This explicitly applies only to "query string parameters" and "schemas", respectively.

In short, if we're going to use swagger specs and related tooling, non-required path parameters seem problematic. And that feels to me like a lot to give up for something I personally find dubious in the first place.

Regarding our existing usage of non-required path parameters, the one external usage found by codesearch (DokuApi) seems trivial to refactor by making routes both with and without the path parameter point at the same handler, which does appear to actually reference the parameter value.

I could coordinate with Content Transform and Growth on refactoring the other usages (and I'd be happy to code the changes, if they're willing to provide guidance and review).

Does anyone want to argue for keeping non-required path parameters? Subscribing some folks who might have opinions. Feel free to remove yourself and/or add others.

Hearing no objections, I'm planning to work on patches for disallowing non-required path parameters in the REST API. If anyone thinks that's a bad idea, please say so.

I think it's useful to have suport for optional parameters in the parameter validator, so we can register the same handler for multiple routes, some of which may be missing some parameters.

For the matcher however, I think we don't want to support optional parameters, so /myextension/foo and /myextension/foo/ will become synonymous, and /myextension/foo/ no longer conflicts with /myextension/foo/{id}.

Note that this means we don't support *empty* parameters either, so /myextension/foo//bar will not match /myextension/foo/{id}/bar.

@BPirkle wrote in the Task description:

Possible Solutions: 1 […] 5 […]

There might be a sixth option. If /myextension/foo/ and /myextension/foo/{id} (where ID can be optional) are considered conflicting routes today, does that mean there is also the option of implementing what you're looking for, by registering the latter and having the route handler respond with what you want from a conditional !id branch?

It seems like that would work, and might be how the routing was intended. Perhaps @tstarling would know, but it seems to me like we don't have to limit ourselves to options where there are literally separate routes in the registry. A route can vary its response in all sorts of ways.

There might be a sixth option. If /myextension/foo/ and /myextension/foo/{id} (where ID can be optional) are considered conflicting routes today, does that mean there is also the option of implementing what you're looking for, by registering the latter and having the route handler respond with what you want from a conditional !id branch?

I started to say that's not a sixth option because #5 was "Something else I didn't think of". But if I'm understanding your suggestion, I did actually try coding that up and rejected it.

To confirm, I think what you're saying is that I could register only /myextension/foo/{id} and then then in the handler detect if the {id} parameter was provided or not. So one entry in the extension.json file which directs to (let's say) FooHandler. FooHandler would register id as a non-required path parameter. It would then check at execution time for the presence or absence of id and respond accordingly. The rest of this comment will assume that's what you meant. If you meant something else, please ignore the rest and clarify.

That would be technically possible, but I rejected that for several reasons:

  1. From a REST standpoint, /myextension/foo/ logically returns a collection of whatever resource type foo is, while /myextension/foo/{id} returns a single instance of foo. Having FooHandler respond to both complicates its implementation, making the class less cohesive.
  2. This also complicates testing, which would need to test both the collection and single-instance interpretations of the endpoint.
  3. The collection interpretation likely include some query parameters related to collections. Things like limit, sort, pagination. These are nonsensical for the single-resource interpretation, but by having one route, these nonsensical parameters are now part of the endpoint's contract even for cases where they do not apply.
  4. For auto-generated OpenAPI specs (which we already have a development route for) there is only one route in the spec. From scanning a list of the available endpoints, callers have no good way to know that /myextension/foo/ is a thing. They'd have to read the details of /myextension/foo/{id} to discover that (presumably from descriptive text we'd write).
  5. There's no way other than descriptive text to communicate via the OpenAPI spec that {id} is not required, because non-required path parameters are not allowed in OpenAPI. So the {id} parameter would have required: true in the spec, but descriptive text that says something like "no, really this is not required even though the spec says it is".
  6. Any clients auto-generated from the OpenAPI spec would not understand any of this.

In short: yes, code could be written that would handle both. And I did try to write that code, just to see how it would feel. It didn't feel good.

I think it's useful to have suport for optional parameters in the parameter validator, so we can register the same handler for multiple routes, some of which may be missing some parameters.

I don't object to this as strongly as I do to having non-required path parameters in the actual route. However, keep in mind that, as long as the validator specs are the source of truth for our OpenAPI spec, using this trick would mean that we'd either have to generate an invalid OpenAPI spec (by including required: false for a path parameter, which isn't technically allowed by OpenAPI), or else generating a spec that doesn't match the endpoint (by emitting required: true when in fact it is false. Just to make coding handlers more convenient.

Some of the discussion around all this (not just in this task but in various other places including face-to-face meetings) feels to me like we're unsure whether we're implementing a REST API infrastructure, or whether we're implementing an API infrastructure that can be used to create REST APIs, but that can also be used to create other APIs that have some mechanical things in common with REST.

I do recognize that what "REST API" means isn't universally agreed upon, that well-meaning people can legitimately disagree on what is and isn't "REST", that being theoretically correct (if there even is such a thing) is not always desirable, and that any practical implementation of sufficient complexity will probably deviate from theoretical best practice in some way (we already do). And that's all fine. But if we intend to turn our somewhat-immature API that is accessible via rest.php into a mature REST api, it seems to me that some motivating reasons are:

  1. it allows us to leverage tooling not invented here (mostly OpenAPI-based tooling, simply because that's what people have already built)
  2. it allows callers to leverage such tooling on their side
  3. it conforms to what callers generally expect from a "REST API", allowing them to leverage understanding and experience they've gained from exposure to non-WMF REST APIs

It feels to me like the losses of back-door allowing non-required path parameters don't outweigh the gains. Handlers are an implementation detail, and we're pretty good coders. So not being able to directly reuse them in some situations does not feel to me like a massive issue, while compromising our interaction with OpenAPI does. I'd personally rather require us to write a little more code, than to compromise what seems like a design principle.

For the record, I'm not an OpenAPI purist or evangelist. I have some real gripes about it, and kinda wish we (as an industry) could trash the whole OpenAPI thing and start over. But it also seems to me like OpenAPI is where the industry is at and what callers will expect, that we build APIs primarily for callers and not for ourselves ("we" being MW core/extension developers), and that we should therefore, however grudgingly, at least not make choices that make supporting it problematic.

Happy to hear counterarguments.

@BPirkle Thanks for writing that up. If I understand correctly:

  • OpenAPI does not support the concept of optional path components, thus limiting us and forcing us to pretend the superset of query parameters are valid for both the collection and the subpath.
  • MediaWiki REST API currently does support optional path components.
  • We have no known uses of, or use cases for, optional path components — other than the current Reading Lists API migration.
  • By removing support of this feature from MediaWiki's REST framework, route 2 and route 3 (in the task description) will no longer conflict; thus allowing us to register these as separate routes.

I can see merit in Solution 2. But, per Daniel, I also see merit in Solution 3 - i..e only remove support for optional components when used to create different types of responses (in the way I hinted at). So long as the query parameters and response format stay the same, it might be of some use to allow something like /something/mythings/ with optional /something/mythings/:year/:month/:day to narrow things down. On the other hand, I don't see how that is an improvemenet over e.g. optional query parameters instead. Except, I suppose, when the improvement is to retain compat with a pre-existing API.

I'm not expressing support for or against Bill's proposed outcome. Y'all at it. I wanted to check that I understand the ideas.

  • We have no known uses of, or use cases for, optional path components — other than the current Reading Lists API migration.

We do have a few current uses of optional path parameters (see the end of task description for a list). I'm volunteering to recode those, if we decide to disallow optional parameters.

Reading Lists is actually not a use case for optional path parameters. Instead, Reading Lists is soft blocked on the current resultant route conflict. (I say "soft blocked" because we do have a viable API Gateway level workaround. I'd just prefer to address the underlying issue head-on.)

Change #1016483 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/core@master] Add additional PathMatcher test cases

https://gerrit.wikimedia.org/r/1016483

The above patch just adds a few more test cases for PathMatcher. It does not change any existing PathMatcher behaviors.

I have a local change for modifying PathMatcher to prohibit optional path parameters, just to make sure that I didn't run into any unexpected challenges while writing that. (I'll post that change as WIP after some more local testing.) But I noticed when creating that patch that the existing tests didn't fail when I added the prohibition, which bothered me. Ideally that change in behavior without corresponding test changes would have resulted in a test failure. So I wrote added the additional test cases.

If we do prohibit optional path parameters, the new test cases in the patch I posted tonight will start failing and will need to be modified, but that's fine and expected.

Change #1016820 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/core@master] Disallow optional path parameters

https://gerrit.wikimedia.org/r/1016820

Change #1016483 merged by jenkins-bot:

[mediawiki/core@master] Add additional PathMatcher test cases

https://gerrit.wikimedia.org/r/1016483

On the subject of trailing slashes:

FJoseph-WMF changed the task status from Open to In Progress.Fri, Apr 5, 2:42 PM

Change #1018400 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/core@master] Refactor transform endpoints to not use optional path params

https://gerrit.wikimedia.org/r/1018400

I think it's useful to have suport for optional parameters in the parameter validator, so we can register the same handler for multiple routes, some of which may be missing some parameters.

There is an implementation trick technique that might sometimes be useful. At least, it was in the transform handler change I just posted.

Let's assume you have a handler that does some complex logic, and that it would be convenient to change the behavior of that logic based on an optional path parameter. Prohibiting optional path parameters means that won't work - you'll now need a derived handler class to add the path parameter, and two routes (one with the parameter and one without). But you can still keep all your logic in the base class, including the part where you reference your "optional" path parameter. This makes the derived class handler very, very simple.

Your routes would be something like this:

	{
		"method": "GET",
		"path": "/v1/foo",
		"class": "MediaWiki\\Rest\\Handler\\FooHandler"
	},
	{
		"method": "GET",
		"path": "/v1/foo/{bar}",
		"class": "MediaWiki\\Rest\\Handler\\FooBarHandler"
	}

FooBarHandler will be very small, something like:

class FooBarHandler extends FooHandler {
	public function getParamSettings() {
		return [
			'bar' => [ 
				self::PARAM_SOURCE => 'path',
				ParamValidator::PARAM_TYPE => 'string',
				ParamValidator::PARAM_REQUIRED => true, 
			]
		]  + parent::getParamSettings();
	}
}

And in FooHandler, you'll have code something like this:

public function getBarPathParam() {
      return $this->getValidatedParams()['bar'] ?? 'myDefaultValue';
}

That feels reasonable and not overly burdensome to me to get the various benefits of not supporting optional path parameters.

Daniel and I discussed this a bit more in Slack, and agreed to modify the plan somewhat. We will still be disallowing optional path parameters in routes. But we will be allowing them in handlers (more specifically, in GetParamSettings()).

The idea is that the route (as expressed in coreRoutes.json, coreDevelopmentRoutes.json, or extension.json) defines the path for the contract, and (for all the reasons already discussed on this task) it is a good thing to not allow optional path parameters in the contract. However, the handler is concerned with the implementation and for various reasons it is helpful within the implementation to allow for optional path parameters.

If anyone disagrees, please let me know. Otherwise I'll refactor the patches accordingly (mostly be simplifying them).

A copy of the Slack discussion follows.

Daniel Kinzler
I just found myself making use of optional path parameters again, see https://gerrit.wikimedia.org/r/c/mediawiki/core/+/969811/45/includes/Rest/coreDevelopmentRoutes.json. Creating a subclass just for that just feels off. Would it really be so terrible to allow optional path params in the validator? Can you remind me what problem it causes?

Bill Pirkle
Several things, but the most compelling is that it means we are not compliant with Swagger/OpenAPI. That seems a big deal. Subclasses are pretty cheap.

Is what you want really an optional path parameter (from a spec perspective)? Or is it the ability to map multiple routes to one handler? Because those sound like conceptually different things to me. Right now, one implies the other, because we're using handler parameter settings in multiple ways. But maybe that's something we need to reconsider.

Daniel Kinzler
Subclasses are pretty cheap, they just add a bit of mental load.

Is what you want really an optional path parameter (from a spec perspective)?
No, not from the spec's perspective. We'd have one spec per path, one spec would have that parameter, and the other wouldn't.

Or is it the ability to map multiple routes to one handler?
Yes. I would like to be able to map multiple paths, with different sets of path parameters, to the same handler. In getParamSettings, I'd declare some of the path params options.
The question is, how do we then generate the spec? We can't just use whethever getParamSettings() gives us, because then the parameter declaration would be present in paths that shouldn't have them. I suppose the Handler base class could have a getPathParameterNames() method which would inspect the path the handler object was instantiated for, and return the names of parameters in that path. It should work. But maybe it's too much magic.

Wanting a single handler for several variations of a path is pretty commons. Making that easy would be nice. And it's possible to introduce some magic that will make the convenient way possible.
On the other hand, "magic" always has the downside of the potential for confusion, and mental load for anyone wanting to change the framework.

Bill Pirkle
The key bit for me is that (I think) what we are talking about having is not optional parameters from the contract perspective. Instead, we're talking about parameters that a handler may or may not receive depending which route it is serving. I wonder if there's a way to make that feel more explicit and less magic in the code.

  1. I don't really care how magical the spec generation code is, because it is supposed to be magic. So Daniel's suggestion about having the Handler base class inspect the route at invocation time seems fine there. Presumably that'd work in conjunction with RootSpecHandler (or whatever spec generation code we have after introducing modules). An engineer would need to understand that magic if they were actually working on spec generation, but otherwise it'd Just Work (TM).
  2. For actually servicing requests, engineers do need to look behind the curtain, or at least be aware the curtain exists, because your handler will need to know how to declare and process its parameters. The smallest change seems to be:
    1. make path parameters required at the contract/path matcher/router/module level. From a caller's perspective we will no longer be allowing optional path parameters, in any cases.
    2. continue allowing handlers to mark path parameters as "ParamValidator::PARAM_REQUIRED => false" in getParamSettings(). The interpretation of this will mean the parameter may be supplied by some routes but not by others.
    3. document everywhere we can that for path parameters, non-required means "the caller always has to send all the parameters listed in the route, but for any specific request the handler may or may not receive parameters listed by the handler as not required". This is a different interpretation than for query parameters, where setting required to false means "the caller does not have to provide this".
  3. My proposed changes to PathMatcher would still apply. Some other things in that change (including but not limited to the RestStructureTest changes) would need to be modified. As would my TransformHandler change, which I think will become much simpler (or maybe not needed at all???). That's all fine. I'd rather create something we're happy to live with for awhile. Especially if it means I have fewer non-MWI repos to fiddle with.

I'm slightly uncomfortable with having "ParamValidator::PARAM_REQUIRED => false" mean different things for different parameter types, especially when they're right next to each other in the same array with no indication that they're different. Alternatively, we could break up getParamSettings() such that each parameter type (path/query/body) is declared in its own function. However, I'm not sure that automatically makes things more clear, and it means extra boilerplate even for handlers that don't have multiple routes pointed at them (and for which this is all therefore irrelevant).

My original objection to this was that we'd either have to emit an invalid OpenAPI spec, or else we'd have to emit a spec that doesn't (appear to) match getParamSettings(). We're saying here that it'd be the latter: getParamSettings() would claim the path parameter was not required, even though the OpenAPI spec would say that it was. And both would actually be true, it is just different perspectives on what "required" means. The OpenAPI spec would mean it is "required by the contract and the caller must provide it" (and would always be true for path parameters), while a required value of false in getParamSettings() would mean it is "not required by the handler, even though it is probably required by one of the routes that points at this handler).

One thing that softened me a bit on this was seeing that Parsoid extends the core TransformHandler class. If non-core repos are extending core handlers, it makes more sense to me that a core handler designed as a base class for use in non-core repos might want, in a forward looking way, to provide logic that deals with certain possible path params, without knowing if such a param would ever actually exist or not. We could still support this even with the strict definition using a trick technique like this. But then the engineer creating the derived handler has no authoritative source of truth to see that the optional parameter exists. They either have to see it in the handler's documentation (which notoriously drifts from the code in cases like this), or else just know/discover that it is used in the logic. None of that seems very friendly.

Also, nothing above prevents us from changing our mind and deciding to be more strict later if we decide we don't like this. So it seems like a good "stepping stone" sort of change that doesn't block additional future refactoring.

So, I'm now cautiously in favor of allowing non-required path parameters in the handler, but considering all path parameters in routes to be required.
Is there anything I'm forgetting that would make this a bad idea? Does this sound reasonable?

Daniel Kinzler
Thank you for the thorough thinking, Bill! I think this solution is pretty elegant:
Any path parameters defined in a route are required for that route
A handler can handle several routes. If the handler declares optional path parameters, it means that it can handle routes that do not provide these parameters. From the perspective of the handler, and the perspective of ParamValidator, they are indeed optional.
The code that generates the OpenAPI spec iterates all paths. For each path, it instantiates a handler. In order to generate the parameter specs, it calls getParamSettings on the handler. For optional path parameters, it will include the declaration of that parameter only if that parameter is actually present in the path. So the "magic" is entirely in the spec generating code.
I like it :)

BPirkle updated the task description. (Show Details)

Looking through the potentially affected code, to see what would need to be changed if we go with the approach from my previous comment.

So, good news! Only one minor change is needed. I'll submit a patch for that.

If anyone sees any potential breakage that I missed, please let me know.

Change #1019785 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/extensions/GrowthExperiments@master] Add REST route for mentees/prefixsearch without prefix parameter

https://gerrit.wikimedia.org/r/1019785

Change #1021373 had a related patch set uploaded (by Urbanecm; author: Urbanecm):

[mediawiki/extensions/GrowthExperiments@master] MenteesPrefixSearchHandler: Set prefix as required

https://gerrit.wikimedia.org/r/1021373