Page MenuHomePhabricator

Decide whether to keep violating OpenAPI/Swagger specification in our REST services
Closed, ResolvedPublic

Description

In T217747, it was discovered that, in WMF REST services, we use Swagger specs that violate the specification. Specifically, while the Swagger specification for either version 2.x or 3.x does not allow to have optional path parameters, our specs used a devised syntax to denote it, a syntax that violates the original specification.

Per T217747#5007514

version 2.0 of swagger spec does not allow optional path parameters. WMF has this kind of definition (not limited to cxserver). See comment by @GWicke at https://github.com/OAI/OpenAPI-Specification/issues/93#issuecomment-74362417

In the above comments it is clear an effort was made to add support for RFC 6570 URI templates to the specification, an effort that has effectively failed.

3 years after that effort, and despite it being an often asked feature per https://github.com/OAI/OpenAPI-Specification/issues/574 the addition of RFC 6570 URI template is set as deferred. My personal estimation of the chance of this being added to the spec in a later version is, that it's minimal.

OpenAPI/Swagger 3.0.2, a very recently released version (and the current one), still says very clearly [1]

required	boolean	Determines whether this parameter is mandatory. If the parameter location is "path", this property is REQUIRED and its value MUST be true. Otherwise, the property MAY be included and its default value is false.

So, we are in the precarious position of effectively using an in-house version of the specification that:

  • has the same name AND versioning as the original, causing confusion
  • is largely compatible with the original, but in reality the violating part is used extensively in WMF (it's at least used at https://gerrit.wikimedia.org/r/494703)
  • admittedly no one else uses (that we know of?)
  • no tooling exists around it to validate the correctness of it
  • the standard tooling fails

For a demonstration of the last bullet point, just using https://editor.swagger.io/ and feeding it one of our specs (e.g. https://raw.githubusercontent.com/wikimedia/mediawiki-services-cxserver/master/spec.yaml) will make the issue appear. Using other tooling like https://github.com/pyopenapi/pyswagger or https://pypi.org/project/swagger-parser/ also demonstrates the issue.

Up to now, our own tooling seems to treat our spec.yaml files as yaml and does not validate the correctness of the specification so no real issues have appeared yet, but with the number of services bound to increase, this has higher chances of becoming an issue.

Maybe it's about time we re-evaluate our divergence from the spec and perhaps conform to the spec?

Conforming to the spec would mean that we change our spec files to become more verbose and list all possible permutations that result from the optional nature of a path. e.g.

/required/optional1/optional2/ would mean that instead of a spec

/required{/optional1}{/optional2}

we would have to list

/required
/required/optional1
/required/optional2
/required/optional1/optional2

Adding that, every path stanza can be easily 100 lines of yaml in order to describe methods, responses and monitoring, the benefits of the templating support, if it existed are obvious, but it does not exist.

[1] https://swagger.io/specification/#parameterObject

Event Timeline

akosiaris triaged this task as Medium priority.Mar 8 2019, 8:50 AM
mobrovac subscribed.

Conceptually I agree that we should conform to the spec (even though I don't really understand the OpenAPI people's unwillingness to incorporate optional path parameters - they are called parameters after all). On the practical side, though, copy/pasta would arguably be harder to maintain properly and across multiple projects/repos. In addition to the proliferation of lines in YAML specs, there is also the visual issue of auto-generated docs: we would at least double the number of paths, and that could become unwieldy very quickly; if you consider the current REST API docs it is clear that this would not be sustainable.

To deal with the YAML spec problem, we could try using YAML references. But how do we deal with the docs issue?

I would say we should update our swagger to 3.0 and become standard-compatible.

In 3.0 there's a new concept of components that would deal with the issue of copy-pasta, and, although 30% more routes makes the docs harder to read, we can make the tags for routes more fine-grained.

We will need to go swagger 3 no matter what, the upstream support for swagger 2 is over, and as our fork diverges more and more from upstream the risk of getting a non trivially fixable security issue grows. As for standard-compatibility, I think 30% bit more items in the list would be a small price to pay for all the benefits @akosiaris described.

After reading the openapi spec and examples, I think the best approach to address optional params is by making them query paramters

So, an example path like this

/required{/optional1}{/optional2}

can be just

/required?optional1=value1&optional2=value2

As per spec query parameters can be optional. But this kind of change would mean that we need a new version of api so that we don't break old path based approaches. Plus a deprecation plan.

Well instead of copy/pasta, there is that thing called YAML anchors/references that deal with data repetition in YAML files.

Let's get ourselves spec-compliant :)

After reading the openapi spec and examples, I think the best approach to address optional params is by making them query paramters

So, an example path like this

/required{/optional1}{/optional2}

can be just

/required?optional1=value1&optional2=value2

As per spec query parameters can be optional. But this kind of change would mean that we need a new version of api so that we don't break old path based approaches. Plus a deprecation plan.

I would advise against making a breaking change to our API so that a YAML file is easier to write.

I would advise against making a breaking change to our API so that a YAML file is easier to write.

My understanding here is, having our spec standard compliant is first priority. Currently our apis are not. A good standard is supposed to make the spec easily maintainable. The candidate standard here is https://swagger.io/specification - OpenAPI spec 3.x

/required?optional1=value1&optional2=value2

Doesn't this introduce the problem of parameter ordering? In my mind, the only major advantage of path based parameters is the fact that the caching layer dosn't have to deal with an exponential number of equivalent URLs.

https://cxserver.wikimedia.org/v2?doc#!/Tools/get_v1_list_tool_from_to has two optional parameters from and to. However, from is not optional if to is provided. The UI allows to input only to, but it will in fact return results as if only from was specified.

This can either be seen as a limitation (if all parameters are optional and independent) or a feature in case there is a dependency on the following path parameters on the preceding ones (like, title must be provided if section is provided). The case for title and revision is less clear, since revision ids are unique across titles, so revision id alone could be sufficient.

Looking at the docs, most apis only have single optional parameter at the end (like revision). It's not clear to me why this has been made a path parameter as opposed to a "regular" parameter.

Well instead of copy/pasta, there is that thing called YAML anchors/references that deal with data repetition in YAML files.

That, plus some new features of swagger 3.0

It's not clear to me why this has been made a path parameter as opposed to a "regular" parameter.

In order to avoid issues with query parameter ordering and thus improve caching.

Let me clarify the current semantics between multiple optional parameters. Example:

The path like:

/{foo}{/bar}{/baz}

Unfolds into 3 possible paths:

/{foo}
/{foo}/{bar}
/{foo}/{bar}/{baz}

but NOT

/{foo}/{baz}

so in our fork of swagger, the optional parameters are hierarchical - you can only specify the next parameter if you did specify the previous.

Let's get ourselves spec-compliant :)

+1 from me. For individual services, it would be fairly easy to do, RESTBase though is another be beast, however going spec compliant and swagger 3 will allow us to simplify some code quite a bit.

Pchelolo claimed this task.

I think we have a consensus to go spec-compliant here. I've created a task for services and a special task for RESTBase, as it's a different beast. I'm resolving this ticket.