Page MenuHomePhabricator

Validate swagger definitions and test them against a set of responses from the services
Open, LowPublic

Description

What

The swagger spec seems to contain invalid models and references and is out of date with some of the responses.

We should have a mechanism to validate that the spec is correct and consistent (that runs on CI per commit), and additionally exercise it against a subset of the service responses (on the tests on CI).

AC

  • The swagger spec exposed by the services is validated and consistent
  • It is impossible to commit an inconsistent swagger definition (CI checked)
  • A subset of responses from the service are type checked against the tests to ensure that the definitions are consistent with the service responses

Event Timeline

Jhernandez created this task.

cc/ @bearND @Mholloway Let me know if this makes sense.

I was reviewing the swagger spec and I tried to validate it and I saw that there are problems with the spec (missing models, and some other warnings).

I also noticed that some services return things that are not in the schemas, for example the summary service returns content_urls and a few other things that don't exist on the specification.

The thinking here is that the schema should be the canonical representation of what a service can respond, and we should be strict about keeping it valid, and trying to ensure the services respond according to it (hence the suggestion for type checking responses in some tests against it).

Let me know if it makes sense. I'm making this an epic and we could tackle it per service or project as we get time.

@Jhernandez Here's the output I see when I run Swagger spec validation with npm run swagger:

wmf1256:mobileapps mholloway$ npm run swagger

> service-mobileapp-node@0.3.0 swagger /Users/mholloway/mobileapps
> ./node_modules/swagger-tools/bin/swagger-tools validate spec.yaml


API Errors:

  #/parameters/tid: Not a valid parameter definition
    #/parameters/tid: Missing required property: schema
    #/parameters/tid: Not a valid parameter definition
      #/in: No enum match for: path
      #/in: No enum match for: path
      #/in: No enum match for: path
      #/required: No enum match for: false
  #/parameters/revision: Not a valid parameter definition
    #/parameters/revision: Missing required property: schema
    #/parameters/revision: Not a valid parameter definition
      #/in: No enum match for: path
      #/in: No enum match for: path
      #/in: No enum match for: path
      #/required: No enum match for: false

2 errors and 0 warnings

The output is a bit convoluted but the issue is that we treat the revision and tid path parameters (on many /page/ endpoints) as optional, but Swagger 2.0 actually does not permit optional path params. (See also discussion on the patch here.) IIRC this is why we use a fork of swagger-ui rather than pulling directly from upstream: we want optional path parameters.

We should have a mechanism to validate that the spec is correct and consistent (that runs on CI per commit), and additionally exercise it against a subset of the service responses (on the tests on CI).

In RESTBase we test some of the specs for conformance to the response schema definitions https://github.com/wikimedia/restbase/blob/master/test/features/schema_tests.js I believe MCS could do the same.

Also, we have x-amples testing in RESTBase, I believe MCS should have that as well. If it doesn't, we could convert this code https://github.com/wikimedia/restbase/blob/master/test/features/specification/monitoring.js into a library and use it in both RESTBase and MCS. Unfortunatly the checker we use in production is written in Python, so we've reimplemented it in node for easier local test running. We could although create a specialized CI job for services to run the production checker

Also, there's an idea to update service-checker to validate response schemas T110240, if we do that we could merge these 2 types of testing.

In addition to npm run swagger that @Mholloway mentioned earlier we also have test/features/app/spec.js which does some validation.

@Jhernandez Is there anything missing in test/features/app/spec.js?

Cool, I didn't know we had those in place! Great to see.

I understand the optional path params, so that's fine.

What prompted me to write this was because I was experimenting with generating graphql types from our swagger specs (using swagger-to-graphql) and I got into errors regarding some of the definitions. For example:

{
  "name": "SyntaxError",
  "message": "Error resolving $ref pointer \"/Users/jhernandez/dev/wikipedia-graphql/spec.json#/definitions/metadata\".
Token \"metadata\" does not exist.",
  "stack": "SyntaxError: Error resolving $ref pointer \"/Users/jhernandez/dev/wikipedia-graphql/spec.json#/definitions/metadata\".
Token \"metadata\" does not exist.

Then looking into https://en.wikipedia.org/api/rest_v1/?spec manually it is true I didn't see the definition, so I assumed we weren't validating the schema with the definitions or type checking them against responses.

After what you all mentioned, it seems like there is something happening that is causing this error to not surface?

I think it would be good if we ensure that everything is consistent in any case 😄

So we can use this epic to open subtasks for specific improvements, or bugs, or aligning mcs with restbase if it isn't (seems like it is now?)

@Jhernandez I just tried running this but am getting different results. I'm using the CLI like this:

$ npm i -g swagger-to-graphql
$ npm i -g graphql@0.13.0                
$ swagger-to-graphql --swagger=spec.yaml
Error: Expected media_item_license! to be a GraphQL nullable type.

After making media_item/license not required anymore it now complains about a different entry (metadata_language_links_items_titles!), and so on. Not sure why it doesn't like required properties.

This is what I'm doing @bearND

~ => mkdir /tmp/test

~ => cd /tmp/test/

test => npm install graphql swagger-to-graphql
npm WARN saveError ENOENT: no such file or directory, open '/private/tmp/test/package.json'
npm WARN enoent ENOENT: no such file or directory, open '/private/tmp/test/package.json'
npm WARN test No description
npm WARN test No repository field.
npm WARN test No README data
npm WARN test No license field.

+ swagger-to-graphql@1.4.0
+ graphql@0.13.2
added 149 packages from 105 contributors and audited 186 packages in 4.65s
found 0 vulnerabilities

test => curl -s https://en.wikipedia.org/api/rest_v1/?spec > spec.json

test => swagger-to-graphql --swagger=spec.json
{
  "name": "SyntaxError",
  "message": "Error resolving $ref pointer \"/private/tmp/test/spec.json#/definitions/metadata\".
Token \"metadata\" does not exist.",
  "stack": "SyntaxError: Error resolving $ref pointer \"/private/tmp/test/spec.json#/definitions/metadata\".
Token \"metadata\" does not exist.
    at Pointer.resolve (/private/tmp/test/node_modules/json-schema-ref-parser/lib/pointer.js:73:17)
    at $Ref.resolve (/private/tmp/test/node_modules/json-schema-ref-parser/lib/ref.js:82:18)
    at $Refs._resolve (/private/tmp/test/node_modules/json-schema-ref-parser/lib/refs.js:157:15)
    at inventory$Ref (/private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:99:23)
    at /private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:69:11
    at Array.forEach (<anonymous>)
    at crawl (/private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:63:12)
    at /private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:72:11
    at Array.forEach (<anonymous>)
    at crawl (/private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:63:12)
    at /private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:72:11
    at Array.forEach (<anonymous>)
    at crawl (/private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:63:12)
    at /private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:72:11
    at Array.forEach (<anonymous>)
    at crawl (/private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:63:12)
    at /private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:72:11
    at Array.forEach (<anonymous>)
    at crawl (/private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:63:12)
    at /private/tmp/test/node_modules/json-schema-ref-parser/lib/bundle.js:72:11
    at Array.forEach (<anonymous>)"
}
(node:11351) [DEP0079] DeprecationWarning: Custom inspection function on Objects via .inspect() is deprecated

Ah, I see you're getting the spec from RESTBase and in JSON format. I was trying the one from MCS locally (the YAML file). That explains the difference. (It's also better to not install the tool globally.)

Then I tried this with the local RB spec files. Turns out some definitions were missing for metadata.

$ swagger-to-graphql --swagger=v1/metadata.yaml
{
  "name": "SyntaxError",
  "message": "Error resolving $ref pointer \"/Users/besi/de/wmf/restbase/v1/metadata.yaml#/definitions/metadata\". 
Token \"metadata\" does not exist.",
  "stack": "SyntaxError: Error resolving $ref pointer \"/Users/besi/de/wmf/restbase/v1/metadata.yaml#/definitions/metadata\". 
Token \"metadata\" does not exist.

which is the same error you got. That error should be fixed once my PR gets deployed.

Then I ran into other missing definitions:

$ swagger-to-graphql --swagger=v1/metadata.yaml
{
  "name": "SyntaxError",
  "message": "Error resolving $ref pointer \"/Users/besi/de/wmf/restbase/v1/metadata.yaml#/definitions/problem\". 
Token \"problem\" does not exist.",

To work around that one locally we need to merge the yaml file with the common_schemas.yaml file. In this example I used merge-yaml-cli. While it seems to add extra newlines, it's good enough for what we need here:

$ merge-yaml -i v1/metadata.yaml v1/common_schemas.yaml -o tmp/metadata.yaml

To help go through all the YAML files in RB quickly I wrote a small shell function and added a tmp folder:

mkdir tmp
function swag {    
  merge-yaml -i v1/$1 v1/common_schemas.yaml -o tmp/$1
  swagger-to-graphql --swagger=tmp/$1
}

I was able to fix most of the spec YAML files in RESTBase (https://github.com/wikimedia/restbase/pull/1030). After this, there are two files left, for which the tool didn't provide a good enough error message for me to discern easily what's going wrong there:

  • definitions.yaml
  • transform.yaml

There are a few additional yaml files with issues:

  • metadata.yaml: it complains about two of the titles being marked as required. But I think they should be required. Not sure why it complains about these.
  • references.yaml: was introduced by describing nullability for the id property with type: ['string', 'null']. You can work around this by changing this back to type: string, but it doesn't seem correct.

Looks like the JSON output in https://en.wikipedia.org/api/rest_v1/?spec doesn't include any of the definitions.

It does include them all.

Looks like the JSON output in https://en.wikipedia.org/api/rest_v1/?spec doesn't include any of the definitions.

It does include them all.

Oops, I meant to remove that part before I hit submit. Corrected comment above.

@Jhernandez The specs are updated in prod. Unfortunately, I get the same error message as when I run it locally just for the definition.yaml file, which seems misleading to me. It looks like it cannot open the file:

Error: ENOENT: no such file or directory,

Thanks for the improvements @bearND.

I tried again with the same instructions:

mkdir /tmp/test
cd /tmp/test/
npm install graphql swagger-to-graphql
curl -s https://en.wikipedia.org/api/rest_v1/?spec > spec.json
./node_modules/.bin/swagger-to-graphql --swagger=./spec.json

and it is failing now with:

TypeError: Cannot read property 'replace' of undefined
    at replaceOddChars (/private/tmp/test/node_modules/swagger-to-graphql/lib/swagger.js:112:14)
    at getParamDetails (/private/tmp/test/node_modules/swagger-to-graphql/lib/swagger.js:140:14)

Did some debugging with node debugger and found out that it is failing because there is a parameter without a name. If you grep for Section changes to transform in https://en.wikipedia.org/api/rest_v1/?spec you see that the "in": "body" parameter in the /transform/section-changes/to/wikitext/{title}/{revision} doesn't have a name.

I'm not sure if it should have it, if you look at the swagger UI the row for that parameter doesn't have a name and it looks weird, so maybe it should?

Screen Shot 2018-08-06 at 14.48.07.png (574×1 px, 72 KB)

Anyway, if we want to annotate that param we can continue checking if this tool can surface any more errors for us. If we don't, then I think that is all this tool could do for us.

Have a look and chime in with what you think we should do.

I'm not sure if it should have it, if you look at the swagger UI the row for that parameter doesn't have a name and it looks weird, so maybe it should?

According to swagger spec for the body params the name is optional, because, it's the whole request body after all. Feel free to add a name, but I think it's a bug in the tool you'reusing.

👍 We can also patch the JSON file with a name and see if we get anything else interesting out of it.

@Jhernandez I think the easiest way to get useful output from this tool is to clone the Restbase repo and target individual spec files under v1/*.yaml. (combine with the common_schema.yaml, see my comment in T199101#4466668 for command line examples).

FTR all the reading lists post parameters that have "in": "body" have a name, the only one that doesn't is the /transform/section-changes/to/wikitext/{title}/{revision} one.

I patched that name to see what I'd get and then I get some graphQL errors so it seems like that is the end of it.


I'm still interested in seeing how we make the kind of mistakes we found during this task errors in CI so that we can't commit broken type definitions or typos on fields of the schema. I'd appreciate any ideas.

Another issue I've found with /page/summary: https://en.wikipedia.org/api/rest_v1/#!/Page_content/get_page_summary_title

The swagger spec either on the model or example has any mention of a type field. If you look at https://www.mediawiki.org/wiki/Page_Previews/API_Specification#Responses you can see there is a type parameter, and if you actually query the service it is there: 'type": "standard".

Just at a glance, I can see that there are many other fields that are not specified on the spec type definitions, things like tid, revision, content_urls, etc.

We should strive to have a solid spec and API docs, and I think it should be validated and tested better against the actual responses of the services. I'm not sure how that looks like in the end but I think we should explore improving this.

Having APIs with incomplete or inconsistent documentation takes away trust in the docs, so we may as well not have any and get rid of the maintenance effort.

Now that we are using swagger compliant swagger, maybe we could reconsider if it makes sense to have some integration tests with production API queries and validating the responses against the swagger spec (and erroring out when the response doesn't match the spec, or it has extra information)?

Jhernandez lowered the priority of this task from Medium to Low.Sep 4 2019, 6:36 PM