
REST: API modules can be suppressed/opt-out of spec generation
Open, High · Public · 8 Estimated Story Points

Description

We are implementing allow-list based access for the sitemap endpoints. Instead of being publicly available, only trusted bots will be able to access them. Because of the change in access model, it does not make sense to surface the endpoint documentation for other users within the sandbox. While this is the first instance of such a requirement, there are future use cases related to API audiences where spec suppression may make sense, such as in the case of private or other limited access modules.

Conditions of acceptance

NOTE: This is blocked by T409516: Create Site API Module
  • API modules can "opt-out" of spec generation.
    • If an API module is opted out, an OpenAPI spec is not generated.
    • If a module is opted out, it is not listed in the Discovery endpoint, as it has no spec to be discovered.
  • This can be configured through a flag in the module definition file.
    • Modules are opted in to spec generation by default, including when the flag is not specified.
  • Test this feature using the Site API Module (T409516: Create Site API Module)
    • Update the Site API Module to be opted out of spec generation.
    • Verify that the Sitemap endpoints are no longer visible in the spec discovery response.

Event Timeline

Note from review: We should consider testing. When running tests, there should be an option to bypass the suppression flag, so that a spec is still generated and can be used for testing purposes.
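
A minimal sketch of the opt-out behavior and the test bypass described above, written in Python for brevity (the real implementation would live in MediaWiki core, in PHP). The flag name `specOptOut` and the `bypass_suppression` parameter are hypothetical; the task does not name either.

```python
def should_generate_spec(module_def: dict, bypass_suppression: bool = False) -> bool:
    """Return True if an OpenAPI spec should be generated for the module.

    `specOptOut` is a hypothetical flag name. A module that omits it is
    opted in by default, as the acceptance criteria require.
    """
    if bypass_suppression:
        # Test runs can force generation even for opted-out modules.
        return True
    return not module_def.get("specOptOut", False)


def discoverable_modules(module_defs: dict, bypass_suppression: bool = False) -> list:
    """List the module ids that would appear in the discovery response."""
    return [
        module_id
        for module_id, module_def in module_defs.items()
        if should_generate_spec(module_def, bypass_suppression)
    ]
```

A module with no spec is simply absent from the discovery list, so one predicate drives both acceptance criteria.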

BPirkle renamed this task from API modules can be suppressed/opt-out of spec generation to REST: API modules can be suppressed/opt-out of spec generation.Nov 12 2025, 5:55 PM
BPirkle added a project: MediaWiki-REST-API.

In T392149: [SPIKE] Research pattern for implementing 'beta' modules, specifically in this comment, we decided against adding flags to the module definition file to control visibility, on the grounds that it was a job for configuration and not code. I did not record our entire thought process in that comment, just the conclusion.

If we've rethought that and come to a different conclusion, that's fine. I'm bringing it up just to be sure, though.

Adding @daniel as a subscriber, because he was part of the previous discussion. He's working on other things now, though, so may or may not have availability to participate in this conversation.

When you say config, do you mean something set in the overall MediaWiki config? My one hesitation with that is it distributes the logic for API module owners. As someone registering an API, I would want the settings to be as self-contained and discoverable as possible. I do see the argument for keeping config for displaying full module types (like beta/experimental) in the global config, but it seems a little strange to separate it out for a module preference that may be set by the module creator.

Y'all are the experts here, though. I'll go back to the peanut gallery :)

When you say config, do you mean something set in the overall MediaWiki config? My one hesitation with that is it distributes the logic for API module owners. As someone registering an API, I would want the settings to be as self-contained and discoverable as possible. I do see the argument for keeping config for displaying full module types (like beta/experimental) in the global config, but it seems a little strange to separate it out for a module preference that may be set by the module creator.

Correct. Here's what I said about that same thing in T395719: [BLOCKED] REST: Beta Modules - automatic discovery with config override:

Activating core modules via config is inconvenient and requires either deploy privileges (and the ability/confidence to use it), or scheduling for a deploy window. We've allowed activating extension endpoints/modules for some time without a config change, so there seems to be no process/policy/procedure reason to require it - it is just a consequence of our initial implementation. Also, the current system requires core developers who wish to both activate their module and publish it to the REST Sandbox to modify two different config variables of different format. And extension developers have to do a config deploy to publish their modules in the REST Sandbox, which seems inconsistent and unnecessary - why should providing documentation be gated through a config deploy, when making the endpoint available in the first place is not?

I think we agree. However, we were trending a different direction near the end of the beta module hypothesis, so I wanted to check.

My personal preference, without thinking about implementation details, is that module developers have the ability to declare their modules as enabled/disabled or published/unpublished in the module definition file.

The one thing that doesn't get us is the ability to enable/disable or publish/unpublish modules on a per-wiki basis. That will have to be in config, because MW Core, by design, has no knowledge of individual wikis. So for a comprehensive system, that bit has to be in config.

This takes us right back to where we left off on the previous hypothesis. Basically, we need the thing we started to make, but never finished. At least, a simpler version of it.

I'm really tempted to pick that back up and see whether, now that some time has passed and I can see it with fresh eyes, I can knock out the more comprehensive solution pretty quickly. I could timebox that to, say, two work days. If I'm not nearing the end of the path by then, I can take whatever I've learned and do just the minimal bits needed to add the suppression capability. The danger of just doing part of it is that I inadvertently code us into a corner that makes the larger solution more difficult later. By spending the timeboxed period beforehand, even if I'm not successful, maybe I can at least avoid that.

Assuming we go that route, I've gone back through and reminded myself why all this is annoying to implement:

  • we currently have two related config variables:
    • RestSandboxSpecs: this is keyed by something that is similar but not identical to the module file name / module id. For example, we have specs.v0, which is a lot like the specs.v0.json module definition filename, and the specs/v0 module id. However, we also have external specs (ex. wmf-restbase) that follow a different pattern. So we're left with something that feels like it should be useful, but really isn't.
    • RestAPIAdditionalRouteFiles: this lists the modules that are enabled, by module file name. Which is great, except that we have to actually crack open and parse the file to see if it has a flag indicating it is unpublished. We have no other reason to parse all that json in some of the places we need that info, so we're adding a (probably negligible but still irritating) performance cost. The responsible thing to do is parse, but cache. Which adds more code and complexity.
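
To illustrate the "parse, but cache" idea from the list above, here is a rough sketch in Python (MediaWiki itself is PHP and would likely use its own caching facilities). The cache keyed on file modification time, the function names, and the `unpublished` flag name are all assumptions made for illustration.

```python
import json
import os

# path -> (mtime at parse time, parsed definition)
_cache: dict = {}


def load_module_definition(path: str) -> dict:
    """Parse a module definition file, reusing the cached result while the
    file's modification time is unchanged."""
    mtime = os.path.getmtime(path)
    cached = _cache.get(path)
    if cached is not None and cached[0] == mtime:
        return cached[1]
    with open(path, encoding="utf-8") as handle:
        definition = json.load(handle)
    _cache[path] = (mtime, definition)
    return definition


def is_unpublished(path: str) -> bool:
    """Hypothetical check: `unpublished` is an assumed flag name."""
    return bool(load_module_definition(path).get("unpublished", False))
```

The extra code is small, but it is still a second code path to test and invalidate, which is the complexity cost mentioned above.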

We could instead do module discovery in a different way (we explored a few options in the previous hypothesis). But then we're writing a much bigger change than is really necessary for module suppression.

I'll sort all that out and come up with something or other, even if it is just the minimal solution. I'm just listing some of the challenges here for ... I don't know ... sympathy, I guess? ;-)

Notes from Estimation:

  • Should we consider coming from the other direction? Everything is opted in for spec generation, but have specific flags for where it should appear (Public/Sandbox, Discovery).
  • There will be cases where we want specs generated and discoverable but not visible in the sandbox, others where we want them to appear in neither the sandbox nor discovery endpoint.
    • Discoverable but not present in the Sandbox is already supported -- APIs have to be registered for the Sandbox separately.
  • There may also be needs where we want different behavior on different wikis; therefore, some additional config as well.
  • Possible approach for using module types instead of flags:
    • Public --> Discoverable + Sandbox
    • Internal --> Discoverable, not in sandbox
    • Private --> Neither discoverable nor in sandbox
    • Beta
  • Module up vs generator down -- Spec generation currently happens in the handler. We are exploring possibilities for refactoring spec generation as a separate process. Module definition process is probably the right place for this, to keep settings self-contained?
  • How we present the options to developers and how the code works are separate concerns.
HCoplin-WMF triaged this task as High priority.
HCoplin-WMF set the point value for this task to 8.

The "module type" idea mentioned above has similarities with the Audience Designation concept we'd talked about some time ago. See T366567: REST: introduce audience designations (proposal) and T365752: REST: Introduce support for private modules.

That approach included the audience designation in the path, so that it was visible to layers other than MediaWiki. The designations there were a little different from what we're (so far) talking about in this task. That doesn't mean either is wrong, or that we need to follow the previous conversation. (There was significant lack of alignment on that proposal, which is a big part of why it wasn't implemented.) But we should consider what it said, in case it can guide/inform our current thinking.

From the google doc associated with the audience designation idea (summaries mine, stealing some phrases from the doc):

  • public: default designation. No extra restrictions on the module (endpoints can, of course, do things like require specific authz)
  • beta: subject to change/removal without notice (we dealt with this differently in T395713: REST: Beta Modules - support beta suffix in module ids and versions)
  • internal: module is implemented by the same entity that implements callers (ex. endpoint for WMF mobile apps). Not intended to be called by third parties, such calls may violate terms of service. Not necessarily restricted from being called by third parties at a technical level. Version numbers are not required for these modules.
  • private: can only be called by trusted code inside the wiki operator's network.
  • app: similar to "internal", but version numbers are required. Intended for use by WMF mobile apps, which may have specific caching needs, may want their requests routed to a specific cluster, and which may have a long tail of installed versions extending over years
  • enterprise: separates Enterprise requests for routing and authentication purposes

The module types above use some of these same names, but to mean different things. Because the audience designations were never implemented, using the proposed module type names would not cause technical problems of any kind. It might, however, be confusing to people already familiar with the previous proposal and therefore might need clarification in discussions.

Brainstorming on naming a little.

I like the "module type" idea, mostly because I dislike a proliferation of flags. But I don't like the word "type", which feels too generic.

If we don't use "Type", then we'll need a name for this in the module definition file. The purpose is to control what information potential callers can see about the module. With that in mind, here are some alternative names. I kind of hate some of them, but I'm including them anyway in case they inspire better ideas from anyone:

  • Publish/Publishing/Published
  • Listing
  • Catalog
  • Discovery
  • Documentation
  • Visibility

As for the value of that field, the public/internal/private suggestions in the comment above aren't bad. We could also do something like Sandbox / Spec / Hidden.

For the time being, we should decouple the path considerations from the modules as much as possible. Eventually we will take the audience approach (perhaps not with the specific proposed implementation, but definitely in spirit) with pathing and access, but for now, let's focus on what we're doing with the modules and specs specifically.

Based on the proposals above (and not really having anything better come to mind), "Module Visibility" seems like the best option. Using that as a working name, these are the specific permutations for visibility that we would expect, along with some proposed names for the different levels of visibility:

  • Published: Discoverable + in Sandbox
  • Discoverable: Discoverable, not in sandbox
  • Hidden: Neither discoverable nor in sandbox
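
As an illustration only, the three proposed levels can be modeled as a single enumerated value instead of two independent flags. The sketch below is in Python with hypothetical class and property names; it is not how MediaWiki implements this.

```python
from enum import Enum


class Visibility(Enum):
    # Each value is (listed in discovery, shown in the REST Sandbox).
    PUBLISHED = (True, True)
    DISCOVERABLE = (True, False)
    HIDDEN = (False, False)

    @property
    def discoverable(self) -> bool:
        return self.value[0]

    @property
    def in_sandbox(self) -> bool:
        return self.value[1]
```

One consequence of using a single value is that the fourth flag combination (in the Sandbox but not discoverable) cannot be expressed, which matches the three permutations listed above.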

I agree with your point that it might make sense to avoid using the same names as what we would consider audiences, just to avoid confusion and overloaded terms. I would also venture to say that each of the audiences would likely inherit a default "visibility" setting too, though.

For these proposals, published and discoverable specs would be subjected to linting to ensure they meet our functional and stylistic standards, as they are intended to be accessed by multiple types of consumers. I also think it's fair to say that "public" APIs would inherit the published visibility by default, just to acknowledge how it might fit with audiences more broadly.

When it comes to hidden, is it the same as being functionally suppressed from generation? I increasingly think not. At minimum, there is likely still a need to generate it on the fly for testing purposes, so some kind of flag needs to be allowed there. Thinking about sitemap specifically, there might also be value in having the option to request a spec if you already know it exists (perhaps leveraging the same allowlist as the endpoint itself), even if it's not published in the discovery doc. That assertion aligns with my current thinking for "internal" or "private" audiences as well, where it's likely the owning team would still want/need a spec somewhere. With that in mind, I think the biggest open questions are how we want to allow spec generation, and what that means for spec quality. Below are some options for the level of spec quality we would want to see for hidden modules. I would like your thoughts on them, and I have tagged my recommended option.

Option 1: Require the same standards for hidden specs as published. [RECOMMENDED]

Pros:

  • It is production ready, if/when people want to turn it on. This would be specifically useful in the case of new APIs and beta modules, where they are likely built in a hidden state, then published when ready.
  • It would be "the right thing to do" from a longevity perspective. In other words, even internal APIs would benefit from having a robust API description when someone is looking at it years later and might not know why it was created in the first place.

Cons:

  • There are some APIs that will never be exposed externally, and it's a lot of work to make a fully compliant spec.

Option 2: Create a reduced linting ruleset that allows for more flexibility.

Pros:

  • Might offer a "good enough" approach for linting, so that the spec is functionally compliant and there is sufficient context for future developers without it being too burdensome.
  • Ensures it is functionally compliant when generating for testing purposes.

Cons:

Option 3: Do not enforce standards in general

Pros:

  • Rapid development, especially in cases where an API is truly private, experimental, or short lived

Cons:

  • Becomes burdensome if/when it's time to change the visibility settings for the module
  • Could result in non-functional/poorly structured specs that do not work for testing purposes

My main reasoning for recommending Option 1 for now is that I do truly believe in the universal value of API definitions, even for internal capabilities. The more extensions and services we have publishing them, the better. We also have limited adoption of hidden spec visibility (for now), where it might be easier to keep things more restrictive and consistent coming out of the gate, while still preserving the option for customization (Option 2) later. What do you think?

I'm good with all that.

When it comes to hidden, is it the same as being functionally suppressed from generation? I increasingly think not. At minimum, there is likely still a need to generate it on the fly for testing purposes, so some kind of flag needs to be allowed there.

Agreed. It would also be possible to later add a fourth possibility, maybe named suppressed, if we ever need to actually prohibit generation. (To me, hidden sounds like it is there, just hard to find, while suppressed sounds like it is prevented from even happening. Others may have different opinions, and we don't need to decide on a name for the fourth option right now - we may find we never need it.)

Even suppressed specs would be desirable for testing, and would need a workaround so they could be generated in a testing context.

I'll note that the spec url pattern is pretty darn obvious, and the code is open source, so people WILL find hidden specs. Well-behaving bots who start at /discovery won't, though, and hiding it makes our intentions clear. If people want to work around that and we then break things for them without warning, I won't feel very bad.

Below are some options for the level of spec quality we would want to see for hidden modules

I'm also on board with requiring full quality on even hidden specs (or suppressed, if we ever do that). Figuring out which rulesets matter for testing in a "limited quality" sense sounds like a big pain. As you say, if it ever were necessary we could relax things later. Hopefully we won't have to.

This does mean I need to come up with strings and schemas for the site module I'm currently typing up for T409516: Create Site API Module. So for anyone reading this comment later who's annoyed at my support for full quality requirements, I had to do it for my module too. :)

Fun fact, we actually already have a ticket for the sitemap spec clean up too: https://phabricator.wikimedia.org/T402691

Change #1208461 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/core@master] DNM: REST: allow controlling module spec visibility

https://gerrit.wikimedia.org/r/1208461

Change #1208461 merged by jenkins-bot:

[mediawiki/core@master] REST: add ModuleManager for managing information about REST modules

https://gerrit.wikimedia.org/r/1208461

Summary of my understanding after rereading the above, and some synchronous discussion:

  • REST API module definition files, such as site.v1.json, support a "visibility" field.
  • Possible values of this field are "published", "discoverable", and "hidden".
  • The default is "published". If no visibility value is specified, then the system behaves as if "published" was specified.
  • A value of "discoverable" means that the module is available via the discovery endpoint of the specs module, but is not shown in the REST Sandbox.
  • A value of "hidden" means that the module is not shown in the REST Sandbox, and is also not available via the discovery endpoint of the specs module.
  • Spec generation is available even for "hidden" modules via the "module" endpoint of the specs module.
  • The Site module, specified in the site.v1.json module file, has a "visibility" value of "hidden"
  • Any of this can be overridden in config. In other words, even if a module definition file specifies "hidden", specifying it in $wgRestSandboxSpecs will cause it to behave as if it were "published"
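
The rules in this summary can be sketched roughly as follows. This is Python for illustration, not the MediaWiki (PHP) implementation; the function names are hypothetical, `sandbox_specs` stands in for $wgRestSandboxSpecs, and the format of `spec_key` is an assumption.

```python
VALID_VISIBILITIES = ("published", "discoverable", "hidden")


def effective_visibility(module_def: dict, spec_key: str, sandbox_specs: dict) -> str:
    """Resolve a module's visibility from its definition and site config.

    `sandbox_specs` stands in for $wgRestSandboxSpecs; `spec_key` is the key
    the module would be listed under there (the key format is an assumption).
    """
    if spec_key in sandbox_specs:
        # A config entry overrides whatever the module file says.
        return "published"
    # A missing "visibility" field behaves as "published".
    visibility = module_def.get("visibility", "published")
    if visibility not in VALID_VISIBILITIES:
        raise ValueError(f"unknown visibility: {visibility!r}")
    return visibility


def listed_in_discovery(visibility: str) -> bool:
    return visibility in ("published", "discoverable")


def shown_in_sandbox(visibility: str) -> bool:
    return visibility == "published"
```

Spec generation itself is deliberately absent from these checks: per the summary, a spec can still be requested for a "hidden" module, so visibility only decides where the module is advertised.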

Change #1234543 had a related patch set uploaded (by BPirkle; author: BPirkle):

[mediawiki/core@master] REST: allow suppressing spec discovery/publication in module files

https://gerrit.wikimedia.org/r/1234543