
[5.2.8 Epic] Include Audience designations on modules
Open, Needs Triage, Public

Description

Background

Although we have documented stability policies that indicate a variety of use cases, Wikimedia endpoints are public by default. There are no technical limitations or access controls for endpoints intended for internal usage; similarly, there are no well-supported pathways for defining truly private endpoints, such as those that may pass PII or other sensitive information. As a result, many external systems end up using endpoints marked as internal, which causes problems when internal flexibility is desired or content should be controlled.

The scope of this work is to explore what different privacy, stability, and access control models might look like from a module definition perspective. Module definitions specifically control things like how and where a generated API spec is published, as well as expectations for stability and versioning.

Conditions of acceptance

  • Determine which existing extensions may benefit from 'internal' and 'private' module designations --> Halley will facilitate this
  • Define the technical boundaries of 'private' and 'internal' modules
  • Document assumptions for stability, access, and usage of the different audience types --> Halley will help with this
  • Create at least one proof of concept module that demonstrates 'private' and 'internal' constraints
  • Document how to utilize 'private' and 'internal' designations for API Authors

Implementation details

This work applies exclusively to module creation. We will not yet be surfacing audiences through the routes themselves, nor enforcing access control related to audience designation at this time.

Related documents:

Event Timeline

Various conversations have occurred related to this task, and thinking has changed somewhat. This comment is intended to capture the current ideas. The task description has not yet been adjusted, but should be once any further refining discussions are complete (and subtasks should be created).

Audience designations will appear as version suffixes. See T395713: REST: Beta Modules - support beta suffix in module ids and versions for related discussion on the general idea of suffixes. Initial possibilities will be:

  • no designation supplied (the most common case)
    • endpoints are publicly available
    • changes and removal are handled through predictable versioning and deprecation processes
    • generated OpenAPI spec is publicly available
    • the module is listed in the REST Sandbox
    • a current example on production is specs.v0 aka the Specs API
  • -public
    • behaves exactly the same as if no designation were given
    • this might be unnecessary and never used - any such modules are likely to just omit the suffix - but it is trivial to implement and might be a useful disambiguation if a particular piece of functionality has REST modules that use multiple designations
  • -beta
    • endpoints are publicly available
    • no stability guarantees apply. Changes or removal may occur at any time
    • generated OpenAPI spec is publicly available
    • the module is visually distinguished in the REST Sandbox. Opt-in is required to view.
  • -internal
    • endpoints are publicly available
    • stability is based on internal needs. External callers should make no assumptions about stability.
    • generated OpenAPI spec is publicly available
    • the module is visually distinguished in the REST Sandbox. Opt-in is required to view.

(In all cases, for WMF wikis, public availability is still subject to our normal API terms, user agent policy, rate limiting, and so forth. Some endpoints may require authentication, certain permissions, etc.)

Additional designations have been discussed (ex. -restricted, -legacy) but will not be initially supported (and may never be).
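
To make the suffix idea concrete, here is a minimal Python sketch of splitting a module id into its base id and audience designation. The `content.v1-beta` form, the name `split_audience`, and the set of recognized suffixes are assumptions for illustration only; the actual syntax is being discussed in T395713, and the real implementation would live in the PHP framework.

```python
# Hypothetical: designations initially under consideration in this comment.
KNOWN_DESIGNATIONS = {"public", "beta", "internal"}


def split_audience(module_id: str) -> tuple[str, str]:
    """Split an id such as 'content.v1-beta' into (base id, designation)."""
    base, sep, suffix = module_id.rpartition("-")
    if sep and suffix in KNOWN_DESIGNATIONS:
        return base, suffix
    # No recognized suffix: the module behaves as if it were -public.
    # In this sketch an unrecognized suffix is left as part of the id.
    return module_id, "public"


print(split_audience("specs.v0"))
print(split_audience("content.v1-beta"))
```

Treating a missing suffix as `public` mirrors the statement above that -public behaves exactly the same as supplying no designation.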

The MW REST framework will support the following new functionality related to audience designations. From most to least restrictive:

  • OpenAPI spec is fully suppressed from public access (not listed in the /discovery endpoint, not available at any public url).
    • the spec should still be available within tests, so that things like response schemas can be used to validate endpoint behavior
    • because our code is open source, callers can and will still become aware of the existence of all modules. Spec suppression is intended to communicate that a module should not be publicly used, not to prevent public knowledge of its existence.
  • hidden OpenAPI spec url (OpenAPI spec is not listed in the /discovery endpoint, but is still technically available)
    • callers can and will use the known url pattern to load the spec. As with spec suppression, hiding the spec url from /discovery communicates intention and should not be considered to provide any sort of security.
  • hidden documentation (the module is not listed in the REST Sandbox)
  • opt-in documentation (the module is only seen in the REST Sandbox if the user proactively takes an action, probably via a checkbox, indicating that they want to see such modules)

Not all of this functionality will initially be used, but it is anticipated that all of it will eventually be needed. As long as implementing it now does not require undue effort, doing it now allows us to consider how best to approach all this while we're already modifying those areas of the code.

Because audience designations are new and we're still learning exactly how they'll work, we should avoid tightly coupling any particular designation to any particular behavior. A layer in the code should translate designations into framework functionality, mapping each designation to the various types of functionality in a way that lets us easily make adjustments. For example, if it is determined that specs for -internal modules should be completely hidden from the REST Sandbox, the associated code change should be minimal. This mapping should be hard-coded and not exposed via configuration.
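
One possible shape for that translation layer, sketched in Python rather than PHP purely for illustration. The class name `AudienceBehavior`, its field names, and `behavior_for` are all hypothetical; the flags correspond to the four kinds of functionality listed above, and the table values follow the initial designations described in this comment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AudienceBehavior:
    spec_suppressed: bool = False  # spec not available at any public url
    spec_url_hidden: bool = False  # spec not listed in /discovery
    docs_hidden: bool = False      # module not listed in the REST Sandbox
    docs_opt_in: bool = False      # module shown in the sandbox only after opt-in


# Hard-coded on purpose: this table is not read from configuration.
AUDIENCE_BEHAVIORS = {
    "public": AudienceBehavior(),
    "beta": AudienceBehavior(docs_opt_in=True),
    "internal": AudienceBehavior(docs_opt_in=True),
}


def behavior_for(designation: Optional[str]) -> AudienceBehavior:
    """Look up framework behavior; no designation behaves like -public."""
    return AUDIENCE_BEHAVIORS[designation or "public"]


print(behavior_for("internal"))
```

With a table like this, hiding -internal modules from the REST Sandbox would mean changing a single entry to `AudienceBehavior(docs_hidden=True)`, and no calling code would need to know which designation triggered the behavior.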

Concurrent with introducing audience designations, we will also change the way REST API config works with respect to modules. Under the new approach, modules will be enabled and (if their audience designation allows it) listed in the REST Sandbox with no config changes required. A new config variable will allow overriding this if desired for particular modules. It will be possible to reference a module that does not yet exist in this new config variable. Modules that need to be "born disabled" or "born hidden" can first deploy the config change before deploying the module. A smooth transition path should ensure that introducing the new system does not change current on-wiki behavior. The existing config variables should be deprecated (and eventually removed, per MW deprecation policy).

Details of the new config approach will be posted in a separate comment.

It is convenient for discussion to put names to behaviors. The names below borrow from T409517: REST: API modules can be suppressed/opt-out of spec generation, where we had talked about implementing similar functionality in a different way, and add a term for something that wasn't discussed at that time. The functionality supported in core, from most to least restrictive, will be:

  • disabled (endpoints are not available to be called, spec is not viewable or discoverable, no sandbox entry)
  • hidden (endpoints are callable but spec is not accessible or discoverable, no sandbox entry)
  • discoverable (endpoints are callable, spec is listed in /discovery, no sandbox entry)
  • toggleable (endpoints are callable, spec is listed in /discovery, opt-in to view on sandbox)
  • published (endpoints are callable, spec is listed in /discovery, sandbox entry)

I just made up "toggleable", and I'm not 100% comfortable with it. I also considered "disclosable" (but that felt too easily confused with "discoverable" and also a bit legalish) and "optional" (but that felt vague). Suggestions welcome.

It may also be that the new thing I'm currently calling "toggleable" doesn't need to be in that same taxonomy. Everything else in the list builds on the previous thing. For example, a module can't be discoverable if it is hidden, because there's no spec to discover. But a module doesn't have to be toggleable to be published. It feels like having a word for this would make it easier to talk about, though.
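
The named behaviors can be written out as a small table, which also makes the oddity of "toggleable" visible: it differs from "published" only in the sandbox column. This is a Python sketch for discussion only; `Exposure`, its field names, and `shown_in_sandbox` are hypothetical, and the assumption that a spec listed in /discovery is also accessible is mine.

```python
from typing import NamedTuple


class Exposure(NamedTuple):
    endpoints_callable: bool
    spec_accessible: bool   # assumed true whenever the spec is listed
    in_discovery: bool
    sandbox: str            # "none", "opt-in", or "listed"


# From most to least restrictive, as in the list above.
LEVELS = {
    "disabled":     Exposure(False, False, False, "none"),
    "hidden":       Exposure(True,  False, False, "none"),
    "discoverable": Exposure(True,  True,  True,  "none"),
    "toggleable":   Exposure(True,  True,  True,  "opt-in"),
    "published":    Exposure(True,  True,  True,  "listed"),
}


def shown_in_sandbox(level: str, opted_in: bool = False) -> bool:
    """Would a sandbox user see a module at this level?"""
    sandbox = LEVELS[level].sandbox
    return sandbox == "listed" or (sandbox == "opt-in" and opted_in)


for name in LEVELS:
    print(name, shown_in_sandbox(name), shown_in_sandbox(name, opted_in=True))
```

Modeling the sandbox state as its own field, rather than as another rung on the ladder, is one way to keep the first three columns strictly cumulative while leaving opt-in as an independent choice.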

Regarding config, and how it will change to support audience designations:

MW REST currently uses two config variables to control module enabling and REST Sandbox visibility:

  • RestSandboxSpecs
    • provides the list of OpenAPI specs that are visible in the REST Sandbox
    • keyed by something similar but not identical to the module file name / module id
    • the REST Sandbox is used to display OpenAPI specs other than from MW REST modules (ex. RESTbase), so we must account for this usage
    • we expect to need support for viewing arbitrary OpenAPI specs on a permanent basis. This is not going away.
  • RestAPIAdditionalRouteFiles
    • lists the modules that are enabled, by module file name
    • also applies to "flat" routes (ex. coreRoutes.json) so we have to account for this usage
    • we may eventually be able to deprecate and remove support for flat route files

The recently-introduced ModuleManager class provides access to cached information from module definition files. The cached information includes the moduleId, which contains the audience designation (we'll likely want to add a convenient helper function somewhere to efficiently get a module's audience designation).

REST modules in core can currently be enabled in two ways:

  • adding their module file (or for "flat" routes, their route definition file) to the RestAPIAdditionalRouteFiles config variable
  • adding the module definition file to the hard-coded constant in the ModuleManager class.

REST modules in extensions are currently enabled automatically once they're referenced in extension.json (with no way to disable them). This happens in ExtensionProcessor::extractRestModuleFiles(), which unconditionally adds any module files from extension.json into the RestAPIAdditionalRouteFiles config variable.

This was a convenient mechanism when all REST routes were registered via config. Because we're removing that, it makes more sense to accumulate REST modules (by module file name) in an ExtensionRegistry attribute and process them in ModuleManager. See ForeignResourcesDir as an example of how this would work, including usage of the attribute in SpecialVersion.php.

The list of OpenAPI specs is provided to the REST Sandbox by ModuleManager::getApiSpecs(), which combines specs from the RestSandboxSpecs config variable and the hard-coded list in ModuleManager. Once the list of all route files is available in ModuleManager per the above changes, this function (or a helper) can include OpenAPI specs for modules from extensions in the list.

To deal with situations where modules should not be enabled/published (perhaps on a per-wiki basis), we'll introduce a new config variable: RestModuleSettings. (Note: I'm not crazy about that name and welcome suggestions.) This variable should be keyed by moduleId. Exact format is yet to be determined. This should likely allow overrides on a more granular level than the current config variables. We probably want control over the new functionality supported in the framework (enabled,

For transitioning to the new system, the existing variables should override the new one. Some examples of how this would work (enabled = endpoints are available to be called, published = visible in the REST Sandbox):

  • Flat routes from coreRoutes.json:
    • enabled everywhere via inclusion in the hard-coded constant in ModuleManager
    • published in sandbox both via inclusion in the hard-coded constant in ModuleManager, and a redundant entry in RestSandboxSpecs. This constant/entry covers all flat routes, including from coreDevelopmentRoutes.json and from extensions that expose REST endpoints not in modules.
    • available and published on all wikis
    • we could maintain current behavior by doing nothing
    • we should probably remove the redundant RestSandboxSpecs entry, just to avoid confusion
  • Flat routes from coreDevelopmentRoutes.json
    • enabled only on labs and testwiki (via RestAPIAdditionalRouteFiles entries in InitialiseSettings.php and InitialiseSettings-labs.php respectively)
    • also enabled in DevelopmentSettings.php
    • this includes only a redirect of / to /specs/v0/discovery
    • we should probably remove this completely from core and config. The module system and -beta designation have replaced it.
    • endpoints herein appear in the sandbox everywhere they're enabled, by virtue of being "flat" routes.
  • content.v1
    • enabled only on labs and testwiki, via RestAPIAdditionalRouteFiles entries
    • also enabled in DevelopmentSettings.php
    • published to the sandbox only on labs
    • making modules published by default would expose this in places we do not want it to be
    • a RestModuleSettings entry should be used to suppress enabling of this in places it should not be enabled
  • specs.v0
    • enabled on all wikis via RestAPIAdditionalRouteFiles entries
    • also enabled in DevelopmentSettings.php
    • published in the sandbox on all wikis via RestSandboxSpecs entries
    • we could maintain current behavior by doing nothing
    • we should probably remove the config entries, to avoid confusion
  • site.v1
    • enabled on all wikis via RestAPIAdditionalRouteFiles entries
    • also enabled in DevelopmentSettings.php
    • not published in the sandbox anywhere
    • a RestModuleSettings entry should be used to suppress publishing of this in places where it should not be published
  • readinglists.v0
    • enabled on all wikis where ReadingLists is enabled (just SUL wikis, but that's a bunch of 'em), by virtue of being a REST module in an extension
    • not published in the sandbox anywhere. I don't remember why not. Maybe we just never did it? The spec is discoverable.
    • a RestModuleSettings entry could be used to suppress enabling of this in places where it should not be enabled, while still keeping it unpublished even where enabled
    • we could consider allowing it to be published rather than just discoverable

None of this requires any changes to existing config. The new config variable should not give any errors or warnings if it refers to a module that is not found in the code. This will allow us, as well as future developers with special needs, to first deploy the RestModuleSettings config change, then deploy the module change.
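
A rough Python sketch of how the resolution could work, under several assumptions that are not settled above: the `enabled`/`published` keys, the function name `resolve_modules`, and the representation of the existing config variables as an already-translated `legacy_overrides` dict are all hypothetical, since the exact RestModuleSettings format is yet to be determined. The audience-designation check on publishing is omitted for brevity.

```python
def resolve_modules(known_modules, module_settings, legacy_overrides=None):
    """Return {moduleId: {"enabled": bool, "published": bool}} for known modules."""
    legacy_overrides = legacy_overrides or {}
    resolved = {}
    for module_id in known_modules:
        # New default: enabled and published with no config required.
        state = {"enabled": True, "published": True}
        # Overrides from the new variable, keyed by moduleId.
        state.update(module_settings.get(module_id, {}))
        # During the transition, the existing variables win over the new one.
        state.update(legacy_overrides.get(module_id, {}))
        if not state["enabled"]:
            state["published"] = False  # a disabled module has nothing to publish
        resolved[module_id] = state
    # Settings for ids not in known_modules are never looked up, so a
    # config entry for a not-yet-deployed module causes no error or warning.
    return resolved


settings = {"site.v1": {"published": False}, "future.v1": {"enabled": False}}
print(resolve_modules(["specs.v0", "site.v1"], settings))
```

Because lookups are driven by the modules found in code rather than by the config keys, the "born disabled" workflow falls out naturally: the `future.v1` entry sits unused until that module is deployed.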