⚓ T76883 Flatten internal URL structure and move away from a layered overloading model

Status	Assigned	Task
Resolved	• mobrovac	T76883 Flatten internal URL structure and move away from a layered overloading model
Resolved	• mobrovac	T85573 Tree/DAG-based path router improvements
Resolved	• Pchelolo	T86737 Generalize request templating for service requests
Resolved	• GWicke	T86875 promisify tree building / module loading
Resolved	• GWicke	T86977 Implement resource requests

Event Timeline

• Jdouglas created this task.Dec 5 2014, 7:32 PM

• Jdouglas raised the priority of this task from to Needs Triage.

• Jdouglas triaged this task as Medium priority.

• Jdouglas updated the task description. (Show Details)

• Jdouglas added a project: RESTBase.

• Jdouglas changed Security from none to None.

• Jdouglas edited subscribers, added: • GWicke; removed: Aklapper.

• Jdouglas subscribed.

• Jdouglas updated the task description. (Show Details)Dec 5 2014, 11:16 PM

• GWicke renamed this task from Make intentional low-level paths more explicit to Flatten internal URL structure and move away from a layered overloading model.Dec 6 2014, 12:24 AM

• GWicke updated the task description. (Show Details)

• GWicke added a project: RESTBase-architecture.

• GWicke updated the task description. (Show Details)

• GWicke updated the task description. (Show Details)Dec 6 2014, 12:27 AM

Hardikj subscribed.Dec 6 2014, 12:33 AM

• GWicke updated the task description. (Show Details)Dec 6 2014, 4:19 AM

• GWicke mentioned this in T78219: Phabricator search does not reliably match assigned projects.Dec 11 2014, 1:03 AM

• GWicke updated the task description. (Show Details)Dec 11 2014, 2:42 AM

• GWicke updated the task description. (Show Details)Dec 11 2014, 7:59 AM

• GWicke updated the task description. (Show Details)

I started to throw some ideas into a branch:

[a module code structure defining the /_sys/ namespace](https://github.com/gwicke/restbase/blob/modules/doc/Implementation.md#code-structure)
an example spec handler

Generally, I feel that the logic we have in the yaml test handler is already too much. The logical next step would be to add more functionality there to handle more complex scenarios, which could lead to some half-assed yaml-based programming language. I think it'll be better to leave code to code & focus the spec handlers on declarative support for the typical tasks ('80%'). With a module structure we should be able to add code in the /_sys/ namespace, which can then be used by the spec handlers where necessary for more complex things.

This is all not very worked out yet, but I think the direction is promising. The biggest questions I have right now are:

How can we make sure that some handling always happens during the access to some data? Example: revision deletion/suppression vs. direct access to tables and KV buckets; should we enforce 'can only accessed through handler X'?
Will the mental model of the request flow be manageable? What will the major pieces be?

I think it'll be useful to work through the typical use cases (especially page content and perhaps those for buckets) to see where the issues are.

• GWicke mentioned this in T1228: RESTbase deployment.Dec 19 2014, 7:28 PM

• mobrovac subscribed.Dec 24 2014, 6:29 PM

Some progress:

Moved versioning below the domain (/{domain}/v1/). Rationale: APIs are defined per domain, and there might be multiple APIs per domain.
new tree-based router: https://github.com/gwicke/swagger-router
restbase WIP: https://github.com/gwicke/restbase/compare/modules

There is a new global config structure in config.example.yaml. I separated interfaces (swagger specs) from modules (implementations). Modules can be distributed as npm modules (or we can just have a folder full of them for now). The operationId property in the swagger spec can be used to match swagger routes to module exports. There are per-module options and resources. Options are passed to the module constructor. Resources are requests that are done after all modules are registered in the global namespace (get / put). A body can be specified, in which case it is compared with the returned body & sent if that doesn't match.

Mounting of interfaces vs. knowledge of global (absolute) paths

The tree-based router supports mounting of specs somewhere further down the tree, for example at /en.wikipedia.org/v1/page. This lets us use the same spec for different domains, or under different per-domain API versions (v2 for example).

Since the specs themselves don't really know how they will be mounted, it becomes difficult to use absolute paths to access other APIs from internal specs and modules. Even the absolute path to a /sys/ hierarchy is not necessarily known unless we make some assumptions about the structure of the tree.

Since we'd like to avoid direct external access to the sys namespace, an option would be to use a separate 'protocol' namespace, as in sys://parsoid/... The question then becomes how parameters like the domain are injected or handled. We kind of assume that we have a domain, so it might not be worth hiding it from the path.

The other option is to just define the canonical system namespace to live at /{domain}/sys/, and to always require the domain prefix. Versioning of the sys namespace can happen at the module level, so either /{domain}/sys/parsoid/v1/, or /{domain}/sys/parsoid2/.

Where to handle rev content read / write

One thing I was grappling with while fleshing out the content interface is how to organize the interaction between front-end handlers and system modules, especially for more complex cases like Parsoid.

In rev_content

needs to know about all backend buckets
can pull uniform content from services, return part
harder to understand for service author
harder to add storage in second step

In service: parsoid, simpleservice

can (optionally) configure backend storage for each service in simpleservice module
service module knows how to read / save its own content
easier mental model for service authors
can do more complex read processing if needed
page_revision service roles:
- get latest revision for title (from api or local table)
- get timeuuid range & metadata for revision
- list pages & revisions (can expose this directly from front-end)
bock service roles:
- blocks for url / revision

• GWicke updated the task description. (Show Details)Dec 29 2014, 10:06 PM

• GWicke updated the task description. (Show Details)

James, Marko & I discussed this earlier IRL.

Decisions:

Standardize on /{domain}/ prefix and /{domain}/sys/for internal resources.
Perform access to content through service wrappers. In particular, the parsoid module at /{domain}/sys/parsoid/ will own storage for its content types, and transparently call out to the actual parsoid service. It will in turn use page/revision metadata from /{domain}/sys/page_revision/.

Action items:

James continues work on T75955
Marko tackles swagger-router improvements:
- sharing of sub-trees (convert to a DAG)
- URI object / templating
Gabriel continues work on T76986 and T78194, and starts work on loading of interfaces / modules using the new design

Basic spec loading with sharing has now landed on [the modules branch](https://github.com/wikimedia/restbase/tree/modules) in ae15c7e8a

Next steps:

Flesh out sys interfaces - in progress
Add actual module loading and symbol matching: partly done, async work in T86875
Convert some minimal set of modules to the new-style API - in progress
Add real sub-request ('service') handling: T86737
Hook the entire thing up in restbase.js - in progress
Protect /sys - done
Add routing tests - only some right now; will get lots of implicit testing with RB integration tests
Get tests to pass

In today's team meeting we determined that we'll make a decision on whether we are going ahead with this refactor before the deploy by Friday. A major reason for this refactor is to clean up the code structure in preparation for third-party extensions and new entry points. It also properly addresses several release-related issues in a way that we feel is solid in the longer term.

Plan B is to do something simpler for the release, and continue with the refactor afterwards.

• GWicke assigned this task to • mobrovac.Jan 14 2015, 2:28 AM

• GWicke raised the priority of this task from Medium to High.

• GWicke closed subtask T86875: promisify tree building / module loading as Resolved.Jan 16 2015, 4:50 PM

https://github.com/wikimedia/restbase/pull/118 has now landed, which resolves this task.

RESTBase is now fully spec-driven, auto-creates storage tables & supports spec-derived listings.

• GWicke closed subtask T86977: Implement resource requests as Resolved.Jan 24 2015, 10:09 PM

• GWicke closed subtask T85573: Tree/DAG-based path router improvements as Resolved.

Diffusion mentioned this in rGRESeeaf9d1bf070: T76883: Check for duplicate module symbols only once.Apr 8 2015, 4:34 PM

Diffusion mentioned this in rGRES127215490ddf: T76883: Minor: Add missing semicolon to Router.route().

Diffusion mentioned this in rGRESf5c38f9c88f3: T76883: Minor: Check we are under /{domain}/sys for x-restbase-paths.

Diffusion mentioned this in rGRES559b143a2b54: T76883: Add a dummy new-style module.