Page MenuHomePhabricator

Investigate converting OAuth to use ContentHandler
Open, Needs TriagePublic

Description

OAuth currently uses a custom table and a bunch of special pages (Special:OAuthListConsumers, Special:OAuthManageConsumers, Special:OAuthConsumerRegistration) to input/store/display OAuth application information. In the wikitech-l discussion @Legoktm recommended using ContentHandler instead of a database. This task is to examine the plausibility and benefits/drawbacks of that.


Currently we store the following data about consumers:

-- Consumer ID (1:1 with oarc_consumer_key)
oarc_id integer unsigned NOT NULL PRIMARY KEY auto_increment,
-- OAuth consumer key and secret (or RSA key)
oarc_consumer_key varbinary(32) NOT NULL,
-- Name of the application
oarc_name varchar(128) binary NOT NULL,
-- Key to the user who proposed the application
oarc_user_id integer unsigned NOT NULL,
-- Version of the application
oarc_version varbinary(32) NOT NULL,
-- Callback URL
oarc_callback_url blob NOT NULL,
-- Application description
oarc_description blob NOT NULL,
-- Contact email address
oarc_email varchar(255) binary NOT NULL,
-- Confirmation of contact email address
oarc_email_authenticated varbinary(14) NULL,
-- What wiki this is allowed on (a single wiki or '*' for all)
oarc_wiki varbinary(32) NOT NULL,
-- Grants needed for client consumers
oarc_grants blob NOT NULL,
-- Timestamp of consumer proposal
oarc_registration varbinary(14) NOT NULL,

-- Mutable fields below:
oarc_secret_key varbinary(32) NULL,
oarc_rsa_key blob NULL,
-- JSON blob of allowed IP ranges
oarc_restrictions blob NOT NULL,
-- Stage in registration pipeline:
-- (0=proposed, 1=approved, 2=rejected, 3=expired, 4=disabled)
oarc_stage tinyint unsigned NOT NULL DEFAULT 0,
-- Timestamp of the last stage change
oarc_stage_timestamp varbinary(14) NOT NULL,
-- Whether this consumer is suppressed (hidden)
oarc_deleted tinyint unsigned NOT NULL DEFAULT 0

Some additional data that would be nice to store: rich text (HTML or wikitecxt) description and/or link to such a description; a privacy policy (or a link to it, but in-place is probably better since policy changes should not be possible without the approval of OAuth admins); a link to the author; a link to the application (for certain kinds of applications, which provide their own UI via a separate webpage); links to the source code, developer documentation and bug tracker; and icons and screenshots.

Of the existing fields, oarc_id, oarc_name and oarc_deleted reimplement existing ContentHandler features (title, page or version id, revdel). oarc_secret_key is secret and probably not appropriate for storing in the text table and should be moved to an external source if the private/public service split gets implemented. There should be (although currently there isn't - see T59631) a review workflow, probably implemented via oarc_version and oarc_stage, where one can upload a new version of the app for review while keeping the existing one functional, and the settings are updated as soon as the new version gets accepted. Most of the fields are involved in the authorization process, including some descriptive fields (as the user needs to see a description of what they are about to authorize).


This presents a number of challanges for using ContentHandler:

  • content needs to be available cross-wiki as the authorization dialog could be displayed on any site
  • some content (at least oarc_secret_key) should stay in the database
  • there would have to be some kind of pre-publication review workflow (something similar to how FlaggedRevs works)

On the plus side, ContentHandler would provide change tracking and notification (including diffs, watchlists, checkuser information), and a version history, while in a database-based implementation these would be missing or a lot of effort to reimplement.

On the whole, my first impression is that ContentHandler would not be an improvement here, but I'm not really familiar with it and might have overlooked the right way of using it.

Event Timeline

Tgr raised the priority of this task from to Needs Triage.
Tgr updated the task description. (Show Details)
Tgr added subscribers: Tgr, csteipp, Legoktm.

I like the advantages of using ContentHandler (in particular, the change tracking and transparency). Private content could be a logged-action against the "page", possibly? "Jdforrester updated the application secret key for Fish" and so on?

That would probably work. Alternatively, ContentHandler::unserializeContent() could look up the secret key in the database and add it to the content object, and it could be written back in the database on PageContentSave.

my first impression is that ContentHandler would not be an improvement here

Based on your description, I think the main improvements would be a) not duplicating core features like revdel b) and providing "change tracking and notification".

content needs to be available cross-wiki as the authorization dialog could be displayed on any site

You could create a custom API module like EventLogging's action=jsonschema.

there would have to be some kind of pre-publication review workflow (something similar to how FlaggedRevs works)

This would still be an issue in the database table version right?

Based on your description, I think the main improvements would be a) not duplicating core features like revdel b) and providing "change tracking and notification".

Agreed, I think those are big advantages. I am just not sure that the disadvantages are not bigger.

content needs to be available cross-wiki as the authorization dialog could be displayed on any site

You could create a custom API module like EventLogging's action=jsonschema.

You mean that Special:OAuth would make a server-side call to the API on meta? How would that work exactly, given that this data is private, security-sensitive and needs to be writable cross-wiki (to store grants)? I would rather not have an API endpoint that's exposed to anyone - it could make it possible, for example, to escalate a local XSS vulnerability to an all-cluster, almost invisible, permanent backdoor to that user.

there would have to be some kind of pre-publication review workflow (something similar to how FlaggedRevs works)

This would still be an issue in the database table version right?

Yes, but I imagine it would be a lot easier to do that with custom tables. I am a bit afraid of ending up with something unmaintainable like FlagRev. Maybe only dealing with a content type we own and/or not having to deal with template/image inclusion would make that kind of versioning a lot simpler - I am not familiar with it enough to be able to tell.

I think the gains from using ContentHandler (and by "using", it seems like this task assumes we're replacing the entire oarc table with a json-like ContentHandler page) are outweighed by the requirements that don't fit the ContentHandler model. Mostly the approval workflow, but also the cross wiki use.

You'd also pretty much be rewriting the entire extension. It's small enough that I think that's feasible, but I'd rather see us tackle OAuth2 if we're going to rewrite the whole thing.

I think using a json wiki page for supplemental information (description, link to privacy policy) and storing in the oarc table the approved revision id could work. You get into weird issues like what happens when the approved version get's suppressed, but I think you could make it workable.

I think the gains from using ContentHandler (and by "using", it seems like this task assumes we're replacing the entire oarc table with a json-like ContentHandler page) are outweighed by the requirements that don't fit the ContentHandler model. Mostly the approval workflow, but also the cross wiki use.

At a minimum the secret key would stay in the table - putting that in revision entries seems very problematic.

Or maybe we could leave everything in the table, and use that data to generate some kind of "wikitext" for diffing, so that a JSON representation of each version would be stored in the text table but it wouldn't be actually used for anything other than displaying old versions and diffs. That would still mean reimplementing all the actions but it would retain a workflow that's familiar for most users. I haven't looked at ContentHandler close enough to tell if that's a realistic idea or not.

You'd also pretty much be rewriting the entire extension. It's small enough that I think that's feasible, but I'd rather see us tackle OAuth2 if we're going to rewrite the whole thing.

Well, I might be able to tackle this but probably not OAuth2 so those aren't really alternatives :) Anyway, could you file a task about that?

Probably this can be done step by step in reasonably-sized chunks.
Step 1: preserve a namespace (Application?), create a page for every application name (not every consumer, since we don't want separate pages for each app version), use the talk page to aid communication between the developer and the admins, on the page show the application name and links to the Special:OAuthListConsumers entries of the various app versions or something similarly stupid. Applications cannot be renamed currently so no headache about that (but need to make sure that suppression of the app deletes the page).
Step 2: fix T57705: Hyperlinks seemingly unsupported in app metadata fields in OAuth MediaWiki extension and T60193: OAuth consumer metadata is too limited, incorporate that data into the page.
Setp 3: Find some way to make parts of the page user-editable (that probably requires a new table, or maybe magic subpages)

All this still depends on

Or maybe we could leave everything in the table, and use that data to generate some kind of "wikitext" for diffing, so that a JSON representation of each version would be stored in the text table but it wouldn't be actually used for anything other than displaying old versions and diffs. That would still mean reimplementing all the actions but it would retain a workflow that's familiar for most users. I haven't looked at ContentHandler close enough to tell if that's a realistic idea or not.