Handle read access for private wikis
Open, Medium · Public

Assigned To

None

Authored By

	• GWicke
	Jan 29 2015, 9:57 PM

Description

We'd like to continue supporting editing on private wikis like office wiki. To do this, we need to globally check for the 'read' right in the userinfo query result for all read accesses to the domain.

More concrete steps:

- Figure out a way to configure permissions at the root of a wiki. It might make sense to use the Swagger security object for this, possibly with a custom MediaWiki mediaWikiSecurity scheme, as sketched in these older notes on schema loading and the production config.yaml.
  - It might actually make sense to set this at the level of https://github.com/wikimedia/operations-puppet/blob/production/modules/restbase/templates/config.yaml.erb#L161, so that we can share the sub-specs between public and private wikis.
- On each read request (GET), kick off an access check in RESTBase in parallel with the handler, ideally in a separate security module that hides the details. For POST and PUT, we'll probably want to perform this check sequentially before doing anything else. For now, the check will be a request to the userinfo query endpoint in the PHP API, forwarding the incoming cookie header to MediaWiki so that the request can be authenticated.
- If the access check fails for 'read' rights, return a 403.
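
The flow in these steps could be sketched roughly as follows. This is only an illustration, not RESTBase code: `hasRight`, `forbidden`, `guard`, `checkAccess` and `handler` are hypothetical names, and the response shape is assumed to be that of MediaWiki's `action=query&meta=userinfo&uiprop=rights` JSON output.

```javascript
// Does a userinfo query result include the given right?
// Assumed shape: { query: { userinfo: { rights: [...] } } }
function hasRight(userinfoBody, right) {
  const info = userinfoBody && userinfoBody.query && userinfoBody.query.userinfo;
  return Boolean(info && Array.isArray(info.rights) && info.rights.includes(right));
}

// Build the error that a failed access check would surface as.
function forbidden(right) {
  const err = new Error(`Missing required right: ${right}`);
  err.status = 403;
  return err;
}

// checkAccess: () => Promise of a userinfo body (hypothetical; it would call
//   the PHP API with the incoming cookie header forwarded).
// handler: () => Promise of the response for the actual request.
function guard(method, checkAccess, handler) {
  const check = checkAccess().then((body) => {
    if (!hasRight(body, 'read')) {
      throw forbidden('read');
    }
  });
  if (method === 'GET') {
    // Reads: start the handler right away, but only release its result
    // once the access check has passed.
    return Promise.all([check, handler()]).then(([, response]) => response);
  }
  // Writes (POST, PUT): finish the check before calling the handler at all.
  return check.then(() => handler());
}
```

With this shape, a denied GET still costs a handler call, but its result is never released to the client; a denied POST or PUT never reaches the handler.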

Details

	Subject	Repo	Branch	Lines +/-
	VRS: Use RESTBase on officewiki	operations/mediawiki-config	master	+1 -0
	Enable RESTBase on office wiki	operations/puppet	production	+7 -0

Related Objects

Status	Assigned	Task
Resolved	None	T89568 RESTBase 1.0 release
Open	None	T88016 Handle read access for private wikis
Declined	• Pchelolo	T137140 Support change propagation for private wikis

Event Timeline

• GWicke created this task. Jan 29 2015, 9:57 PM

• GWicke raised the priority of this task from to Needs Triage.

• GWicke updated the task description.

• GWicke added a project: RESTBase.

• GWicke subscribed.

Restricted Application added a subscriber: Aklapper. Jan 29 2015, 9:57 PM

• GWicke added a parent task: T1228: RESTbase deployment. Jan 29 2015, 10:00 PM

• GWicke mentioned this in T1228: RESTbase deployment.

• GWicke added a parent task: T89481: RESTBase beta release (revision storage / content API). Feb 13 2015, 6:57 PM

• GWicke removed a parent task: T1228: RESTbase deployment. Feb 13 2015, 7:00 PM

• mobrovac moved this task from Backlog to Ready / next on the RESTBase board. Feb 13 2015, 7:01 PM

• GWicke mentioned this in T89481: RESTBase beta release (revision storage / content API). Feb 14 2015, 10:33 PM

• mobrovac added a project: RESTBase-release-1.0. Feb 27 2015, 3:21 PM

• mobrovac set Security to None.

• mobrovac added a parent task: T89568: RESTBase 1.0 release.

• mobrovac removed a parent task: T89481: RESTBase beta release (revision storage / content API).

• GWicke triaged this task as Medium priority. Mar 15 2015, 4:57 PM

• GWicke updated the task description. Jun 29 2015, 7:46 PM

• Pchelolo claimed this task. Jun 30 2015, 9:05 AM

A brief description of the implementation plan.

Two options are available:

Define an x-security-modules part of the spec, which would follow the same pattern as x-modules. This would be parsed on startup, and security modules would be loaded either from a security_mods folder or from npm. A security module would be an ordinary JS module with the following interface: mod.filter(securityConfigs, restbase, request). It would do its magic to check access and either throw a 403 or return the authenticated user object in arbitrary form. (Alternatively, the handler could be passed to the module, so that the module can decide whether to call it sequentially or in parallel with the check, and whether to provide a principal object to the handler.) The securityConfigs parameter is a list of module-specific configurations: a security spec entry can appear on any path entry (see example), and these entries are collected along the path and passed to the security module. For example, wikimedia_auth would join all the needed permissions and verify that the currently logged-in user has all of them. An interesting question is how to chain security calls when more than one module is in the path, but that shouldn't be a common use case, so we might just prohibit it. Pros of this solution: the security code is not a REST endpoint, we decrease the recursion depth for path-based calls, and alternative implementations can be added easily.
Define an auth module the same way as any other module within the mods folder. The security spec would look the same, but 'wikimedia_auth' would be the name of the endpoint within the security module's REST interface. In this case we have one more level of recursion, and we have to pass potentially long lists of security params. Also, RESTBase itself now decides whether to do a sequential or parallel security check, and this architecture is less flexible for adding new security modules.
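
A minimal sketch of what a security module under the first option might look like. Only the `filter(securityConfigs, restbase, request)` signature and the 403 behaviour come from the plan above; `joinPermissions` is an illustrative helper, and `restbase.getUserRights` is a hypothetical stand-in for however the module would look up the current user's rights.

```javascript
// Union of all permissions required by the security entries collected
// along the path (duplicates removed, first-seen order kept).
function joinPermissions(securityConfigs) {
  const required = new Set();
  for (const config of securityConfigs) {
    for (const permission of config.permissions || []) {
      required.add(permission);
    }
  }
  return Array.from(required);
}

// Sketch of a wikimedia_auth-style security module.
const wikimediaAuth = {
  // Resolves to the user object if every required permission is held,
  // otherwise rejects with an error carrying status 403.
  filter(securityConfigs, restbase, request) {
    const required = joinPermissions(securityConfigs);
    // ASSUMPTION: restbase.getUserRights(request) is hypothetical and
    // resolves to an object shaped like { name, rights: [...] }.
    return restbase.getUserRights(request).then((user) => {
      const missing = required.filter((p) => !user.rights.includes(p));
      if (missing.length > 0) {
        const err = new Error(`Missing permissions: ${missing.join(', ')}`);
        err.status = 403;
        throw err;
      }
      return user;
    });
  },
};
```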

Security spec example:

/{api:v1}:
  x-subspec: *wp/content/1.0.0
  security:
    - wikimedia_auth: # This is the name of a security module
        permissions: # The rest is module-specific information; here we provide the required permissions.
          - read
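
How entries like this could be collected along the path can be sketched as below. The tree layout (`security` lists plus a `children` map keyed by path segment) is a simplification invented for this sketch and not the real router structure; the `edit` entry on the `html` node is likewise only there to show two entries being collected.

```javascript
// Walk a simplified routing tree along the given path segments and gather
// every `security` entry met on the way, root first.
function collectSecurity(root, segments) {
  const collected = [];
  let node = root;
  if (node.security) {
    collected.push(...node.security);
  }
  for (const segment of segments) {
    node = node.children && node.children[segment];
    if (!node) {
      break; // unknown path: stop with what has been collected so far
    }
    if (node.security) {
      collected.push(...node.security);
    }
  }
  return collected;
}

// Hypothetical tree: 'read' is required for everything under /v1,
// and 'edit' additionally for /v1/page/html.
const tree = {
  children: {
    v1: {
      security: [{ module: 'wikimedia_auth', permissions: ['read'] }],
      children: {
        page: {
          children: {
            html: { security: [{ module: 'wikimedia_auth', permissions: ['edit'] }] },
          },
        },
      },
    },
  },
};

const entries = collectSecurity(tree, ['v1', 'page', 'html']);
```

The resulting list is what would be handed to the security module as its securityConfigs argument.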

I must be missing something here, so please enlighten me ;)

If a wiki is private (as in it needs some kind of special user rights for read), we need to forward the cookies to Parsoid, which, in turn, passes them on to MW. That tells me that the userinfo check is superfluous (and might also be info not available to RB), so frankly I'd go with simple cookie forwarding as the first step towards handling private wikis.

While I think that this development of security modules is a good step forward, I am not sure we can do much about it concretely without a wider compromise and effort towards service-based auth(n|z).

@mobrovac, we also need to check permissions for stored content.

In T88016#1415137, @GWicke wrote:

@mobrovac, we also need to check permissions for stored content.

That seems like: if the content is in storage, fwd the cookie and check, otherwise just fwd the cookie in the request. It seems to me that for the first step we just need to figure out which request to attach the cookie to ;)

Also a valid question: if the user does not have the right to see the content, that does not mean we should waste a request on that, but instead try to find a way of retrieving the content and store regardless of the user's rights.

if the content is in storage, fwd the cookie and check, otherwise just fwd the cookie in the request

We also might want to be able to configure specific access restrictions in RESTBase which go beyond what's done in Parsoid (ex: rate limits, private testing in RB). To enforce those, we need to check on each request. In the longer term, the RB-internal check should be able to use signatures instead of API calls, which can make it quite a bit cheaper.
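
To illustrate why a signature-based check can be cheaper than an API call: verification is a local computation over the token, with no request to MediaWiki. The sketch below uses a simplified HMAC-signed token rather than a real JWT; the token layout, the `rights` claim, the shared secret and the function names are all assumptions made for illustration.

```javascript
const crypto = require('crypto');

// Issue a token of the form "<base64 JSON payload>.<hex HMAC-SHA256>".
function signToken(payload, secret) {
  const body = Buffer.from(JSON.stringify(payload)).toString('base64');
  const mac = crypto.createHmac('sha256', secret).update(body).digest('hex');
  return `${body}.${mac}`;
}

// Return the payload if the signature matches, otherwise null.
function verifyToken(token, secret) {
  const [body, mac] = token.split('.');
  if (!body || !mac) {
    return null;
  }
  const expected = crypto.createHmac('sha256', secret).update(body).digest();
  const given = Buffer.from(mac, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length || !crypto.timingSafeEqual(given, expected)) {
    return null;
  }
  return JSON.parse(Buffer.from(body, 'base64').toString('utf8'));
}

// A service holding the secret can check rights without any API request.
const token = signToken({ user: 'ExampleUser', rights: ['read'] }, 'shared-secret');
const claims = verifyToken(token, 'shared-secret');
const mayRead = Boolean(claims && claims.rights.includes('read'));
```

A real deployment would also need expiry and key management, which this sketch leaves out.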

retrieving the content and store regardless of the user's rights

That's potentially dangerous. We don't want to give a service like RESTBase (or Parsoid) the equivalent of 'root' rights if we can avoid it.

@mobrovac Basically we need 2 pieces:

Forbid reads when the user is not allowed to see the content stored in RESTBase - this is the one my comment was talking about.
Forwarding a cookie to allow Parsoid to access the resource (and maybe others as well) is a separate one. I'm not quite sure how to handle this one yet (we might want a way to specify whether cookies/headers should be forwarded for a specific request). We may want to simply forward all cookies to begin with. I assume this could be done by storing a 'parent request' in the child restbase and copying cookies from there, but I'm not sure yet.
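
The second piece might look something like the sketch below. It assumes requests are plain `{ uri, headers }` objects with lower-case header names; `forwardHeaders` and its `names` parameter are hypothetical, with `names` standing in for the per-request switch mentioned above.

```javascript
// Copy selected headers (by default just the cookie) from the incoming
// parent request onto an outgoing child request, without mutating either.
function forwardHeaders(parentReq, childReq, names = ['cookie']) {
  const headers = Object.assign({}, childReq.headers);
  for (const name of names) {
    const value = parentReq.headers && parentReq.headers[name];
    if (value !== undefined) {
      headers[name] = value;
    }
  }
  return Object.assign({}, childReq, { headers });
}

// Example: a backend request inherits the session cookie of the request
// that triggered it, and nothing else. The URI is a placeholder.
const incoming = { uri: '/page/html/Foo', headers: { cookie: 'session=abc', 'x-other': '1' } };
const outgoing = forwardHeaders(incoming, { uri: '/backend/Foo', headers: { accept: 'text/html' } });
```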

I didn't understand what you mean by:

If the user does not have the right to see the content, that does not mean we should waste a request on that, but instead try to find a way of retrieving the content and store regardless of the user's rights.

Correct me if I'm wrong, but if the user is forbidden to perform a request, the Parsoid/MW API would return nothing to us anyway? With the security module we could just check that before making a real request. But we wouldn't be able to get data from Parsoid/MW anyway if we lack the required credentials?

IMHO, when talking about RESTBase, we should strive for efficiency and correctness, and, when it comes to private wikis, I'm not sure we're there yet (in either regard).

In T88016#1415156, @GWicke wrote:

We also might want to be able to configure specific access restrictions in RESTBase which go beyond what's done in Parsoid (ex: rate limits, private testing in RB).

Naturally, each component in the system may bring along its own set of requirements and / or restrictions. But, in my view, it is not RESTBase's job to do that. An auth(n|z) service should be put in place to answer the question "can RESTBase do X on behalf of user Y?" (that looks an awful lot like OAuth, I know, but it really is our use case). Additional schemes such as rate limiting and / or content obfuscation are a next stage / building block. This is the main reason why I think that without an authoritative system in place we are just getting ahead of ourselves.

retrieving the content and store regardless of the user's rights

That's potentially dangerous. We don't want to give a service like RESTBase (or Parsoid) the equivalent of 'root' rights if we can avoid it.

Giving RESTBase (per se) access to sensitive data does not imply a policy violation, as long as RESTBase is not permitted to hand it out to a third party (and here again we encounter the need for an authoritative entity, which, granted, we suppose RESTBase to trust blindly). SLAs and APIs are about client-server contracts - that applies to both the client-RESTBase and the RESTBase-auth relationships. It follows that, if RESTBase respects them, there is no visible difference to the end client. As a side note, that doesn't seem to be too different from the current state of affairs wrt, e.g., deleted revisions or pages - they are stored in the DB, and RESTBase knows they're there, but there is no apparent way of getting to them.

In T88016#1415185, @Pchelolo wrote:

@mobrovac Basically we need 2 pieces:

Forbid reads when the user is not allowed to see the content stored in RESTBase - this is the one my comment was talking about.

Forwarding a cookie to allow Parsoid to access the resource (and maybe others as well) is a separate one. I'm not quite sure how to handle this one yet (we might want a way to specify whether cookies/headers should be forwarded for a specific request). We may want to simply forward all cookies to begin with. I assume this could be done by storing a 'parent request' in the child restbase and copying cookies from there, but I'm not sure yet.

Yes, and I'm arguing that if we want to do something now, we need to start by supplying (2) and then formulate (1), because (1) is a piece that connects to a bigger puzzle than "users should be able to read content on private wikis" (keeping in mind we do not, as of yet, offer editing/saving/deleting).

I didn't understand what you mean by:

If the user does not have the right to see the content, that does not mean we should waste a request on that, but instead try to find a way of retrieving the content and store regardless of the user's rights.

What I'm hinting at here is that simply because a user does not have the right to view something, that does not imply RESTBase does not have the right to store it and offer it to clients that do have that right. I guess one could argue this is an efficiency vs increased security issue: if there are 100 clients, 99 of which cannot access the given content, should RESTBase make 100 requests (supposing it does not have it and that the client that can view it comes last)? IMO, if RESTBase can access that info, it should retrieve it right away (on the first request) and keep it until a privileged client comes along.

Put differently, I don't think this is any different than the point you are arguing for: suppose an appropriate client comes along first. They are allowed to view a page, so it is retrieved and stored in Cassandra. Now, on subsequent requests, user rights are checked, but the users are denied access.

Both of these scenarios amount to the same thing: there are clients which were able to obtain a page and those which weren't, but the page is still stored in Cassandra. Given such an end-game, I prefer for RESTBase to fetch the content on a first request and deny subsequent access if need be.
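
The 100-client scenario can be put into numbers with a toy model. It only counts backend content fetches for a single page and ignores the cost of the rights checks themselves, which happen on every request under both policies; the function names are made up for this sketch.

```javascript
// Policy A: on a storage miss, ask the backend with the requesting user's
// credentials; the content can only be stored once an authorized user asks.
// canRead[i] says whether the i-th client may read the page.
function fetchesWhenForwardingUserCredentials(canRead) {
  let stored = false;
  let fetches = 0;
  for (const allowed of canRead) {
    if (stored) {
      continue; // answered from storage after a rights check
    }
    fetches += 1; // storage miss: one backend request
    if (allowed) {
      stored = true; // only a successful response can be stored
    }
  }
  return fetches;
}

// Policy B: the first request triggers a single fetch with the service's
// own credentials; each client is still checked before anything is served.
function fetchesWhenServiceFetchesEagerly(canRead) {
  return canRead.length > 0 ? 1 : 0;
}

// 99 clients without access, followed by one client with access.
const clients = new Array(99).fill(false).concat([true]);
const perUser = fetchesWhenForwardingUserCredentials(clients); // 100
const eager = fetchesWhenServiceFetchesEagerly(clients); // 1
```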

Correct me if I'm wrong, but if the user is forbidden to perform a request, the Parsoid/MW API would return nothing to us anyway? With the security module we could just check that before making a real request. But we wouldn't be able to get data from Parsoid/MW anyway if we lack the required credentials?

Yes, which is why I raised the question of separating "RESTBase requests page X" from "user Y wants to access page X".

@mobrovac, I described some of the longer-term options at https://www.mediawiki.org/wiki/Requests_for_comment/SOA_Authentication. One of the things we should work towards in the longer term is efficient auth{z,n}, ideally without API requests (and yes, that's basically OAuth2 / JWTs). This means that we can avoid calling Parsoid (or perform any other kind of expensive operation) unless the user has access.

In the meantime, we can do something lame like a PHP API request. The first use case is VisualEditor on private wikis, which

- is low volume, and
- requires storage and thus auth{z,n} for that storage.

• GWicke mentioned this in T105975: RFC: Generalize content-addressable POST request storage. Jul 23 2015, 4:51 PM

We just met with @csteipp and walked through the architecture of the initial solution. We'll have to be careful to double-check the assumptions we are making about uniform read access before enabling this for any wiki. Some wikis might have per-namespace read restrictions, and some like zerowiki have even more specialized read rights. We won't be able to support those wikis with the current solution. Wikis that are likely to be especially tricky are:

arbcom
checkuser
zerowiki (lua access stuff, graphoid)

Krenair subscribed. Aug 19 2015, 6:09 PM

In T88016#1553918, @GWicke wrote:

Some wikis might have per-namespace read restrictions

What?

In T88016#1553918, @GWicke wrote:

Wikis that are likely to be especially tricky are:

arbcom

checkuser

Why are these special among all of the other private wikis?

@Krenair, we are going to check the actual permission setup vs. our assumptions before enabling private wiki support, for each private wiki. The wikis that are called out here are simply the ones that came up in our conversation.

In T88016#1555135, @GWicke wrote:

@Krenair, we are going to check the actual permission setup vs. our assumptions before enabling private wiki support, for each private wiki. The wikis that are called out here are simply the ones that came up in our conversation.

Okay. You didn't actually address either of my questions, because:

MediaWiki does not (and most likely will never) support per-namespace read restrictions.
Still don't know why arbcom_*wiki and checkuserwiki are going to be especially tricky compared to the other wikis.

@Krenair, it is possible to implement arbitrary permission models via hooks, and per-namespace restrictions are just one of the many possibilities. In a former life as a MediaWiki consultant, this was a fairly common customer requirement.

We haven't investigated the details of these wikis yet, so can't directly answer your questions. Take it as completely unsubstantiated speculation ;)

!cms

j/k

Krenair mentioned this in T110474: Decom parsoid-lb.eqiad.wikimedia.org entrypoint. Aug 27 2015, 3:02 AM

• Pchelolo mentioned this in T109702: Incorrect node sharing in router. Sep 1 2015, 5:49 AM

Diffusion mentioned this in rGRES513d4b4bd120: Added cookie forwarding (T88016). Sep 8 2015, 4:35 PM

• GWicke mentioned this in rGRES3ca0b4304507: Merge pull request #272 from Pchelolo/cookie_forward. Sep 8 2015, 4:35 PM

• mobrovac mentioned this in rGRBD7df297e39fb3: Update restbase to 10d7242. Sep 8 2015, 5:09 PM

Private wiki support has been deployed in production. The next step is to set up the configuration for WMF private wikis in wmf-config. The prime testing candidate seems to be officewiki.

• mobrovac mentioned this in T127941: Get rev id from the etag. Feb 29 2016, 11:46 AM

Change 292109 had a related patch set uploaded (by Ppchelko):
Enable RESTBase on office wiki

https://gerrit.wikimedia.org/r/292109

gerritbot added a project: Patch-For-Review. Jun 1 2016, 11:47 AM

• mobrovac added projects: RESTBase-API, User-mobrovac. Jun 1 2016, 12:18 PM

Change 292319 had a related patch set uploaded (by Mobrovac):
VRS: Use RESTBase on officewiki

https://gerrit.wikimedia.org/r/292319

Change 292109 merged by Alexandros Kosiaris:
Enable RESTBase on office wiki

https://gerrit.wikimedia.org/r/292109

akosiaris mentioned this in rOPUPb7769560f8f3: Enable RESTBase on office wiki. Jun 2 2016, 8:48 AM

Change 292319 abandoned by Mobrovac:
VRS: Use RESTBase on officewiki

Reason:
We need a way to have updates for private wikis, so abandoning this for the time being.

https://gerrit.wikimedia.org/r/292319

In T88016#2348371, @gerritbot wrote:

Change 292109 merged by Alexandros Kosiaris:
Enable RESTBase on office wiki

https://gerrit.wikimedia.org/r/292109

This was reverted in https://gerrit.wikimedia.org/r/292328 . Two issues:

the MW API seems to throw a redirect when trying to issue a request to it for officewiki
we have to figure out a way to get updates working for private wikis before we can continue this endeavour

• GWicke created subtask T137140: Support change propagation for private wikis. Jun 6 2016, 7:13 PM

• Pchelolo mentioned this in rOPUPc1f5318f49e9: Enable RESTBase on office wiki. Jun 17 2016, 6:07 PM

• Pchelolo mentioned this in T165767: HTTP 500 error on page when attempting to use Visual Editor. May 19 2017, 3:14 PM

• GWicke edited projects, added Services (later); removed Services. Jul 12 2017, 7:34 PM

• Pchelolo mentioned this in T181687: Give RESTBase / MCS requests the apihighlimits right. Dec 4 2017, 8:07 PM

EddieGP subscribed. Mar 22 2018, 2:40 PM

Hi, any plans to do this? :)

In T88016#4180537, @Paladox wrote:

Hi, any plans to do this? :)

Next fiscal year we plan on re-engineering RESTBase, at which point we will tackle this task as well.

• mobrovac mentioned this in T195254: RESTBase Error 'SiteInfo is unavailable'. May 22 2018, 9:14 AM

• mobrovac merged a task: T195254: RESTBase Error 'SiteInfo is unavailable'.

• mobrovac added a subscriber: Ralpha9.

Magol subscribed. Jun 16 2018, 10:32 AM

• mobrovac added a project: Platform Team Legacy (Later). Dec 20 2018, 12:07 PM

Jontheil subscribed. Aug 21 2019, 8:40 PM

ShahimEssaid subscribed. Oct 11 2019, 4:35 PM

daniel mentioned this in T235528: Check read permissions for from and to revision in comparison endpoint. Oct 15 2019, 4:09 PM

Any chance to use this feature? Next fiscal year may start in 2019... :-) (At least, many features, like pdf-export, rely on RESTBase for private wikis.)

@Pchelolo: Hi! This task has been assigned to you a while ago. Do you still plan to work on this task?
If you do not plan to work on this task anymore: Please consider removing yourself as assignee (via Add Action... → Assign / Claim in the dropdown menu): That would allow others to work on this (in theory), as others won't think that someone is already working on this. Thanks! :)