
Handle read access for private wikis
Open, Medium, Public

Description

We'd like to continue supporting editing on private wikis like office wiki. To do this, we need to globally check for the 'read' right in the userinfo query result for all read accesses to the domain.
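As a rough illustration of the check described above, the following sketch inspects a userinfo result of the shape returned by the MediaWiki API query action=query&meta=userinfo&uiprop=rights. The helper names hasRight and assertReadAccess and the sample response are hypothetical; this is not RESTBase's actual code.

```javascript
'use strict';

// Hypothetical helper: does a userinfo query result grant the given right?
// Expects the response shape of action=query&meta=userinfo&uiprop=rights.
function hasRight(userinfoResponse, right) {
    const userinfo = userinfoResponse && userinfoResponse.query
        && userinfoResponse.query.userinfo;
    const rights = (userinfo && userinfo.rights) || [];
    return rights.indexOf(right) !== -1;
}

// Hypothetical gate: throw a 403-style error unless 'read' is granted.
function assertReadAccess(userinfoResponse) {
    if (!hasRight(userinfoResponse, 'read')) {
        const err = new Error('Read access denied');
        err.status = 403;
        throw err;
    }
}

// Made-up sample response for a logged-in user on a private wiki.
const sampleResponse = {
    query: { userinfo: { id: 42, name: 'ExampleUser', rights: ['read', 'edit'] } }
};
assertReadAccess(sampleResponse); // does not throw
```

A missing or malformed response is treated as "no rights", so the gate fails closed.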

More concrete steps:

Details

Related Gerrit Patches:
operations/mediawiki-config : master : VRS: Use RESTBase on officewiki
operations/puppet : production : Enable RESTBase on office wiki

Event Timeline

GWicke raised the priority of this task from to Needs Triage.
GWicke updated the task description.
GWicke added a project: RESTBase.
GWicke added a subscriber: GWicke.
Restricted Application added a subscriber: Aklapper. Jan 29 2015, 9:57 PM
mobrovac moved this task from Backlog to Ready / next on the RESTBase board. Feb 13 2015, 7:01 PM
GWicke triaged this task as Medium priority. Mar 15 2015, 4:57 PM
GWicke updated the task description. Jun 29 2015, 7:46 PM
Pchelolo claimed this task. Jun 30 2015, 9:05 AM

A brief description of the implementation plan.

Two options are available:

  1. Define an x-security-modules part of the spec which would follow the same pattern as x-modules. This would be parsed on startup, and security modules would be loaded either from the security_mods folder or from npm. A security module would be an ordinary JS module with the following interface: mod.filter(securityConfigs, restbase, request). It would do its magic to check access and either throw a 403 or return the authenticated user object in arbitrary form. (Alternatively, the handler would be passed to the module so that it could decide whether to call it sequentially or in parallel with the handler, and whether it should provide a principal object to the handler.) The securityConfigs parameter is a list of module-specific configurations: a security spec entry can appear on any path entry (see example), and these entries are collected along the path and passed to the security module. For example, wikimedia_auth would join all the needed permissions and verify that the currently logged-in user has all of them. An interesting question is how to chain security calls when more than one module is in the path, but that shouldn't be a common use case, so we might just prohibit it. Pros of this solution: security stuff is not a REST endpoint, we decrease recursion depth for path-based calls, and alternative implementations can be added easily.
  2. Define an auth module the same way as any other module within the mods folder. The security spec would look the same, but 'wikimedia_auth' would be the name of the endpoint within the security module's REST interface. In this case we have one more level of recursion, and we have to pass potentially long lists of security params. Also, RESTBase now decides whether to do a sequential or parallel security check, and this architecture is less flexible for adding new security modules.

Security spec example:

/{api:v1}:
  x-subspec: *wp/content/1.0.0
  security:
    - wikimedia_auth: # This is the name of a security module
        permissions: # The rest is module-specific information; here we provide the required permissions.
          - read
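To make option 1 more concrete, here is a minimal sketch of what a security module with the mod.filter(securityConfigs, restbase, request) interface might look like. Only the filter signature and the name wikimedia_auth come from the plan above; the getUserRights lookup, the fakeRestbase stand-in, and the cookie values are hypothetical, and a real lookup would be asynchronous.

```javascript
'use strict';

// Hypothetical stand-in for RESTBase: resolves the user's rights from the
// request. A real implementation would issue an (async) userinfo query.
const fakeRestbase = {
    getUserRights(request) {
        const cookie = (request.headers && request.headers.cookie) || '';
        return cookie.indexOf('session=staff') !== -1 ? ['read', 'edit'] : [];
    }
};

// Sketch of a security module following the proposed interface.
const wikimediaAuth = {
    filter(securityConfigs, restbase, request) {
        // Join the permissions collected along the path into one set.
        const required = new Set();
        securityConfigs.forEach((conf) => {
            (conf.permissions || []).forEach((p) => required.add(p));
        });
        const rights = restbase.getUserRights(request);
        const missing = Array.from(required)
            .filter((p) => rights.indexOf(p) === -1);
        if (missing.length) {
            const err = new Error('Missing permissions: ' + missing.join(', '));
            err.status = 403;
            throw err;
        }
        // Return the authenticated user object "in arbitrary form".
        return { rights };
    }
};

// Two configs, as if collected from two path entries.
const collected = [{ permissions: ['read'] }, { permissions: ['read', 'edit'] }];
const principal = wikimediaAuth.filter(collected, fakeRestbase,
    { headers: { cookie: 'session=staff' } });
```

Merging the collected permissions into a set means a permission repeated at several path levels is only checked once.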
mobrovac added a subscriber: mobrovac. Jun 30 2015, 8:08 PM

I must be missing something here, so please enlighten me ;)

If a wiki is private (as in it needs some kind of special user rights for read), we need to forward the cookies to Parsoid, which, in turn, passes them on to MW. That tells me that the userinfo check is superfluous (and might also be info not available to RB), so frankly I'd go with simple cookie forwarding as the first step towards handling private wikis.

While I think that this development of security modules is a good step forward, I am not sure we can do much about it concretely without a wider compromise and effort towards service-based auth(n|z).

@mobrovac, we also need to check permissions for stored content.

@mobrovac, we also need to check permissions for stored content.

That seems like: if the content is in storage, fwd the cookie and check, otherwise just fwd the cookie in the request. It seems to me that for the first step we just need to figure out which request to attach the cookie to ;)

Also a valid question: if the user does not have the right to see the content, that does not mean we should waste a request on that, but instead try to find a way of retrieving the content and store regardless of the user's rights.

if the content is in storage, fwd the cookie and check, otherwise just fwd the cookie in the request

We also might want to be able to configure specific access restrictions in RESTBase which go beyond what's done in Parsoid (ex: rate limits, private testing in RB). To enforce those, we need to check on each request. In the longer term, the RB-internal check should be able to use signatures instead of API calls, which can make it quite a bit cheaper.

retrieving the content and store regardless of the user's rights

That's potentially dangerous. We don't want to give a service like RESTBase (or Parsoid) the equivalent of 'root' rights if we can avoid it.

@mobrovac Basically we need 2 pieces:

  1. Forbid reads when the user is not allowed to see the content stored in RESTBase - this is the one my comment was talking about.
  2. Forwarding a cookie to allow Parsoid (and maybe others) to access the resource is a separate one. I'm not quite sure how to handle this one yet (we might want to have a way to specify whether the cookies/headers should be forwarded for a specific request). We may want to simply forward all cookies as a beginning. I assume this could be done by storing a 'parent request' in the child restbase and copying cookies from there, but I'm not sure yet.
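The "copy cookies from the parent request" idea in item 2 could look roughly like the sketch below. The forwardCookies helper, the request shapes, and the example URI are hypothetical and only illustrate the copying step; this is not how RESTBase actually builds sub-requests.

```javascript
'use strict';

// Hypothetical helper: return a copy of the child (sub-)request that
// inherits the cookie header of the parent request, if there is one.
function forwardCookies(parentRequest, childRequest) {
    const parentHeaders = parentRequest.headers || {};
    const headers = Object.assign({}, childRequest.headers);
    // Do not overwrite a cookie the child request already carries.
    if (parentHeaders.cookie && !headers.cookie) {
        headers.cookie = parentHeaders.cookie;
    }
    return Object.assign({}, childRequest, { headers });
}

// Made-up requests: an incoming client request and a backend sub-request.
const incoming = { uri: '/page/html/Foo', headers: { cookie: 'session=abc' } };
const toBackend = { uri: 'http://backend.example/Foo', headers: { accept: 'text/html' } };
const forwarded = forwardCookies(incoming, toBackend);
```

Returning a copy rather than mutating the child request keeps the decision per request, which leaves room for the "specify whether cookies should be forwarded" option mentioned above.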

I didn't understand what you mean by:

If the user does not have the right to see the content, that does not mean we should waste a request on that, but instead try to find a way of retrieving the content and store regardless of the user's rights.

Correct me if I'm wrong, but in case the user is forbidden to perform a request, Parsoid/MW api would return us nothing anyway? With the security module we could just check that before making a real request. But we anyway wouldn't be able to get data from parsoid/MW if we lack required credentials?

IMHO, when talking about RESTBase, we should strive for efficiency and correctness, and, when it comes to private wikis, I'm not sure we're there yet (in either regard).

We also might want to be able to configure specific access restrictions in RESTBase which go beyond what's done in Parsoid (ex: rate limits, private testing in RB).

Naturally, each component in the system may bring along its own set of requirements and / or restrictions. But, in my view, it is not RESTBase's job to do that. An auth(n|z) service should be put in place to answer the question "can RESTBase do X on behalf of user Y?" (that looks an awful lot like OAuth, I know, but it really is our use-case). Additional schemes such as rate limitation and / or content obfuscation are a next stage / building block. This is the main reason why I think that without an authoritative system in place we are just trying to get ahead of ourselves.

retrieving the content and store regardless of the user's rights

That's potentially dangerous. We don't want to give a service like RESTBase (or Parsoid) the equivalent of 'root' rights if we can avoid it.

Giving RESTBase (per se) access to sensitive data does not imply policy violation, as long as RESTBase is not permitted to hand it out to a third party (and here again we encounter the need for an authoritative entity, which, granted, we suppose RESTBase to trust blindly). SLAs and APIs are about client-server contracts - that applies to both the client-RESTBase and the RESTBase-auth relationships. It follows that, if RESTBase respects them, there is no visible difference to the end client. As a side note, that doesn't seem to be too different from the current state of affairs wrt, e.g., deleted revisions or pages - they are stored in the DB, and RESTBase knows they're there, but there is no apparent way of getting to them.

@mobrovac Basically we need 2 pieces:

  1. Forbid reads when the user is not allowed to see the content stored in RESTBase - this is the one my comment was talking about.
  2. Forwarding a cookie to allow Parsoid (and maybe others) to access the resource is a separate one. I'm not quite sure how to handle this one yet (we might want to have a way to specify whether the cookies/headers should be forwarded for a specific request). We may want to simply forward all cookies as a beginning. I assume this could be done by storing a 'parent request' in the child restbase and copying cookies from there, but I'm not sure yet.

Yes, and I'm arguing that if we want to do something now, we need to start with supplying (2) and then formulate (1), because it is a piece that connects to a bigger puzzle than "users should be able to read content on private wikis" (keeping in mind we do not, as of yet, offer editing/saving/deleting).

I didn't understand what you mean by:

If the user does not have the right to see the content, that does not mean we should waste a request on that, but instead try to find a way of retrieving the content and store regardless of the user's rights.

What I am hinting at here is that simply because a user does not have the right to view something, that does not imply RESTBase does not have the right to store it and offer it to clients that do have that right. I guess one could argue this is an efficiency vs increased security issue: if there are 100 clients, 99 of which cannot access the given content, should RESTBase make 100 requests (supposing it does not have the content and that the client that can view it comes last)? IMO, if RESTBase can access that info, it should retrieve it right away (on the first request) and keep it until a privileged client comes along.

Put differently, I don't think this is any different than the point you are arguing for: suppose an appropriate client comes along first. They are allowed to view a page, so it is retrieved and stored in Cassandra. Now, on subsequent requests, user rights are checked, but the users are denied access.

Both of these scenarios amount to the same thing: there are clients which were able to obtain a page and those which weren't, but the page is still stored in Cassandra. Given such an end-game, I prefer for RESTBase to fetch the content on a first request and deny subsequent access if need be.
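The 100-client argument above can be made concrete with a toy simulation. Both strategies and all names here are hypothetical simplifications: the sketch only counts backend requests and ignores the security concern raised earlier about giving RESTBase broad read rights.

```javascript
'use strict';

// Strategy A: forward each user's own credentials to the backend.
// Content is stored only once a user who may read it has fetched it.
function perUserFetch(clients) {
    let backendRequests = 0;
    let stored = false;
    let served = 0;
    clients.forEach((client) => {
        if (!stored) {
            backendRequests++; // every miss goes to the backend
            if (client.canRead) { stored = true; }
        }
        if (client.canRead && stored) { served++; }
    });
    return { backendRequests, served };
}

// Strategy B: RESTBase fetches with its own rights on the first miss,
// stores the content, and only gates what it hands out.
function serviceFetch(clients) {
    let backendRequests = 0;
    let stored = false;
    let served = 0;
    clients.forEach((client) => {
        if (!stored) { backendRequests++; stored = true; }
        if (client.canRead) { served++; }
    });
    return { backendRequests, served };
}

// 99 clients without read access, followed by one client with it.
const clients = [];
for (let i = 0; i < 99; i++) { clients.push({ canRead: false }); }
clients.push({ canRead: true });

console.log(perUserFetch(clients)); // { backendRequests: 100, served: 1 }
console.log(serviceFetch(clients)); // { backendRequests: 1, served: 1 }
```

In both cases exactly one client is served and the page ends up in storage. The difference is only the number of backend requests, which is the efficiency half of the trade-off.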

Correct me if I'm wrong, but in case the user is forbidden to perform a request, Parsoid/MW api would return us nothing anyway? With the security module we could just check that before making a real request. But we anyway wouldn't be able to get data from parsoid/MW if we lack required credentials?

Yes, which is why I raised the question of separating "RESTBase requests page X" from "user Y wants to access page X".

GWicke added a comment. Jun 30 2015, 11:55 PM

@mobrovac, I described some of the longer-term options at https://www.mediawiki.org/wiki/Requests_for_comment/SOA_Authentication. One of the things we should work towards in the longer term is efficient auth{z,n}, ideally without API requests (and yes, that's basically OAuth2 / JWTs). This means that we can avoid calling Parsoid (or perform any other kind of expensive operation) unless the user has access.
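To illustrate what auth without API requests could mean, here is a deliberately simplified signed-token sketch using Node's built-in crypto module. It is not a real JWT or OAuth2 implementation. The shared secret, the token format, and the sign/verify helpers are assumptions made up for this example.

```javascript
'use strict';
const crypto = require('crypto');

// Assumption: a secret shared between the issuing service and RESTBase.
const SECRET = 'example-shared-secret';

// Hypothetical issuer side: serialize the claims and append an HMAC.
function sign(claims) {
    const body = Buffer.from(JSON.stringify(claims)).toString('base64');
    const sig = crypto.createHmac('sha256', SECRET).update(body).digest('hex');
    return body + '.' + sig;
}

// Hypothetical verifier side: recompute the HMAC locally, no API call.
// Returns the claims, or null if the token is malformed or tampered with.
function verify(token) {
    const parts = token.split('.');
    if (parts.length !== 2) { return null; }
    const expected = crypto.createHmac('sha256', SECRET)
        .update(parts[0]).digest('hex');
    const given = Buffer.from(parts[1]);
    const wanted = Buffer.from(expected);
    // timingSafeEqual requires buffers of equal length.
    if (given.length !== wanted.length ||
            !crypto.timingSafeEqual(given, wanted)) {
        return null;
    }
    return JSON.parse(Buffer.from(parts[0], 'base64').toString('utf8'));
}

const token = sign({ user: 'ExampleUser', rights: ['read'] });
const claims = verify(token);
```

With something along these lines, the 'read' check becomes a local signature verification, which is what makes it cheap enough to run before any expensive operation.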

In the meantime, we can do something lame like a PHP API request. The first use case is VisualEditor on private wikis, which

  • is low volume, and
  • requires storage and thus auth{z,n} for that storage.
GWicke added a subscriber: csteipp. Aug 19 2015, 5:30 PM

We just met with @csteipp and walked through the architecture of the initial solution. We'll have to be careful to double-check the assumptions we are making about uniform read access before enabling this for any wiki. Some wikis might have per-namespace read restrictions, and some like zerowiki have even more specialized read rights. We won't be able to support those wikis with the current solution. Wikis that are likely to be especially tricky are:

  • arbcom
  • checkuser
  • zerowiki (lua access stuff, graphoid)

Some wikis might have per-namespace read restrictions

What?

Wikis that are likely to be especially tricky are:

  • arbcom
  • checkuser

Why are these special among all of the other private wikis?

@Krenair, we are going to check the actual permission setup vs. our assumptions before enabling private wiki support, for each private wiki. The wikis that are called out here are simply the ones that came up in our conversation.

@Krenair, we are going to check the actual permission setup vs. our assumptions before enabling private wiki support, for each private wiki. The wikis that are called out here are simply the ones that came up in our conversation.

Okay. You didn't actually address either of my questions, because:

  • MediaWiki does not (and most likely will never) support per-namespace read restrictions.
  • Still don't know why arbcom_*wiki and checkuserwiki are going to be especially tricky compared to the other wikis.

@Krenair, it is possible to implement arbitrary permission models via hooks, and per-namespace restrictions are just one of the many possibilities. In a former life as a MediaWiki consultant, this was a fairly common customer requirement.

We haven't investigated the details of these wikis yet, so can't directly answer your questions. Take it as completely unsubstantiated speculation ;)

Private wiki support has been deployed in production. The next step is to set up the configuration for WMF private wikis in wmf-config. The prime testing candidate seems to be officewiki.

Change 292109 had a related patch set uploaded (by Ppchelko):
Enable RESTBase on office wiki

https://gerrit.wikimedia.org/r/292109

Change 292319 had a related patch set uploaded (by Mobrovac):
VRS: Use RESTBase on officewiki

https://gerrit.wikimedia.org/r/292319

Change 292109 merged by Alexandros Kosiaris:
Enable RESTBase on office wiki

https://gerrit.wikimedia.org/r/292109

Change 292319 abandoned by Mobrovac:
VRS: Use RESTBase on officewiki

Reason:
We need a way to have updates for private wikis, so abandoning this for the time being.

https://gerrit.wikimedia.org/r/292319

Change 292109 merged by Alexandros Kosiaris:
Enable RESTBase on office wiki
https://gerrit.wikimedia.org/r/292109

This was reverted in https://gerrit.wikimedia.org/r/292328 . Two issues:

  • the MW API seems to throw a redirect when trying to issue a request to it for officewiki
  • we have to figure out a way to get updates working for private wikis before we can continue this endeavour

Hi, any plans to do this? :)

Hi, any plans to do this? :)

Next fiscal year we plan on re-engineering RESTBase, at which point we will tackle this task as well.

Magol added a subscriber: Magol. Jun 16 2018, 10:32 AM
Nilleholger added a subscriber: Nilleholger. Tue, Dec 3, 10:36 PM

Any chance of using this feature? Next fiscal year may start in 2019... :-) (At least, many features on private wikis, like PDF export, rely on RESTBase.)