Page MenuHomePhabricator

Introduce ActingUser to represent the user performing the current request
Open, Needs TriagePublic

Description

Problem:
Some key functionality attached to user relies on global state to gain access to the current user's session as well as HTTP request information such as the client's IP address and cookies. This is needed in particular for abuse prevention via auto-blocking (based on IP and cookies) as well as rate limit (per user, per session, or per IP). We need a way to provide access to this information without having to rely on global state. However, refactoring all relevant code to explicitly pass down a WebRequest object seems undesirable, if at all feasible.

Examples of this can be found in PermissionManager and RequestManager, or by looking for usages of User::getRequest or User::pingLimiter. Callers of RequestContent::getMain() are also problematic, though that method is often called for other reasons as well.

Proposal:
Introduce ActingUser as an interface that extends UserIdentity. Critical checks, such as calls to PermissionManager::userCan with RIGOR_SECURE, must fail if presented with a UserIdentity that is not an ActingUser, or should require an ActingUser in the signature. The ActingUser represents not a user as such, but a user acting withing a certain context (a device, a session, oath token, address, etc). This context represents the execution context of the entire application (thanks th PHP's per-request execution model). For maintenance scripts and async jobs, the execution context would essentially just be "CLI" or "JOB".

Code that needs to check for context-based blocks or limits (session,IP address, cookies, device) will need access to some aspects of the http request. ActingUser could provide methods to access this information, but it's unclear how exactly these methods should be have in a non-web execution context. The relevant calling code will probably have to know about web-mode and cli-mode ActingUsers in any case.

Code that makes direct use of ActingUsers should be rare. However, it's likely to be used via many code paths for many use cases.

NOTE: this supersedes T218555

Event Timeline

daniel created this task.Sep 3 2019, 6:52 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 3 2019, 6:52 PM
Daimona added a subscriber: Daimona.Sep 3 2019, 6:55 PM
kostajh added a subscriber: kostajh.Sep 3 2019, 6:56 PM
dmaza added a subscriber: dmaza.Sep 3 2019, 7:22 PM
Tgr added a comment.Sep 3 2019, 7:33 PM

Sounds more like you want User[Identity?]::isActingUser() rather than a new interface providing access to a bunch information we know we'll will only be able to access via some huge hack?

Anomie added a comment.Sep 3 2019, 8:01 PM

ActingUser sort of works in the context of UserIdentity, but you'd need User::newFromSession() to return some subclass of User that implements this new interface for the semantics to be right for the User objects that most of the code still actually uses.

It seems likely enough that ActingUser would actually need just one method, either ->getRequest() or ->getSession().[1][2] In that case, perhaps "RequestUserIdentity" or "SessionUserIdentity" would be a better name.

[1]: Or both, even though WebRequest and Session can each return the other.
[2]: Even if you instead try to guess every individual piece of request and session data some caller might want in order to provide a specific getter.

daniel added a comment.Sep 3 2019, 8:04 PM

ActingUser sort of works in the context of UserIdentity, but you'd need User::newFromSession() to return some subclass of User that implements this new interface for the semantics to be right for the User objects that most of the code still actually uses.

I suppose we could have ActingUser extends User, and ActingUserIdentity extending UserIdentity, but implementing the same UserWithExtecutionContext interface.

It seems likely enough that ActingUser would actually need just one method, either ->getRequest() or ->getSession().[1][2] In that case, perhaps "RequestUserIdentity" or "SessionUserIdentity" would be a better name.

One reason not to directly return the WebRequest is that that also provides access to the Response. And we'd kind of want to avoid that.

Also, @Krinkle would really like such methods to be completely unavailable when running in the context of a Job. But we'd still need an ActingUser to exist...

Anomie added a comment.Sep 3 2019, 8:14 PM

Also, @Krinkle would really like such methods to be completely unavailable when running in the context of a Job. But we'd still need an ActingUser to exist...

In that case you probably shouldn't give the job an ActingUser at all. Why would you need one to exist?

Tgr added a comment.EditedSep 3 2019, 9:57 PM

Jobs (or rather hooks invoked by jobs) need the IP address and user agent of the acting user, at the least. Also they need to apply session-based rights restrictions.

Also, @Krinkle would really like such methods to be completely unavailable when running in the context of a Job. But we'd still need an ActingUser to exist...

In that case you probably shouldn't give the job an ActingUser at all. Why would you need one to exist?

I think all code that does "secure" permission checks should require that an ActingUser is given. Otherwise, checks based on the execution context may be bypassed inadvertently if a plain UserIdentity is provided instead of the ActingUser.

We could also completely disable permission checks in jobs (and maintenance scripts). That would also fix it, but I'm not sure that would be safe.

Sounds more like you want User[Identity?]::isActingUser() rather than a new interface providing access to a bunch information we know we'll will only be able to access via some huge hack?

With the new interface, we would no longer need ugly hacks to gain access to ther's user's IP, session, and cookies. If all we have is a flag, we will have to continue to reply on global state.

Jobs (or rather hooks invoked by jobs) need the IP address and user agent of the acting user, at the least. Also they need to apply session-based rights restrictions.

They do? I mean, it's not impossible, we can just put the relevant information about the user's request / execution context into the Job serialization. But my gut feeling is that permission checks should happen before jobs get posted. Maybe it would even be ok to stub out permission checks while running jobs.

Do you have a use case where a job needs to enforce permissions that can't be checked before scheduling the job?

Anomie added a comment.Sep 4 2019, 1:56 PM

One reason not to directly return the WebRequest is that that also provides access to the Response. And we'd kind of want to avoid that.

How do you intend to support use cases like setting the block cookie then?

Agabi10 added a subscriber: Agabi10.Sep 4 2019, 2:12 PM
Tgr added a comment.Sep 4 2019, 10:40 PM

Do you have a use case where a job needs to enforce permissions that can't be checked before scheduling the job?

IP and useragent are mainly needed for checkuser, which is triggered by low-level hooks, which can be triggered by jobs.
That aside, there are plenty of such use cases:

  • per-page permission checks might not be feasible if the job changes a lot of pages, or generates the list of targets in a way that is hard to predict or might change over time.
  • more generally, the time between scheduling and running the job might be nontrivial, and permissions might change during that (e.g. because the user just got banned, or because the job itself is affecting pernissions, as is the case with throttles).
  • jobs can be cross-wiki.
  • there might be permission checks that cannot be predicted by the code scheduling the job, such as checks in hooks or low level checks like revdelete checks in revision logic. (Permission checks permeate MediaWiki business logic; they happen everywhere from high-level controller-like code such as API modules to mid-level helper classes that can be called from jobs like PageMove to low-level logic like content access. To some extent this is due to the codebase having been evolved organically, but often it's just common-sense defensive design, and IMO it'd be reckless to try removing such layered defense from a large and complex codebase.)

we can just put the relevant information about the user's request / execution context into the Job serialization.

We already do that (in a somewhat hacky way), see RequestContext::exportSession() / RequestContext::importScopedSession().

I think all code that does "secure" permission checks should require that an ActingUser is given.

One implication there is that system users would have to be ActingUser since they do perform actions which trigger secure permission checks, even though they don't have associated IPs or requests. So ActingUser (with the interface envisioned in the task description) doesn't seem like a natural abstraction.

T196575: Add block cookie for browser-based API edits (including VisualEditor & MobileFrontend) seems to indicate that we need to be able to manipulate the response (cookies) from the context of block management as well.

The BlockManager does currently manipulate the response (add or remove a block cookie), which it gets via a call to User::getRequest.

After T196575, the block cookie should be added and removed when it is safe to do so, by passing the response object to the BlockManager from the MediaWiki object (https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/534933/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/537099/).

I think all code that does "secure" permission checks should require that an ActingUser is given.

One implication there is that system users would have to be ActingUser since they do perform actions which trigger secure permission checks, even though they don't have associated IPs or requests. So ActingUser (with the interface envisioned in the task description) doesn't seem like a natural abstraction.

It seems to me that when system users perform actions, they are in fact acting users. But I agree that the connection between an acting user and a HTTP request and session is unclear. It seems to me that an ActingUser may have a request and/or session associated. We can make this optional, but since the request and context are typically use to impose restrictions, not grant additional rights, we have to make sure that they don't go missing accidentally.

I think ActingUser comes in two flavors, which could be subclasses or distinguished by asking the object: 1) a user acting via a web request. Additional restrictions may applies. 2) a user user acting without a web request, so no additional restrictions apply. This would be the case for system users, and also when running maintenance scripts on behalf of a specific user. For job execution, I don't think there would be any harm in not applying request based restrictions during job execution - after all, the "real" action already went through, jobs are needed to achieve eventual consistency of the system state. But @Tgr is probably right that we shouldn't skip permission checks entirely.

Side note: I'd be interested to know about any actions performed by jobs that require a secure permission checks. If that check is not redundant to a check done earlier, before the job was scheduled, that would point to a logical problem. It would mean that a user was allowed to perform an action but not to perform a required follow-up to that action.

Tgr added a comment.Sep 17 2019, 9:08 AM

For job execution, I don't think there would be any harm in not applying request based restrictions during job execution - after all, the "real" action already went through, jobs are needed to achieve eventual consistency of the system state.

That would mean that actions done by third-party applications authorized via OAuth or similar means act with the full set of permissions of the user, which seems pretty bad from a security POV.
In any case, as I said, request data will be needed for checkusert-type use cases, so it will have to be preserved and passed to the job anyway.

Side note: I'd be interested to know about any actions performed by jobs that require a secure permission checks.

Using MassMessage to deploy malicious Javascript. Using the page move with subpages feature to hijack a Javascript subpage. Using Nuke to delete superprotected pages. (Actually not because Nuke does not delegate page deletions to a job, but it would be a reasonable change to make it do that.) Using GWT to overwrite a protected image. etc.
As I said in T231930#5466782 there are plenty of reasons it might impractical or impossible to do all permission checks before scheduling jobs.

I think ActingUser comes in two flavors, which could be subclasses or distinguished by asking the object: 1) a user acting via a web request. Additional restrictions may applies. 2) a user user acting without a web request, so no additional restrictions apply.

Then what exactly is the difference between your "ActingUser2" and a normal UserIdentity? Just some semantics with no actual code difference?

TK-999 added a comment.EditedSep 17 2019, 4:23 PM

I was thinking that the interface we are looking for here may be somewhat similar to the concept of authorization Subjects as found in e.g. the Apache Shiro framework—something that provides access to the identity of the principal being authorized and the context of the authorization (e.g. the current request/session). A hypothetical narrow definition of such an interface could look as follows:

<?php

interface Authorizable {
    public function getIdentity(): UserIdentity;

    public function getContext(): AuthorizationContext;
}

The hypothetical AuthorizationContext interface could abstract away obtaining values from and setting values on the current request/session and thus avoid a hard dependency on WebRequest/ WebResponse at call sites. This would also allow non-web execution contexts such as jobs to supply their own implementation of AuthorizationContext as appropriate in those environments. Similarly, a permission check against an user that is not the current "acting" user could be performed on an Authorizable with a stub or null AuthorizationContext.

As an additional benefit, this would not make UserIdentity or its subclasses depend on WebRequest/ WebResponse or AuthorizationContext, unlike the current situation where User depends on the request.

I think ActingUser comes in two flavors, which could be subclasses or distinguished by asking the object: 1) a user acting via a web request. Additional restrictions may applies. 2) a user user acting without a web request, so no additional restrictions apply.

Then what exactly is the difference between your "ActingUser2" and a normal UserIdentity? Just some semantics with no actual code difference?

Yes. Doing a secure permission check on a plain UserIdentity should fail because that UserIdentity might be accidental, and in fact should be the ActingUser1, so additional restrictions should apply. ActingUser2 would basically say "no extra checks needed", ensuring this was deliberate.

I think ActingUser comes in two flavors, which could be subclasses or distinguished by asking the object: 1) a user acting via a web request. Additional restrictions may applies. 2) a user user acting without a web request, so no additional restrictions apply.

Then what exactly is the difference between your "ActingUser2" and a normal UserIdentity? Just some semantics with no actual code difference?

Yes. Doing a secure permission check on a plain UserIdentity should fail because that UserIdentity might be accidental, and in fact should be the ActingUser1, so additional restrictions should apply. ActingUser2 would basically say "no extra checks needed", ensuring this was deliberate.

I think we're getting to the point where we're winding up with a very complex system to try to solve too many incompatible requirements. You proposed adding "ActingUser" to represent basically the UserIdentity + a handle to the request/session, but then you want to avoid having a handle to the request/session sometimes so now you're making a second "ActingUser" that's the UserIdentity without the request/session, and justifying that existing because you now want typehints to try to prevent someone passing a UserIdentity to certain functions.

Tgr added a comment.Sep 17 2019, 7:58 PM

We'll need request data for system users anyway since something has to go into the checkuser log, so it might be easier to split the context, as @TK-999 suggests. A normal user is a UserIdentity with some limited proxy for request/response, a system user is a UserIdentity with a fake proxy that returns made-up data (such as a user agent explaining this is a maintenance script). A normal user in a job is a UserIdentity with a half-fake proxy (we still need the request but can't output cookies).