Page MenuHomePhabricator

Introduce ActingUser to represent the user performing the current request
Open, Needs TriagePublic

Description

Problem:
Some key functionality attached to user relies on global state to gain access to the current user's session as well as HTTP request information such as the client's IP address and cookies. This is needed in particular for abuse prevention via auto-blocking (based on IP and cookies) as well as rate limit (per user, per session, or per IP). We need a way to provide access to this information without having to rely on global state. However, refactoring all relevant code to explicitly pass down a WebRequest object seems undesirable, if at all feasible.

Examples of this can be found in PermissionManager and RequestManager, or by looking for usages of User::getRequest or User::pingLimiter. Callers of RequestContent::getMain() are also problematic, though that method is often called for other reasons as well.

Proposal:
Introduce ActingUser as an interface that extends UserIdentity. Critical checks, such as calls to PermissionManager::userCan with RIGOR_SECURE, must fail if presented with a UserIdentity that is not an ActingUser, or should require an ActingUser in the signature. The ActingUser represents not a user as such, but a user acting withing a certain context (a device, a session, oath token, address, etc). This context represents the execution context of the entire application (thanks th PHP's per-request execution model). For maintenance scripts and async jobs, the execution context would essentially just be "CLI" or "JOB".

Code that needs to check for context-based blocks or limits (session, IP address, cookies, device) will need access to some aspects of the http request. ActingUser could provide methods to access this information, but it's unclear how exactly these methods should behave in a non-web execution context. The relevant calling code will probably have to know about web-mode and cli-mode ActingUsers in any case.

Code that makes direct use of ActingUsers should be rare. However, it's likely to be used via many code paths for many use cases.

NOTE: this supersedes T218555

Event Timeline

daniel created this task.Sep 3 2019, 6:52 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptSep 3 2019, 6:52 PM
Daimona added a subscriber: Daimona.Sep 3 2019, 6:55 PM
kostajh added a subscriber: kostajh.Sep 3 2019, 6:56 PM
dmaza added a subscriber: dmaza.Sep 3 2019, 7:22 PM
Tgr added a comment.Sep 3 2019, 7:33 PM

Sounds more like you want User[Identity?]::isActingUser() rather than a new interface providing access to a bunch information we know we'll will only be able to access via some huge hack?

Anomie added a comment.Sep 3 2019, 8:01 PM

ActingUser sort of works in the context of UserIdentity, but you'd need User::newFromSession() to return some subclass of User that implements this new interface for the semantics to be right for the User objects that most of the code still actually uses.

It seems likely enough that ActingUser would actually need just one method, either ->getRequest() or ->getSession().[1][2] In that case, perhaps "RequestUserIdentity" or "SessionUserIdentity" would be a better name.

[1]: Or both, even though WebRequest and Session can each return the other.
[2]: Even if you instead try to guess every individual piece of request and session data some caller might want in order to provide a specific getter.

daniel added a comment.Sep 3 2019, 8:04 PM

ActingUser sort of works in the context of UserIdentity, but you'd need User::newFromSession() to return some subclass of User that implements this new interface for the semantics to be right for the User objects that most of the code still actually uses.

I suppose we could have ActingUser extends User, and ActingUserIdentity extending UserIdentity, but implementing the same UserWithExtecutionContext interface.

It seems likely enough that ActingUser would actually need just one method, either ->getRequest() or ->getSession().[1][2] In that case, perhaps "RequestUserIdentity" or "SessionUserIdentity" would be a better name.

One reason not to directly return the WebRequest is that that also provides access to the Response. And we'd kind of want to avoid that.

Also, @Krinkle would really like such methods to be completely unavailable when running in the context of a Job. But we'd still need an ActingUser to exist...

Anomie added a comment.Sep 3 2019, 8:14 PM

Also, @Krinkle would really like such methods to be completely unavailable when running in the context of a Job. But we'd still need an ActingUser to exist...

In that case you probably shouldn't give the job an ActingUser at all. Why would you need one to exist?

Tgr added a comment.EditedSep 3 2019, 9:57 PM

Jobs (or rather hooks invoked by jobs) need the IP address and user agent of the acting user, at the least. Also they need to apply session-based rights restrictions.

Also, @Krinkle would really like such methods to be completely unavailable when running in the context of a Job. But we'd still need an ActingUser to exist...

In that case you probably shouldn't give the job an ActingUser at all. Why would you need one to exist?

I think all code that does "secure" permission checks should require that an ActingUser is given. Otherwise, checks based on the execution context may be bypassed inadvertently if a plain UserIdentity is provided instead of the ActingUser.

We could also completely disable permission checks in jobs (and maintenance scripts). That would also fix it, but I'm not sure that would be safe.

Sounds more like you want User[Identity?]::isActingUser() rather than a new interface providing access to a bunch information we know we'll will only be able to access via some huge hack?

With the new interface, we would no longer need ugly hacks to gain access to ther's user's IP, session, and cookies. If all we have is a flag, we will have to continue to reply on global state.

Jobs (or rather hooks invoked by jobs) need the IP address and user agent of the acting user, at the least. Also they need to apply session-based rights restrictions.

They do? I mean, it's not impossible, we can just put the relevant information about the user's request / execution context into the Job serialization. But my gut feeling is that permission checks should happen before jobs get posted. Maybe it would even be ok to stub out permission checks while running jobs.

Do you have a use case where a job needs to enforce permissions that can't be checked before scheduling the job?

Anomie added a comment.Sep 4 2019, 1:56 PM

One reason not to directly return the WebRequest is that that also provides access to the Response. And we'd kind of want to avoid that.

How do you intend to support use cases like setting the block cookie then?

Agabi10 added a subscriber: Agabi10.Sep 4 2019, 2:12 PM
Tgr added a comment.Sep 4 2019, 10:40 PM

Do you have a use case where a job needs to enforce permissions that can't be checked before scheduling the job?

IP and useragent are mainly needed for checkuser, which is triggered by low-level hooks, which can be triggered by jobs.
That aside, there are plenty of such use cases:

  • per-page permission checks might not be feasible if the job changes a lot of pages, or generates the list of targets in a way that is hard to predict or might change over time.
  • more generally, the time between scheduling and running the job might be nontrivial, and permissions might change during that (e.g. because the user just got banned, or because the job itself is affecting pernissions, as is the case with throttles).
  • jobs can be cross-wiki.
  • there might be permission checks that cannot be predicted by the code scheduling the job, such as checks in hooks or low level checks like revdelete checks in revision logic. (Permission checks permeate MediaWiki business logic; they happen everywhere from high-level controller-like code such as API modules to mid-level helper classes that can be called from jobs like PageMove to low-level logic like content access. To some extent this is due to the codebase having been evolved organically, but often it's just common-sense defensive design, and IMO it'd be reckless to try removing such layered defense from a large and complex codebase.)

we can just put the relevant information about the user's request / execution context into the Job serialization.

We already do that (in a somewhat hacky way), see RequestContext::exportSession() / RequestContext::importScopedSession().

I think all code that does "secure" permission checks should require that an ActingUser is given.

One implication there is that system users would have to be ActingUser since they do perform actions which trigger secure permission checks, even though they don't have associated IPs or requests. So ActingUser (with the interface envisioned in the task description) doesn't seem like a natural abstraction.

T196575: Add block cookie for browser-based API edits (including VisualEditor & MobileFrontend) seems to indicate that we need to be able to manipulate the response (cookies) from the context of block management as well.

The BlockManager does currently manipulate the response (add or remove a block cookie), which it gets via a call to User::getRequest.

After T196575, the block cookie should be added and removed when it is safe to do so, by passing the response object to the BlockManager from the MediaWiki object (https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/534933/ and https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/537099/).

I think all code that does "secure" permission checks should require that an ActingUser is given.

One implication there is that system users would have to be ActingUser since they do perform actions which trigger secure permission checks, even though they don't have associated IPs or requests. So ActingUser (with the interface envisioned in the task description) doesn't seem like a natural abstraction.

It seems to me that when system users perform actions, they are in fact acting users. But I agree that the connection between an acting user and a HTTP request and session is unclear. It seems to me that an ActingUser may have a request and/or session associated. We can make this optional, but since the request and context are typically use to impose restrictions, not grant additional rights, we have to make sure that they don't go missing accidentally.

I think ActingUser comes in two flavors, which could be subclasses or distinguished by asking the object: 1) a user acting via a web request. Additional restrictions may applies. 2) a user user acting without a web request, so no additional restrictions apply. This would be the case for system users, and also when running maintenance scripts on behalf of a specific user. For job execution, I don't think there would be any harm in not applying request based restrictions during job execution - after all, the "real" action already went through, jobs are needed to achieve eventual consistency of the system state. But @Tgr is probably right that we shouldn't skip permission checks entirely.

Side note: I'd be interested to know about any actions performed by jobs that require a secure permission checks. If that check is not redundant to a check done earlier, before the job was scheduled, that would point to a logical problem. It would mean that a user was allowed to perform an action but not to perform a required follow-up to that action.

Tgr added a comment.Sep 17 2019, 9:08 AM

For job execution, I don't think there would be any harm in not applying request based restrictions during job execution - after all, the "real" action already went through, jobs are needed to achieve eventual consistency of the system state.

That would mean that actions done by third-party applications authorized via OAuth or similar means act with the full set of permissions of the user, which seems pretty bad from a security POV.
In any case, as I said, request data will be needed for checkusert-type use cases, so it will have to be preserved and passed to the job anyway.

Side note: I'd be interested to know about any actions performed by jobs that require a secure permission checks.

Using MassMessage to deploy malicious Javascript. Using the page move with subpages feature to hijack a Javascript subpage. Using Nuke to delete superprotected pages. (Actually not because Nuke does not delegate page deletions to a job, but it would be a reasonable change to make it do that.) Using GWT to overwrite a protected image. etc.
As I said in T231930#5466782 there are plenty of reasons it might impractical or impossible to do all permission checks before scheduling jobs.

I think ActingUser comes in two flavors, which could be subclasses or distinguished by asking the object: 1) a user acting via a web request. Additional restrictions may applies. 2) a user user acting without a web request, so no additional restrictions apply.

Then what exactly is the difference between your "ActingUser2" and a normal UserIdentity? Just some semantics with no actual code difference?

TK-999 added a comment.EditedSep 17 2019, 4:23 PM

I was thinking that the interface we are looking for here may be somewhat similar to the concept of authorization [[ https://shiro.apache.org/static/1.4.1/apidocs/org/apache/shiro/subject/Subject.html | Subjects ]] as found in e.g. the Apache Shiro framework—something that provides access to the identity of the principal being authorized and the context of the authorization (e.g. the current request/session). A hypothetical narrow definition of such an interface could look as follows:

<?php

interface Authorizable {
    public function getIdentity(): UserIdentity;

    public function getContext(): AuthorizationContext;
}

The hypothetical AuthorizationContext interface could abstract away obtaining values from and setting values on the current request/session and thus avoid a hard dependency on WebRequest/ WebResponse at call sites. This would also allow non-web execution contexts such as jobs to supply their own implementation of AuthorizationContext as appropriate in those environments. Similarly, a permission check against an user that is not the current "acting" user could be performed on an Authorizable with a stub or null AuthorizationContext.

As an additional benefit, this would not make UserIdentity or its subclasses depend on WebRequest/ WebResponse or AuthorizationContext, unlike the current situation where User depends on the request.

I think ActingUser comes in two flavors, which could be subclasses or distinguished by asking the object: 1) a user acting via a web request. Additional restrictions may applies. 2) a user user acting without a web request, so no additional restrictions apply.

Then what exactly is the difference between your "ActingUser2" and a normal UserIdentity? Just some semantics with no actual code difference?

Yes. Doing a secure permission check on a plain UserIdentity should fail because that UserIdentity might be accidental, and in fact should be the ActingUser1, so additional restrictions should apply. ActingUser2 would basically say "no extra checks needed", ensuring this was deliberate.

I think ActingUser comes in two flavors, which could be subclasses or distinguished by asking the object: 1) a user acting via a web request. Additional restrictions may applies. 2) a user user acting without a web request, so no additional restrictions apply.

Then what exactly is the difference between your "ActingUser2" and a normal UserIdentity? Just some semantics with no actual code difference?

Yes. Doing a secure permission check on a plain UserIdentity should fail because that UserIdentity might be accidental, and in fact should be the ActingUser1, so additional restrictions should apply. ActingUser2 would basically say "no extra checks needed", ensuring this was deliberate.

I think we're getting to the point where we're winding up with a very complex system to try to solve too many incompatible requirements. You proposed adding "ActingUser" to represent basically the UserIdentity + a handle to the request/session, but then you want to avoid having a handle to the request/session sometimes so now you're making a second "ActingUser" that's the UserIdentity without the request/session, and justifying that existing because you now want typehints to try to prevent someone passing a UserIdentity to certain functions.

Tgr added a comment.Sep 17 2019, 7:58 PM

We'll need request data for system users anyway since something has to go into the checkuser log, so it might be easier to split the context, as @TK-999 suggests. A normal user is a UserIdentity with some limited proxy for request/response, a system user is a UserIdentity with a fake proxy that returns made-up data (such as a user agent explaining this is a maintenance script). A normal user in a job is a UserIdentity with a half-fake proxy (we still need the request but can't output cookies).

DannyS712 updated the task description. (Show Details)Oct 27 2019, 1:57 AM

As far as I can tell, the issue is we need a way for IP address and cookies to be available for logging and security checks. We obviously are not going to pass these all the way down the call chain. This seems like exactly the use-case a service manager is intended for. Why not just make a service that contains the needed info? Generally we can just inject the info from the service, but if some code wants to perform certain actions that should be considered relevant to a different IP address/set of cookies, it can inject different info. If something is being performed without relevance to any specific IP address, like a maintenance script run via a cron job, it can use 127.0.0.1 as the IP address (which I imagine is the info PHP supplies anyway). And of course empty cookies.

It's possible I'm missing something, because I don't understand what this proposal seeks to do. Why does any of this need to be connected to UserIdentity? We might want IP address/session/cookies for things where we don't really care who the user is as such. E.g., an IP-based block isn't associated with any user and I don't see why code dealing with it needs to be aware who the user is.

For an example of what @Simetrical is saying... see: https://symfony.com/doc/current/service_container/request.html I'm not saying that we should necessarily do that or not, just wanted to express that the pattern is not unheard of.

daniel added a comment.Jan 3 2020, 1:06 PM

As far as I can tell, the issue is we need a way for IP address and cookies to be available for logging and security checks. We obviously are not going to pass these all the way down the call chain.

We actually already pass something down the call chain where we perform these checks: a User object, sometimes wrapped in a IContextSource kitchen sink.

Having a service that provides access to the request/session/user seems acceptable, but not preferable. Since we are already injecting an object, we can use this as a pathway to inject new/different information.

I have thought about this issue some more. I'll post a new proposal here in a minute.

daniel added a comment.Jan 3 2020, 1:23 PM

New proposal: instead of having PermissionManager act on an ActingUser value object that extends UserIdentity, pull the relevant state and logic into an Authority object that performs authorization and can be injected.

Rough draft of the Authority interface:

interface Authority {
    function authorizeAction( $action );
    function authorizeActionOn( $action, $page );

    function getAuthorizationErrors( $action, $rigor );
    function getAuthorizationErrorsOn( $action, $page, $rigor );
}

The getAuthorizationErrors methods check authorization, which may or may not include checks for IP blocks and cascading protection, depending on $rigor. They are idempotent. The authorizeAction methods throw an exception upon failure, and may modify state - in particular, they can update counters and enforce rate limits.

We would create multiple implementations of this interface, likely including one that fails all authorization as well as one that allows all authorization. But most importantly, there would be one that covers the logic currently implemented in PermissionManager::userCan and friends: it would be instantiated based on the current request, session, and user identity, and would perform checks based on the user group, IP blocks, rate limits, etc.

The corresponding methods in PermissionManager would be deprecated and delegate to the new class. PermissionManager would only retain logic about group based permissions, not about blocks or other session based checks.

The User class would implement the Authority interface as well, delegating to the session based Authority instance if the User object is the user for the session. getAuthorizationErrors could still function for a user that isn't the acting user, but authorizeAction would always fail for a user based Authority that doesn't correspond to the current request.

Jobs can use an implementation of Authority that is based on the user identity and checks blocks, but does not use session or IP based information.

Since User implements the new interface, the new functionality becomes available everywhere immediately. Type hints against user can be narrowed to Authority when and where possible, allowing for alternative light weight implementations of the interface to be used.

Anomie added a comment.EditedJan 3 2020, 4:23 PM

New proposal: instead of having PermissionManager act on an ActingUser value object that extends UserIdentity, pull the relevant state and logic into an Authority object that performs authorization and can be injected.

Would that satisfy the non-PermissionManager use cases alluded to in the description of the task?

The corresponding methods in PermissionManager would be deprecated and delegate to the new class. PermissionManager would only retain logic about group based permissions, not about blocks or other session based checks.

If most of the time we only need one or the other of Authority and PermissionManager, ok. If we commonly wind up needing both, though, that could be rather annoying to have to inject and use both. I don't have a good guess as to which might be the case.

daniel added a comment.Jan 3 2020, 8:11 PM

Would that satisfy the non-PermissionManager use cases alluded to in the description of the task?

The only other use case I can think of is context info for logging (errors, stats). For that, global state might be ok (especially if the logger implementation already has that baked in), or we can just inject that info as a plain string into the respective service when it is created.

If most of the time we only need one or the other of Authority and PermissionManager, ok. If we commonly wind up needing both, though, that could be rather annoying to have to inject and use both. I don't have a good guess as to which might be the case.

The Authority interface should be designed to cover all use cases that in some way control what the current user can or can't do. PermissionManager would only be needed by code that is currently calling things like isEveryoneAllows() or getAllPermissions() or getNamespaceRestrictionLevels(), which is relatively rare, and conceptually separate from check whether an action can be performed.

Also note that PermissionManager would generally be injected in the constructor, while Authority would be passed as a parameter, instead of a User object (or, for now, as an interface implemented by a User object).

What about the BlockManager needing the RequestContext in order to read/set the cookies?

Anomie added a comment.Jan 3 2020, 8:24 PM

How about T218674, without breaking that ApiQueryUserInfo wants that behavior?

daniel added a comment.EditedJan 3 2020, 8:55 PM

What about the BlockManager needing the RequestContext in order to read/set the cookies?

My idea was that the "full" implementation of Authority (SessionAuthority or some such) would use the BlockManager (not the other way around). Since the SessionAuthority instance would know the request, it could make it available to the BlockManager for manipulation. This is similar to managing limit counters in the session, and would happen only in calls to authorizeXYZ methods, not the getAuthroizationErrorsXYZ methods.

How about T218674, without breaking that ApiQueryUserInfo wants that behavior?

The fundamental problem there seems to be that User::getRequest() always falls back to $wgRequest, for any user. With a SessionAuthority (or some some such), User::getRequest() would no longer be needed, and PermissionManager::getUserPermissions() would no longer apply any filtering based on the session. This filtering would only by done in the SessionAuthority, when checking permissions.

It's not clear to me what the intended semantics of ApiQueryUserInfo is - filtering permissions for other users seems like a bug, filtering permissions for the current user may or may not be wanted, depending on the use case. If it is needed, we could add a getPermissions() method to the Authority interface. Perhaps ApiQueryUserInfo should provide a way to explicitly ask for the effective permissions for the current user session and request, as opposed to listing the permissions just based on group membership. Maybe that should even be handled by a separate module, it seems conceptually different.

Anomie added a comment.Jan 3 2020, 9:20 PM

ApiQueryUserInfo is the module for requesting information about the user making the request. It makes sense to me for it to reflect the rights actually available for the request.

daniel added a comment.Jan 3 2020, 9:32 PM

ApiQueryUserInfo is the module for requesting information about the user making the request. It makes sense to me for it to reflect the rights actually available for the request.

Ah right, ApiQueryUsers is the generic one. So ApiQueryUserInfo would expose the filtered version based on the Authority, while ApiQueryUsers would use PermissionManager. Filtering could be done by just checking every right the user might have, or by adding a method for getting the effective permissions to the Authority interface.

DannyS712 updated the task description. (Show Details)Jan 11 2020, 10:21 AM