Page MenuHomePhabricator

Deprecate and remove User::isSafeToLoad
Closed, DeclinedPublic

Description

It was introduced for various direct and indirect violations of T124367 (using the User object before the session is ready).

A few points:

  • What happens when code uses wgUser too early (and doesn't check isSafeToLoad)? Probably the session throws an exception or assumes user is logged-out.
  • What is code typically changes to when we find such an issue as part of T124367? Either fixed to be deferred so the user object is initialised. Or patched up with isSafeToLoad.
  • What does patching up with isSafeToLoad mean? In my opinion, not much. Usually assumes logged-out. Same as before but more explicit.

I can understand how this method may've seemed useful as a hot-patch while SessionManager settles down, but I strongly belief the existence of this method is confusing and flawed. With SessionManager we're in a much better position than before. We don't need it anymore.

As we discover more bad callers, the solution in most cases is to just defer the code so that it doesn't run too early. And if it absolutely has to run early, the caller needs to accept that the session isn't there yet and it should create an anonymous object instead of using wgUser. So I'd like us to make the object throw on access (or simply not yet exist) until the session is initialised. And we fix callers to be more explicit and deterministic.

In some cases that may mean we need to fix indirect callers, rather than the methods listed below, to comply with a new contract.

Current usage of $wgUser->isSafeToLoad():

Event Timeline

Krinkle renamed this task from Deprecate User::isSafeToLoad to Deprecate and remove User::isSafeToLoad.Feb 29 2016, 6:59 PM
Krinkle triaged this task as Low priority.

What happens when code uses wgUser too early (and doesn't check isSafeToLoad)?

In most cases, a warning is logged and $wgUser returns data as if for the logged-out user. If the caller caches this result, it might wind up behaving incorrectly later in the request.

If MW_NO_SESSION is in effect, an exception is thrown from User::loadFromSession().

What does patching up with isSafeToLoad mean? In my opinion, not much. Usually assumes logged-out. Same as before but more explicit.

It avoids the warning being logged, and being explicit as to what's going on in the calling code is much nicer than silently using a logged-out user when the (logged-in) request user is expected.

It also lets things avoid caching the data determined on the basis of the wrong user.

So I'd like us to make the object throw on access (or simply not yet exist) until the session is initialised.

Once we're reasonably sure nothing left is hitting it, I wouldn't oppose making User::load() throw instead of merely logging the warning. At the moment I believe we're still waiting on 1.27.0-wmf.15 to see if the first round of patches missed anything.

Making $wgUser not exist until it's safe would also need to change RequestContext::getUser() to return null under the same conditions, and maybe even User::newFromSession() in case anything decides to call that directly. And the same for $wgLang and RequestContext::getLanguage(), and potentially a few other things. I note you'd basically be trading the isSafeToLoad() call for !== null.

I note you'd basically be trading the isSafeToLoad() call for !== null.

Yes, for static access through wgUser, a check such as !== null would likely be needed. However, that should be deprecated.

For injected or contextual access we'd ideally find a way to naturally allow the object to only be created when the data required for, it is available.

The User::loadFromSession() method is already private, which is good. The main way to trigger it is through the creation of a dedicated User object via User::newFromSession which is never called by User.php itself, rather it's called by RequestContext.

This means by design any User object should be safe to use. Except, loadFromSession is lazily called, and we allow RequestContext to instantiate too early, letting the object get passed around and accessed by different code paths as if it were a valid User object. If the session data was injected to the User class, this wouldn't be possible as it would pose a paradox. I'm not suggesting we actually inject or load the session before needed (it'd be bad for performance). But in principle, the User class should behave as if it were injected, meaning it shouldn't exist before it can load.

I think we're saying the same thing with different words. The simplest way to achieve that I'm describing is basically to make RequestContext::getUser throw if called before $wgFullyInitialised is true. That's almost the same as making load() throw, except that it creates more consistent behaviour for developers, e.g. identifying early access as a mistake, instead of allowing it to instantiate and get passed down, exploding only if the wrong method is called at the wrong time.

I think we're saying the same thing with different words.

If we are, your words confuse me. :/ How do you somehow get the User class to not exist?

I see two ways we could go here:

  1. Allow creation of a User object for the session-user that won't be usable until later when the session data is available.
    1. If the User object is used before the session data is available, have it return anonymous-user data. This, plus a warning, is the current behavior.
    2. If the User object is used before the session data is available, throw an exception. If the existing warning isn't being logged anymore, we should probably change to this.
  2. Do not allow creation of a User object for the session-user until the session data is available. This is what you seem to be advocating.

For #2 in a DI world, that probably means you need to create a "SessionUserFactory" (probably as a new micro-interface to be implemented by IContextSource or RequestContext, since @daniel doesn't like IContextSource for DI) to inject when things that would normally need the session-user injected are created before the session data is available.

And to get from here to there, you'll probably have to go through a phase of logging a warning on creation (versus the current on-load warning) to find things needing fixing before the change.

I don't have any particular objection to #2, besides that it seems more work to get to and more complexity to make DI work.

The simplest way to achieve that I'm describing is basically to make RequestContext::getUser throw if called before $wgFullyInitialised is true. That's almost the same as making load() throw, except that it creates more consistent behaviour for developers,

You'd need to make User::newFromSession() throw too then. Or instead, since RequestContext::getUser() just calls that method.

And probably you'd still need to make User::load() or User::loadFromSession() also throw in case something screws around setting $user->mFrom = 'session' (ugh, public field) or calling $user->clearInstanceCache( 'session' ).

e.g. identifying early access as a mistake, instead of allowing it to instantiate and get passed down, exploding only if the wrong method is called at the wrong time.

I note creation of a lazy-loaded object isn't exactly "access", particularly with DI.

I think this is way too harsh for the current MediaWiki hook system (which makes it pretty much impossible to predict when certain callbacks are going to run). Getting an anonymous user is not a big deal most of the time; getting an exception is. (Although arguably an anonymous user exposing the IP is problematic and maybe should be replaced with some kinda system user created for this purpose.)

I'm revoking this proposal and will instead per-sue a similar outcome as part of general refactoring toward MediaWikiServices and deprecation of wgUser and RequestContext::getMain.

That is, in a situation where all web-related entry points create their RequestContext explicitly and pass it to an explicitly instantiated Action or SpecialPage, then we'll have naturally resolved this because the only way to get a User object would be from code that has one available.

The main problem right now is our "current RequestContext" isn't explicitly instantiated, rather it magically comes into existence somewhere half-way through the handling or an Action or SpecialPage page load. And nothing is stopping it from coming into existence earlier.

See also:

The main problem right now is our "current RequestContext" isn't explicitly instantiated, rather it magically comes into existence somewhere half-way through the handling or an Action or SpecialPage page load. And nothing is stopping it from coming into existence earlier.

IMO the main problem is that many hooks / services are usually called after the user has been identified, but can occasionally be called before that. Dependency injection won't magically solve that; either there will have to be a way to tell whether the session user is available, or we'll have to prevent or restrict pre-session-initialization hook calling and figure out what to do about the use cases they currently handle.