Page MenuHomePhabricator

Provide access to WebRequest and associated information via a service object
Closed, DeclinedPublic

Description

Currently, application logic accesses information about the current request (such as the identity of the current user, the requested page, etc) primarily via a RequestContextSource singleton obtained from RequestContext::getMain(). RequestContextSource in turn binds to "heavy" classes like Title and User, and RequestContext relies heavily on global state. It also doubles as an i18n facility, implementing MessageLocalizer.

This should be replaced by a RequestInfo service object, that can be injected (or obtained from the global service locator if need be). RequestInfo would be structurally similar to RequestContext, but would bind to lightweight interfaces instead of heavy classes, and would not rely on global state.

Note that ideally, any information derived from the request should be passed to application logic as a parameter, and should not be known to service instances. However, changing MediaWiki's architecture in that regard seems difficult, and PHP's per-request execution model makes it acceptable to treat the user request as a global thing, making it available via a service instance. The same could perhaps be said about the response, but treating the response as global seems more problematic, since it is mutable.

RequestInfo should implement the following methods (similar to the ones found in RequestContext):

  • getRequest(): WebRequest. Note however that WebRequest binds to Session, which binds to User. Session::getUser should in time be replaced with Session::getUserIdentity to fix this.
  • getTarget(): LinkTarget. Replaces getTitle().
  • getUserIdentity(): the identity of the logged in user, if any
  • getRequestedLanguage(): get the code of the requested output language. Based on the user's preferences, but may be overwritten per request. Avoid binding against the heavy Language class, though.
  • getRequestedSkin(): get the name of the requested skin. Based on the user's preferences, but may be overwritten per request. Avoid binding against the heavy Skin class, though.
  • getDiagnostics(): returns a human readable array of diagnostic information about the request, similar to RequestContent::exportSession(). The contents should be treated as serializable, opaque, and for human consumption.

RequestInfo should not implement the following methods found in RequestContext:

  • getOutput() - while input can be treated as a global thing, output generally should not.
  • getStats(), getTiming() - use separate services or aggregate in some kind of output object
  • getWikiPage() and canUseWikiPage - WikiPage (or, in the future, PageRecord, see T195069) should be obtained from a storage service.
  • getConfig() - this has nothing at all to do with the request
  • msg() - localization should be done elsewhere.

RequestInfo could implement the following additional methods in the future:

  • getSession(): Same as getRequest()->getSession(). WebRequest::getSession could perhaps be deprecated in the future, to turn WebRequest into a value object.
  • getPageIdentity(): PageIdentity, once PageIdentity exists (T208776). Very similar to getTarget(), perhaps not needed.
  • isLoggedIn(): whether the request comes from a logged in user.
  • isWebRequest(): whether this is actually a web request, as opposed to a CLI script or async job execution.

Event Timeline

daniel created this task.Mar 18 2019, 11:10 AM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptMar 18 2019, 11:10 AM
Nikerabbit updated the task description. (Show Details)Mar 18 2019, 4:03 PM

So it's very easy to write the class, but I can't figure out where to use it. Most RequestContext users seem to use it for config and messages and such, inter alia, so there aren't many places I can see where it makes a lot of sense to just substitute RequestInfo. It seems like you'd want to start from classes like LogFormatter and Message, and the first step there should probably be to make them be constructed by services.

The main motivation right now is to have a clean way to inject access to the WebRequest into PermissionManager and friends.

There is no use of WebRequest in PermissionManager right now. The only use of RequestContext is to pass it straight along to Action, which exposes it to subclasses that could conceivably be in extensions. Could you provide more details on what you want to use it for?

daniel added a comment.EditedApr 15 2019, 6:12 PM

@Simetrical sorry, yes. See https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502484/16/includes/Permissions/PermissionManager.php#1174. The proposed BlockManager also uses WebRequest, see https://gerrit.wikimedia.org/r/c/mediawiki/core/+/502820/5/includes/block/BlockManager.php#14, but it's passed in as a parameter. Which raises teh question where the caller gets it from, if not from global state.

There are other instances of code in need of servisification that needs a WebRequest, e.g. static function buildRollbackLink in Linker, and static function sendHeaders in ContentSecurityPolicy. Both of them (can) get the RequestContext as a parameter. That is theoretically the cleaner pattern, but would require us to pass the RequestContext (or RequestInfo or something) around pretty much everywhere. Given PHP's per-request execution model, it seems saner to just inject it into the service instances.

Oh, I just found static function getRequestId() in WebRequest, which is also called quite a lot...

Why not inject the WebRequest directly instead of putting it inside a wrapper that includes stuff about the user, language, and skin? I.e., what's wrong with just MediaWikiServices::getWebRequest()?

daniel added a comment.EditedApr 16 2019, 11:37 AM

Why not inject the WebRequest directly instead of putting it inside a wrapper that includes stuff about the user, language, and skin? I.e., what's wrong with just MediaWikiServices::getWebRequest()?

Two reasons: 1) it doesn't quite contain everything we want and 2) the wiring code needs to get it from somewhere - so it would have to be treated as a pseudo-service, with a getWebRequest method in MediaWikiServices.

Both issues can be solved, but those are the reasons I thought it would be better to wrap it in a service. But I'm open to injecting it directly. I just want a clear way to do this without resorting to global state.

By the way, public static function overrideRequestId( $id ) in WebRequest seems problematic... or at least, doesn't fit with the idea that services should not have setters or mutable state.

  1. Like what?
  1. Is there a problem with having getWebRequest in MediaWikiServices?
  1. Like what?
  1. The identity of the current User, and the things it provides access to. Since we want to avoid (publicly) binding to the User class, we'd need another way to do token checks and rate limits. Other things, like user settings, should be accessed via a separate service, given the user's identity. The language and skin preferences are an exception to that though, since they can be overwritten per request. The skin isn't so relevant, but having easy access to the effective user language via something like RequestInfo would be good.
  2. The identity of the requested page (as a LinkTarget, probably, to avoid binding to Title).
  1. Is there a problem with having getWebRequest in MediaWikiServices?

Not really, but if we need RequestInfo anyway to provide access to the title and user, there is no point in exposing WebRequest separately.

If the pieces of information are completely unrelated, why not expose them separately, like MediaWikiServices::getInstance()->getUserIdentity() and so on? What do we gain by bundling them in one unit? There are certainly tons of places that use $wgUser or RequestContext::getMain()->getUser() without having any need for a WebRequest, for instance.

Simetrical removed Simetrical as the assignee of this task.Aug 22 2019, 1:29 PM
Simetrical added a subscriber: Simetrical.
aaron added a subscriber: aaron.Mon, Aug 26, 3:46 PM

I'd love to have a simplified version of WebRequest as a service. One that would be useful for dealing with the issue that https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/532367/ is about. Optimization hacks like https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/526801/ could be avoided too. It could be injected with pathinfo/cookie settings, but would not deal with complex encoding stuff that uses $wgContLang and so on.

Perhaps the regular WebRequest could also be a service, which takes the simple one as a DI. OTOH, maybe the callers of web request fields should be dealing with the weird encoding stuff (how many use cases actually need this or can't be migrated not to?).

See also https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/532367/1/includes/ServiceWiring.php where I explain a few reasons for why making WebRequest a service might break some of the assumptions and principles we wanted to achieve with MediaWikiServices. Making it a server as-is might satisfy some cases where we feel that code looks "messy" otherwise, but it also hides the problem, I think. That is, I'm not opposed to the idea in general, I just think it's very unlikely to be possible to do in a way that doesn't break certain expectations. If we can do so within the parameters, then by all means :)

Should RequestInfo be aware of "sub" requests? i.e. should it know the "Master" request and the "Current" request? Or should there be a simple setRequest() method to override the current request? What if there is more than one sub request? Should there be a request "stack"?

Few services will actually want direct access to the request info, so I'd think it doesn't make sense to make a service for this without specific use-cases that don't make more sense to deal with by other means. For instance, LBFactory needs only a few specific pieces of information from the request, so it doesn't seem necessary to make a whole service to pass in for it.

I would expect that long-term, any information from the WebRequest can be processed in the entry points (index.php/MediaWiki.php, etc.) and passed down further only as parameter to methods and factories (either as whole object, or for specific scalar values from the request).

That would mean that:

  • services have to live at a higher-level where they may not depend on such information (incl. things like current Title, Skin or User),
  • service wiring constructs services neutrally in their default form which is always safe and is never wrong or outdated, suitable for use in use contexts without a user, title, or web request (job queue, CLI scripts, load.php, etc.).
  • entry points have to use their initial scope to take what they need from the global state and pass it down.

We have already made a lot of progress in this direction with the introduction of MW_NO_SESSION, which made it impossible to code things in a way that depend on wgUser / RequestContext::getUser. They can only be used if first ensures they are available, with a fallback for other contexts and/or to use parameters instead (which then make the code easier to re-use and test).

The direction of moving singletons into service wiring has also enforced this in many areas, which in some cases has shown that a class cannot be come a service due to variance on Title or User. Instead, for those cases we made a factory and exposed that as a service (thus requiring the caller to provide these at run-time with their own parameters, and then depend on it being passed down further - not available statically or globally).

wgUser and wgTitle, and their problems, are in my opinion, a proxy for WebRequest, which is where Title and User are ultimately derived from. Hence, WebRequest can no more become a service than "current user" or "current title" can be a service.

So let's say for LBFactory, how would you suggest making it not depend at all on the current request? How should it get the request's IP address and User-Agent so that ChronologyProtector works?

getRequestedLanguage(): get the code of the requested output language. Based on the user's preferences, but may be overwritten per request. Avoid binding against the heavy Language class, though.
getRequestedSkin(): get the name of the requested skin. Based on the user's preferences, but may be overwritten per request. Avoid binding against the heavy Skin class, though.

What's the benefit of not binding against Language/Skin? There should probably be some sort of global-ish way to instantiate these and share/reuse those instances, because having to call Language::factory( $context->getRequestedLanguage() ) in 100 places would be very annoying and also inefficient.

msg() - localization should be done elsewhere

This is more complicated than people tend to think: providing msg() requires a Language object, and knowing which one to use requires looking at the request (for ?uselang=xyz), the user (for their language preference), and the config (to find the content language, for session-less entry points and for Message->inContentLanguage()). If a MessageLocalizer is separate from a RequestContext, or composed into it, that's fine, but you'd need a bunch of request-context-y stuff to find the right information to construct a MessageLocalizer.

On a side note, the current MessageLocalizer interface provides a msg() method but not a getLanguage() method, which has resulted in "fun" adventures like rMWfe2e8795d74b: Follow-up 9c9cfa2ec3d7: fix non-session entry point error

@Krinkle has been arguing against this in a few places (see gerrit:532367 and gerrit:532775). I'll try to summarize, so we can continue the discussion here.

If I understand correctly, Krinke's point is that the behavior of a service should only vary on 1) the parameters provided and b) configuration. Injecting knowledge about the current Request, User, Session, etc breaks that assumption. So these things should be provided as parameters where needed.

In theory, I agree that this would be the cleaner architecture. However, in practice this means that these things have to be passed down the stack, as parameters on every method call, all the way from MediaWiki::run to every bit of code that currently accesses Request or Session. Everything that accesses User::getRequest() or User::pingLimiter(), everything that calls RequestContext::getMain() or IContextSource::getRequest().

That just doesn't seem feasible.

So, instead I propose re-examine the guarantees we want to give for sessions. My proposal is to say that the behavior of a service should only vary on 1) the parameters provided b) configuration and c) the current request and finally d) the state of the database. Actually, on a wiki farm, the current request actually decides which configuration is active, which database gets accessed, etc.

Note that currently, a lot of code depends (indirectly) on the global $wgRequest variable. With the architecture proposed here, they would be bound to the request and session in their ServiceContainer, which can be mocked like any other service, for testing. It would eventually also allow multiple ServiceContainers bound to different requests to coexist, just like we will eventually be able to have multiple ServiceContainers bound to different wiki databases.

Making request and session available as a pseudo-service seems acceptable conceptually, and provides a faster path away from relying on global variables. If we followed the idea that these things can only be passed in as method parameters, we'd be relying on global variables until this is done everywhere, and it would force us to change a lot of method signatures.

In essence: it's preferable to pass information derived from the current request as a parameter to a concrete method. But making this possible would require us to change the signature of a very large number of methods. And if some functionality would become dependen on the request, e.g. because we want to rate limit it or it requires a permission check, then we'd again have to change a number of method signatures.

Moving from using global variables to an injectable service bound to the ServiceContainer would already provide a lot of the same benefits, without the need to first find a way to refactor our code to let us pass in all the context we need. And it wouldn't prevent use from doing that either, where it is easy or where there is a strong need.

In practical terms:
PermissionManager::getUserPermissions currently relies on global variables via RequestContext::getMain(). I want to change that. Making the necessary context injectable is simple and would fix the problem. Making sure all code that wants to check user permissions has the necessary context injected (e.g. for checking poison cookies and whatnot) would be very difficult. Not to mention that most of that code probably shouldn't know or care how blocks work or whether they are needed for permission checks.

If I understand correctly, Krinke's point is that the behavior of a service should only vary on 1) the parameters provided and b) configuration. Injecting knowledge about the current Request, User, Session, etc breaks that assumption. So these things should be provided as parameters where needed.

For (1), you mean method parameters, not service construction parameters, right?

In theory, I agree that this would be the cleaner architecture. However, in practice this means that these things have to be passed down the stack, as parameters on every method call, all the way from MediaWiki::run to every bit of code that currently accesses Request or Session. Everything that accesses User::getRequest() or User::pingLimiter()

Taking pingLimiter() as a concrete example -- where does the caller get the User object from? They could get the IP address from the same place. I don't know how practical this is without trying to implement it; maybe you have a better idea of that.

everything that calls RequestContext::getMain() or IContextSource::getRequest().
That just doesn't seem feasible.

That doesn't seem obviously different from saying that getting rid of global usage is infeasible. Which I would have said it is, if you asked me ten years ago. :)

So, instead I propose re-examine the guarantees we want to give for sessions. My proposal is to say that the behavior of a service should only vary on 1) the parameters provided b) configuration and c) the current request and finally d) the state of the database. Actually, on a wiki farm, the current request actually decides which configuration is active, which database gets accessed, etc.

I'm not sure I see the big picture here well enough to be in favor of claiming to make any guarantees. At this point, we're still in transition and there are no guarantees about anything. Services still access global state all over the place.

Perhaps we can avoid taking a definitive stance for now. We can all agree that it's better to keep services independent of the current request as much as possible. In cases where it's not obvious how to do that, we can consider alternatives. If we come up with a specific case where no one has a practical alternative, we can just inject request info (however much is needed). This can be marked as a todo or fixme. Once we make more progress on getting rid of globals and static methods, we can reconsider whether a service container should be specific to a single request or not.

In practical terms:
PermissionManager::getUserPermissions currently relies on global variables via RequestContext::getMain(). I want to change that. Making the necessary context injectable is simple and would fix the problem.

I think injecting the result of getAllowedUserRights() in ServiceWiring would be an improvement in this case. I don't see any feasible alternative offhand. If anyone has one to suggest, I'm happy to hear it, but otherwise I think the change is good. Insofar as we have dependency on global state, it's far better to put it in ServiceWiring than in the service itself.

Krinkle added a comment.EditedWed, Aug 28, 2:11 PM

[..] In theory, I agree that this would be the cleaner architecture. However, in practice this means that these things have to be passed down the stack, as parameters on every method call, all the way from MediaWiki::run to every bit of code that currently accesses Request or Session. Everything that accesses User::getRequest() or User::pingLimiter(), everything that calls RequestContext::getMain() or IContextSource::getRequest().
That just doesn't seem feasible.

I would disagree, but I first want to clarify that it is not my intent to change the status quo. Most everything I brought up in these conversations is what I thought you and others would agree with. I would like to narrow down and understand what part of the problem we think differently about, and whether that's due to a misunderstanding or because things have changed over the years.

Regarding having to pass everything down everywhere, I know this sounds very daunting. But, I think it only feels that way. In practice:

  • It's not everything – How many fundamental concepts do we have that derive from a user-initiated web request? Only a handful, right? We've got the WebRequest (params, cookies, headers), the session/user derived from that, and output/title/skin derived from that for requests that have one (e.g. page views).
  • It's not everywhere –The subset of MediaWiki that both 1) cannot compute these and 2) cannot be given these by method parameter, almost always has a RequestContext already. So we've already paid the migration cost for this over the past years. They already have direct or nearby access to these concepts. The number of places where we currently need to call RequestContext::getMain() is quite small, and migrating these away seems quite doable (we've already done it in most other places).

Let me shift focus for a minute away from how easy I think it would be, but instead consider the alternative and the problems I think we want to avoid. Ultimately, what matters to me more is that we solve these problems, not that we solve it by "passing down". Other solutions welcome :)

  • Where is the responsibility for deciding whether the current PHP thread even has a certain concept?
  • For threads that do have it, where is the code for implementing that concept? E.g. where does the job queue say, this job the params say it is for user X. Where does RL say, my params dictate we are in language Y, skin Z. Would that all be a switch case in ServiceWiring relying on more global state? Or do we determine and pass from MediaWiki::run, Api, Rest, RL, and Job::execute.

If we encourage access to OutputPage through services (or global RequestContext), then how would OutputPage behave in load.php, rest.php, Special:Export, CLI etc.? They have no such thing. Same for User, Title, Skin etc.

My proposal:

  • The entry point for page views (MediaWiki::run) instantiates this object together with the RequestContext which is passed down to high-level handlers that need it, such as Action, SkinTemplate and SpecialPage. We effectively do this already since MediaWiki::run is where the first RequestContext-derived classes are instantiated. It's just that things like OutputPage are lazy-created right now.
  • Anything else would not have access to it, and should not/cannot need it. Anything used by the skin that needs to output is given the object by the skin, or returns something that the skin will process further. This isn't a change - it's what we do today already. Access to RequestContext::getMain() is relatively rare, we generally pass these things down. Is this considered a problem today? Is there an example of prod code using RequestContext::getMain() that we could not solve in a way you'd be happy with?

If it becomes a service:

  • Anything that needs to write to the output, or get info about a user, would call the service singleton and get the relevant object from there. Essentially embracing global state, virtually the same as back when we did global $wgOut, $wgUser everywhere. Does that mean we'll pretend REST requests and JobQueue runs are "fake" page views by the anon user 127.0.0.1? What is the alternative?
  • Could we have multiple contexts reliably interacted with during a single request? Could we avoid code that works today, but fails tomorrow in mysterious ways because it called a "bad" service? For example, a class used by the API and by a Skin, that uses the User service, will tomorrow want to queue a module, and get the OutputPage service. It would work fine when tested on a page view, but not work correctly in the API. Will the Output interaction be silently ignored? Or would we have stubs OutputPage/Skin/User objects that throw on all method calls unless the object is "enabled" by MediaWiki:run?

I don't feel particularly strongly about cross-wiki interactions being possible. The reason I bring it up is because I remember hearing about it when we first created MediaWikiServices and thought it was a motivation for it. I feel that MediaWikiServices seems to have very little purpose without these restrictions (just another global/singleton).

TK-999 added a subscriber: TK-999.EditedWed, Aug 28, 3:07 PM

I'd like to share some of my thoughts related to this dilemma 🙂

I think there is value in differentiating between code that depends on the WebRequest existing and code that depends on the current User/Skin/Title/Output existing. The WebRequest is merely a view into PHP's $_SERVER, $_GET, $_POST etc. superglobals—these superglobals are set even in a CLI context, so code that depends on the WebRequest only assumes that PHP is running and would work both in CLI and web contexts. By contrast, code that depends on a current User/Skin/Title/Output etc. value assumes that a web request exists with some set of values that prior code used to derive a User/Skin/Title/Output etc. from, and this _will_ fail in a CLI context unless hacks like

pretend REST requests and JobQueue runs are "fake" page views by the anon user 127.0.0.1? What is the alternative?

are introduced.

In https://gerrit.wikimedia.org/r/c/mediawiki/core/+/532367, which was my proposed approach to fixing T228895, we see that there are already services such as the DBLoadBalancerFactory whose behavior may be influenced by request parameters. Instead of injecting these request parameters via the service container, the code right now instantiates the service unconditionally on every request, and uses a setter method to inject them. This keeps request info out of the service container, but in turn makes the internal state of the service more complicated, more mutable, and potentially more brittle. For instance, if a dynamic configuration loading system accessed the database in the configuration phase, before the Setup.php execution, the DBLoadBalancerFactory would be instantiated and used but the request parameters would not be injected into it, which may have ramifications on its behavior.

What do you think?

  • It's not everywhere –The subset of MediaWiki that both 1) cannot compute these and 2) cannot be given these by method parameter, almost always has a RequestContext already. So we've already paid the migration cost for this over the past years. They already have direct or nearby access to these concepts. The number of places where we currently need to call RequestContext::getMain() is quite small, and migrating these away seems quite doable (we've already done it in most other places).

Concretely, PermissionManager::getUserPermissions() effectively calls RequestContext::getMain(). (It calls $user->getRequest(), which in this case just returns $wgRequest.) How would you propose migrating away?

I think Daniel's point is that there are actually a lot of cases like this.

If it becomes a service:

  • Anything that needs to write to the output, or get info about a user, would call the service singleton and get the relevant object from there. Essentially embracing global state, virtually the same as back when we did global $wgOut, $wgUser everywhere.

Not the same, because you could inject any user object you want. Even if there's some global PermissionManager that's based on the current session, you could still construct a PermissionManager that's based on a different session, because the session stuff would be injected.

In the concrete case of PermissionManager, ServiceWiring would inject a list of allowed user rights from the session. If this is a maintenance script etc., there are no user rights to inject, so it will inject null. This is the same as if the session doesn't restrict user rights, and everything should work perfectly.

Does that mean we'll pretend REST requests and JobQueue runs are "fake" page views by the anon user 127.0.0.1? What is the alternative?

Anything that needs an IP address injected would also have to account for a case where there's no IP address, probably expressed by passing null. They would need to account for this. For instance, IP-based rate-limiting would have to check for the case where there's no IP address and just not rate-limit.

  • Could we have multiple contexts reliably interacted with during a single request? Could we avoid code that works today, but fails tomorrow in mysterious ways because it called a "bad" service? For example, a class used by the API and by a Skin, that uses the User service, will tomorrow want to queue a module, and get the OutputPage service. It would work fine when tested on a page view, but not work correctly in the API. Will the Output interaction be silently ignored? Or would we have stubs OutputPage/Skin/User objects that throw on all method calls unless the object is "enabled" by MediaWiki:run?

How is this any different from any other case where a module will behave differently when different parameters are injected?

I think it's best to focus on concrete, case-by-case discussions. Let's figure out how to address PermissionManager::getUserPermissions(), say.

I feel that MediaWikiServices seems to have very little purpose without these restrictions (just another global/singleton).

The difference between globals and DI is the way the variable is accessed by the caller. Even if in all real-world code the injected variable will happen to be the same, it still has all the benefits of DI.

dmaza added a subscriber: dmaza.Wed, Aug 28, 5:24 PM

Tagging TechCom because the high level semantics of services and service containers, as well as the question of how we provide access to request-dependent information in the code base, are cross-cutting concerns. I don't currently think this needs an RFC, but TechCom should be aware of the discussion.

daniel moved this task from Inbox to Watching on the TechCom board.Wed, Aug 28, 8:28 PM

I just remembered another use case for which we'd want implicit knowledge of the request: Logging. It's really useful if the logger can provide about the request URL, client IP, and user session, etc. Passing this information to the logger as a method parameter isn't feasible.

I think it's best to focus on concrete, case-by-case discussions. Let's figure out how to address PermissionManager::getUserPermissions(), say.

Will do next. Meanwhile:

  • What are both your thoughts on the cross-wiki use case?

How might this work with WebRequest in the service container? Or does it not matter for that use case whether WebRequest is in or out the service container? Or should we depart from this long-term ideal? If so, what would be our strategy for use cases of that kind going forward? (E.g. HTTP requests to API only)

  • If we make Request a service - what would its contract be?

I believe today the contract for WebRequest (if received via RequestContext) is that if you have access to it, you can assume we're in a user-facing web request. Its cookies will be their cookies. Its IP will be their IP. One can set headers and write to the WebResponse. This means that in load.php, jobs, CLI - where there is no RequestContext - these are not true, and code can't do it accidentally as it won't be created/passed down here.

Specifically, how do we prevent mid-level code from calling Services->getRequest and assuming it's always about a user-facing request? Would we disable this service in JobQueue, DeferredUpdate, ResourceLoader, CLI etc.?

Will do next. Meanwhile:

  • What are both your thoughts on the cross-wiki use case?

How might this work with WebRequest in the service container? Or does it not matter for that use case whether WebRequest is in or out the service container? Or should we depart from this long-term ideal? If so, what would be our strategy for use cases of that kind going forward? (E.g. HTTP requests to API only)

As noted, anything that needs a bit of info from the request should have it injected. Thus everything should work fine for any wiki if you inject the correct things. Concrete examples so far have involved injecting the request origin IP address or specific headers, or injecting general request info for loggers. None of those seem to have any effect on use of services cross-wiki.

  • If we make Request a service - what would its contract be?

I don't think Request should necessarily be a service. Specific bits of information should be injected as needed for specific applications.

I believe today the contract for WebRequest (if received via RequestContext) is that if you have access to it, you can assume we're in a user-facing web request. Its cookies will be their cookies. Its IP will be their IP. One can set headers and write to the WebResponse. This means that in load.php, jobs, CLI - where there is no RequestContext - these are not true, and code can't do it accidentally as it won't be created/passed down here.

Is that actually true? load.php includes WebStart.php, which includes Setup.php, which sets up RequestContext::getMain() and $wgRequest. maintenance/doMaintenance.php also includes Setup.php, and also has a request context AFAICT. How is the status quo any worse than introducing a Request service and having everything use it willy-nilly? I don't think that should happen, but even if it does it's no worse than what we have today.

Specifically, how do we prevent mid-level code from calling Services->getRequest and assuming it's always about a user-facing request? Would we disable this service in JobQueue, DeferredUpdate, ResourceLoader, CLI etc.?

What do we do now? If we don't have special-case code, presumably we have a request object that reflects whatever PHP does for command-line code. I assume this includes an IP address of localhost, empty $_GET/$_POST/$_REQUEST/$_COOKIE, etc. Code that relies on this will either work properly, or not. This doesn't inherently change if the code gets this information from injection rather than global objects or directly from PHP superglobals.

Krinkle added a comment.EditedThu, Aug 29, 1:21 PM

What are both your thoughts on the cross-wiki use case? [..]

As noted, anything that needs a bit of info from the request should have it injected. Thus everything should work fine for any wiki if you inject the correct things. Concrete examples so far have involved injecting the request origin IP address or specific headers, or injecting general request info for loggers. None of those seem to have any effect on use of services cross-wiki.

Indeed. Things like Loggers and optimisations internally for LBFactory don't need to change or stay in sync when code tries to get some information about another wiki. They should remain true to the top-level purpose of the PHP thread. Singleton and global state can feast there as they wish, no problem :)

What I mean is - what is our strategy for constructing things like a RevisionStore, Parser or MessageCache for another wiki. If services only vary by wiki and their Config, it is logically trivial to construct another tree for that, so long as its config can be retrieved (SiteConfiguration, legacy, known issues, limitations etc. yes).

If we make Request a service - what would its contract be? [..].
[..] in load.php, jobs, CLI - where there is no RequestContext [..]

Is that actually true? load.php includes WebStart.php, which includes Setup.php, which sets up RequestContext::getMain() and $wgRequest. maintenance/doMaintenance.php also [..] How is the status quo any worse than introducing a Request service [..]

They exist, but are not meant to be used. The same way that wgTitle used to exist (but removed since not everything has a sensible value). And wgOut is similarly on the way out as well.

$wgRequest or RequestContext::getMain() usage is rare, considered an anti-pattern, and stands out as technical debt. Precisely because they are unsafe. Reaching into these is not what I meant with "a WebRequest received via RequestContext". Outside classes like Action, SpecialPage, Skin, OutputPage etc that implement RequestContext, it must not be used, and generally isn't afaik.

But, I believe we expect services to be safe to obtain and use. Any of its requirements are either handled by the service itself, or moved to the caller through mandatory parameters. Hence my question - what would its contract be? Would we say to never use it, like we do for $wgRequest or RequestContext::getMain() – except for the purpose of injecting into the kind of class that implements RequestContext currently? If so, why make it a (footgun) service if we only need it on 1 or 2 places throughout MediaWiki.

TK-999 added a comment.EditedThu, Aug 29, 2:06 PM

What I mean is - what is our strategy for constructing things like a RevisionStore, Parser or MessageCache for another wiki. If services only vary by wiki and their Config, it is logically trivial to construct another tree for that, so long as its config can be retrieved (SiteConfiguration, legacy, known issues, limitations etc. yes).

I'd like to understand the "cross-wiki" use cases more. My understanding is that making something like creating a Parser instance for another wiki a trivial enterprise would require a fundamental refactor of MediaWiki internals and extension points.

As a start, the DI guidelines discourage directly constructing service instances directly. So we would need to have multi-wiki aware factories for each of these services that would handle instantiating those service objects for a given wiki and obtain the necessary dependencies in the context of that wiki.

Let's assume that this framework is all in place. The next problem is that the behavior of these services are influenced by extensions via the hooks API, and the set of extensions on the remote wiki may be different from the set of extensions on the wiki that initiated the cross-wiki call. For instance, if I wanted to parse a page on a remote wiki that had a parser tag extension enabled on that remote wiki only, I would get incorrect output. Right now I do not see a sane way to initialise the required extensions only for such cross-wiki operations and tear down afterwards to prevent them from polluting the current wiki context.

My impression is that MediaWiki is still fundamentally a single-wiki software with very limited multi-wiki awareness. Systems such as SiteConfiguration, SiteStore, InterwikiLookup, WikiMap exist side-by-side and all implement different aspects of multi-wiki operations without much coordination. Other code, such as RevisionStore and friends use database domains for their flavor of multi-wiki aware code. And we have things like ForeignTitleFactory that seem to try to specifically work around a fact that it is not sanely possible to instantiate Titles for a remote wiki context.

It is not clear to me either what the way forward would be. T224020, T221535, T113034, T184529 and possibly other tasks seem to aim to explore solutions to various aspects of cross-wiki operations but I feel I am missing the overall architectural vision that ties them together. It is also an open question how such a system in core (if implemented) would work together with existing dynamic configuration management systems that most major wiki farms have had their own flavor of some time.

tl;dr I feel code that depends on the request / execution context of the current thread is not the only obstacle to implementing cross-wiki functionality :)

What I mean is - what is our strategy for constructing things like a RevisionStore, Parser or MessageCache for another wiki. If services only vary by wiki and their Config, it is logically trivial to construct another tree for that, so long as its config can be retrieved (SiteConfiguration, legacy, known issues, limitations etc. yes).

Services should only vary based on the parameters they're injected with. They should not vary based on anything else. (Except when the point of the service is to access external state, such as wrappers around DB/memcached/filesystem/etc.) So you inject whatever is needed for the service to work properly for the other wiki. If it needs info from the current request, you inject that too, or some faked substitute. It will depend on the use-case.

Again, concretely: we have so far mentioned ping limiting, LB coherency, and logging as things that need to know about the current request. In all of those cases, you could inject the request info just as well irrespective of whether you're dealing with this wiki or another one.

$wgRequest or RequestContext::getMain() usage is rare, considered an anti-pattern, and stands out as technical debt. Precisely because they are unsafe. Reaching into these is not what I meant with "a WebRequest received via RequestContext". Outside classes like Action, SpecialPage, Skin, OutputPage etc that implement RequestContext, it must not be used, and generally isn't afaik.

I don't think that's actually true. grep says 71 files in includes contain "RequestContext::getMain" or "$wgRequest". There are presumably many more that indirectly rely on the request globals. For instance, PermissionManager::checkUserBlock() uses RequestContext::getMain(). This is called by getPermissionErrorsInternal(), which is called by such methods as getPermissionsErrors() and userCan(). Thus we find pervasive use of request globals, albeit indirectly. The use of globals just hides it.

Again, concretely, what would you propose for the PermissionManager case if not injecting the appropriate request info in ServiceWiring?

But, I believe we expect services to be safe to obtain and use. Any of its requirements are either handled by the service itself, or moved to the caller through mandatory parameters. Hence my question - what would its contract be? Would we say to never use it, like we do for $wgRequest or RequestContext::getMain() – except for the purpose of injecting into the kind of class that implements RequestContext currently? If so, why make it a (footgun) service if we only need it on 1 or 2 places throughout MediaWiki.

I don't think it's even slightly accurate to say that only one or two places in MediaWiki need to depend on request info. I think so far we've come up with three or four, and there are likely to be many more. If you want to look through the 71 files in includes that use the global request and explain how really they don't need to do so at all, please feel free, but until then I'm assuming we do actually need to use it widely, directly or indirectly.

[..] Would we say to never use it, like we do for $wgRequest or RequestContext::getMain() – except for the purpose of injecting into the kind of class that implements RequestContext currently? If so, why make it a (footgun) service if we only need it on 1 or 2 places throughout MediaWiki.

I don't think it's even slightly accurate to say that only one or two places in MediaWiki need to depend on request info. [..]

I am referring to one or two places in MediaWiki that would authoritatively have the responsibility to first construct a WebRequest object, e.g. which could do it inline vs moving the construction call to ServiceWiring.

I'd like to understand the "cross-wiki" use cases more. My understanding is that making something like creating a Parser instance for another wiki a trivial enterprise would require a fundamental refactor of MediaWiki internals and extension points.

It's mostly a red herring in this discussion. There is a CPT Initiatives (Cross-Wiki (CDP2)) initiative, but it's not fleshed out yet. But we are indeed working on that fundamental refactoring. The basic idea is that you'd have a MediaWikiServices instance for each wiki - only the ServiceContainer needs to be cross-wiki aware, the individual services do not. I'm positive I wrote down my thoughts on this somewhere, but I can't find the ticket. Story of my life...

For this ticket here, what Simetrical just said is much more to the point: a lot of functionality we use throughout the code depends on the session. To ping limiting, LB coherency, and logging I wold add permission checks. I'd say most of your code does, indirectly.

I see no reason why it would be bad to inject the request and/or session into services.

I am referring to one or two places in MediaWiki that would authoritatively have the responsibility to first construct a WebRequest object, e.g. which could do it inline vs moving the construction call to ServiceWiring.

Construct it inline from what? Global state, like $_REQUEST?

Also, once it is constructed - how does it get to all the places where it is needed? Passing it down as a parameter doesn't work. Not for things like logging.

Permission checks and ping limits could be done if the request was somehow bound to a user object. We usually pass that in. We'd then have "simple" UserIdentities, and would add the concept of an ActingUser, which would provide access to session and IP. I kind of like that idea, but I don't think it's sufficient, and potentially unsafe.

I still think that some services need to have knowledge derived from the current request injected. How else would you handle the logging use case?

@daniel, thanks for sharing the context about the cross-wiki initiative—much appreciated 🙂

I am referring to one or two places in MediaWiki that would authoritatively have the responsibility to first construct a WebRequest object, e.g. which could do it inline vs moving the construction call to ServiceWiring.

Construct it inline from what? Global state, like $_REQUEST?
Also, once it is constructed - how does it get to all the places where it is needed? Passing it down as a parameter doesn't work. Not for things like logging.
Permission checks and ping limits could be done if the request was somehow bound to a user object. We usually pass that in. We'd then have "simple" UserIdentities, and would add the concept of an ActingUser, which would provide access to session and IP. I kind of like that idea, but I don't think it's sufficient, and potentially unsafe.
I still think that some services need to have knowledge derived from the current request injected. How else would you handle the logging use case?

For the permission checks and ping limits, I feel it would be more natural to use services that expose method(s) accepting an user identity value that the permission check / ping limit would then be performed on. A permission check may be conducted against an user that is not the current session user, so I feel that the session user should not be injected here.

I'd like to circle back to my previous comment here: I think we should consider treating the question of whether web request values should be injected separately from whether its derivatives should be injected. Injecting the request or its parameters will work in both CLI and web contexts; in CLI we already use FauxRequest for sanity, but instantiating a WebRequest there would simply result in a "blank" request with no query params, headers etc. By contrast, injecting the session user, the current title, the current skin etc. assumes that the request is populated with certain values that allow this value to be derived there, and these would have to be mocked/stubbed in a CLI context with a "fake" title, an 127.0.0.1 user and such, which makes me feel that injecting these would bring more trouble than value.

daniel added a comment.EditedThu, Aug 29, 5:33 PM

For the permission checks and ping limits, I feel it would be more natural to use services that expose method(s) accepting an user identity value that the permission check / ping limit would then be performed on. A permission check may be conducted against an user that is not the current session user, so I feel that the session user should not be injected here.

Absolutely, but: permission checks for the "acting" user have to be performed differently, because they take into account whether the user is acting from a blocked IP. That's the reason PermissionManager accesses the request via global state, which prompted me to file this ticket. (Even worse: failed block checks are "contagious", and can cause more IPs to be blocked, and "poison cookies" to be emitted - so we'd also need a way to influence the response headers).

Ping limits can be done per user, but we may also limit activity per IP or IP range. So that info would need to be looped through somehow. Something like if ( $user instanceof ActingUser ) doesn't sound too bad to me, if we can somehow make sure that the acting user is never represented by a plain UserIdentity.

For context for logging, I really don't see an alternative to putting that context into the Logger. As a string for all I care, but it has to be injected somehow.

I'd like to circle back to my previous comment here: I think we should consider treating the question of whether web request values should be injected separately from whether its derivatives should be injected.

We have to answer the fundamental question of whether Services are allowed to depend on anything that derives from the request. Everything else is details.

By contrast, injecting the session user, the current title, the current skin etc. assumes that the request is populated with certain values that allow this value to be derived there, and these would have to be mocked/stubbed in a CLI context with a "fake" title, an 127.0.0.1 user and such, which makes me feel that injecting these would bring more trouble than value.

That is already the case, and has been for as long as MediaWiki has existed, as far as I know. Since a lot of code depends $wgUser, $wgTitle and friends, they get faked for CLI mode. We could take this opportunity to try and change that, and make that code be able to live without these fake values, but that's really beyond the scope of this discussio0n.

  • What are both your thoughts on the cross-wiki use case?

How might this work with WebRequest in the service container? Or does it not matter for that use case whether WebRequest is in or out the service container? Or should we depart from this long-term ideal? If so, what would be our strategy for use cases of that kind going forward? (E.g. HTTP requests to API only)

For the cross-wiki use case, it does not matter whether we inject the WebRequest, or loop it through via parameters. Either approach will work.

  • If we make Request a service - what would its contract be?

I believe today the contract for WebRequest (if received via RequestContext) is that if you have access to it, you can assume we're in a user-facing web request. Its cookies will be their cookies. Its IP will be their IP. One can set headers and write to the WebResponse. This means that in load.php, jobs, CLI - where there is no RequestContext - these are not true, and code can't do it accidentally as it won't be created/passed down here.

I don't think WebRequest itself should be a service. And perhaps WebRequest as it exists today shouldn't be a service.

Specifically, how do we prevent mid-level code from calling Services->getRequest and assuming it's always about a user-facing request? Would we disable this service in JobQueue, DeferredUpdate, ResourceLoader, CLI etc.?

Here's a quick draft of a service interface, off the top of my head:

ExecutionContext:
  getMode(): string; // "CLI" or "HTTP" or "JOB", etc
  getActingUser(): UserIdentity or null;
  getClientIP(): InternetAddress or null;
  getSessionId(): string or null;
  getRequestedTitle(): LinkTarget or null;
  getWebRequest() or null: 
      getRequestedURI();
      getHeader();
      getCookie();
  getWebResponse() or null: ;
      setCookie();

So, there would be a SessionInfo service. It would return null from most methods in CLI or JOB mode. Perhaps Jobs should get a ExecutionContext injected, which could be used to force the acting user to be the user who originated the job. But maybe that's not needed.

I'm really unhappy about allowing any access to the response, but it seems like it's needed for things like magic contagious blocks that use poison cookies.

Note that having such a service doesn't prevent us from explicitly injecting a Request or Response or ExecutionContext where possible and desirable. And that injected context could be different from the "root" context derived from the actual request.

kostajh added a subscriber: kostajh.Tue, Sep 3, 5:14 PM
daniel closed this task as Declined.Tue, Sep 3, 6:52 PM
daniel updated the task description. (Show Details)

After the discussion on this ticket anda call with @Krinkle, I decided to go for another approach: T231930: Introduce ActingUser to represent the user performing the current request