In order to move MediaWiki towards the use of Dependency Injection (DI) as proposed in T384, it will be necessary to provide a composition root (aka Service Locator or DI container, which will serve as the application scope or top level factory) that provides access to the top level services. This role should be taken by suitable DI container.
During the discussion of T384 as well as the relevant discussions during the 2016 developer summit, a rough consensus emerged that we should move towards DI if it can be done nicely and without too much disruption. This RFC aims to clarify the implementation details of the heart of the DI mechanism: the central service locator.
- we want to use service ”singletons” with narrow interfaces, and ”dumb” value objects for our domain model. For the storage layer, this means a DAO approach instead of ”smart” records.
- we do not want to use auto-wiring. It's convenient for small systems, but confusing (and potentially slow) for large, complex systems like MediaWiki. We want to avoid magic, and do things explicitly.
- we want our DI container to be configurable, so that extensions can define their own service. Extensions should also be able to replace or wrap existing services.
- we do not want to use a 3rd party DI framework such as Pimple or PHP-DI. See below for the rationale.
- we prefer services to be injected as constructor argument to avoid strong coupling. As a migration path, it's OK for existing code to follow the service locator pattern, that is, access global default service instances via static methods.
Side note: there is no intrinsic distinction between a Service Locator and a DI Container. The difference lies in how they are used: the default Service Locator instance is accessed directly by code that needs a service, much like what MediaWiki does now, just with a single place for creating and tracking service instances. To follow the dependency injection pattern, access to the service locator would be restricted to static entry points. Service implementations that follow the DI pattern should not know about the Service Locator at all.
Proposal
This RFC proposes the implementation of a generic service locator class that instantiates services on demand based on a registry of callback functions ("wiring"). A top level instance of this service locator, namely MediaWikiServices, is used to access well known top level services. The top level services are initialized from a wiring file that defines a callback for creating each well known service. Extensions can add their own wiring for additional services, override existing service wiring, or wrap services.
Motivation
The move towards dependency injection is threefold:
- Testing: With DI, all dependencies of a class can easily be mocked, so it becomes possible to write proper unit tests that test a class in isolation. This means no global state is touched, preventing interference between tests. It also means that writing test setups becomes a lot easier, so it becomes easier to improve the coverage of different code paths. Also, proper unit tests means a bug will show up only in the test for the code where it is actually located, and will not make other tests fail.
- Generalization: DI allows re-use of code in a different context or with different configuration. For example, it is currently not possible to load a page from a different wiki, due to the fact that the database connection, namespace configuration, etc are all bound to the local wiki, and there is no way to create a context to load (much less parse!) a page from another wiki, even if the database resides on the same server. With dependency injection, it would be püossible to construct a MediaWikiServices instances using a different Config object that represents the configuration for another wiki - and load and parse pages from the other wiki copnsistently.
- Modularization: DI makes it easy to track which classes have which dependencies, and it avoids strong coupling, resuding dependencies between implementations. This means that it becomes possible to extract parts of MediaWiki core, such as the storage layer or the parser infrastructure, into a separate component. That component could then be used in another application to access or render MediaWiki content.
For some more discussion on if and why we want dependency injection, see T384: RfC: Dependency Injection for MediaWiki core.
Status
An implementation of the above proposal is up on Gerrit for review. Some issues were encountered when experimenting with the new mechanism. These issues and their solutions are discussed in a later section.
- I3c25c0ac17300d3dd13e1cf5100558a605eee15f introduces MediaWikiServices as the service locator, and implements the necessary bootstrapping logic. See also docs/injection.txt for a more detailed description.
- I7d7424345d0ce3ce90ba284006ee9615e3d99baa is an experiment that showcases how to refactor a static service class (Interwiki) to use the new mechanism.
- Ie5b3bdee67ab896384a8840258477beb0e139945, Iaeb7c418af17fe19f170487f5364040da6052699, and I72e4800ad58fe964d5eef5772220aa8c528d31f0 are experiments that showcase how such a refactoring enables dependency injection. These examples should be extended to also show how this is useful for writing unit tests.
- I587fd51a7b8cfcf7a673e8eeb3d55f721ce19b47 introduces a mechanism for extensions to decorate (wrap) existing services.
- Ie98bf5af59208f186dba59a9e971c72ea0b63e69 adds a mechanism to reset the service locator, forcing all services to be re-created. The rationale for this is explained in a latter section.
- I37b8e8018b368e098892ea9d26b5f96f7c411b5a is not directly related to introducing a service locator, but is needed for the next step.
- Iab2b774626cf062d65a38f0c261173aad00cc8e4 is an experiment that makes the database layer (LoadBalancer, mostly) injectable. This is work in progress, but reviews are welcome. This is probably the most challenging transition in the code base, but is starting to look good. See below for some of the issues encountered while working on this.
- Ia91629087e37ffc6da752a72be13d8636225a1c4 is an experiment that removes all global state from the basic storage layer services, refactoring them to use dependency injection. This is significant since the database abstraction code is among the most tricky to refactor for DI.
See also the "Migration" section in docs/injection.txt to get an impression how existing MediaWiki code can slowly be moved towards making use of dependency injection.
Challenges and Questions
Wiring and Extension Defined Services
One question is how the wiring should be defined, and how extensions can provide wiring for their own services, or change the wiring for existing services. The proposed implementation uses the following approach:
- wiring is defined in a PHP file (ServiceWiring.php) that contains an array that maps service names to callbacks (closures). Extensions can add additional mapping files to the $wgServiceWiringFiles array to define their own services. This way, loading the service wiring makes use of the opcode cache.
- Some DI frameworks use factory classes to define the wiring for services. That however means more overhead (all these classes would need to be loaded and instantiated for every request, before doing anything). Also, it's more of a maintenance hassle.
- One wiring file cannot redefined services defined in another wiring file. This is to make naming conflicts between extensions obvious. Extensions that want to redefined or decorate (wrap) a service can do so using the MediaWikiServices hook.
- Services that have already been instantiated (rather than just defined) cannot be redefined.- Allowing this may lead to inconsistencies, if references to the old service instance remain in other services.
Resetting the Service Locator
One question is how an existing service instance can be reset, e.g. because relevant configuration has changed. Just creating a new service instance would lead to inconsistencies, since references to the old service instance could remain in other services.
The proposed implementation provides a way to reset all services at once by calling resetGlobalInstance. For consistency, this includes resetting services that are not yet managed by the service locator.
Resetting all services just because one service needs resetting may seem overkill, and a performance problem. However, this should not happen during normal operation (with one notable exception, see the section about Accessing Services During Initialization). There seem to be three cases in which all services need to be reset:
- during the initialization process, when the user supplied configuration for database access becomes available. Some of the old-style singletons where already reset for this.
- during testing. Proper unit tests shouldn't rely on the global service instances, but integration tests (which means pretty much all tests currently in the MediaWiki core repo) may have the need to use specific service instances, or to clean up "dirty" services after testing.
- after being forked from a parent process. Some maintenance scripts use multi-processing to parallelize some tasks. Global service instances cannot just be copied, they need to be re-created for each child process, so they don't share critical resources such as database connections. Some of the old-style singletons where already reset for this, but not consistently (different maintenance scripts would reset different services).
A single method to reset all global service instances seems useful, and the overhead not problematic given that it does not really happen while serving web requests.
The alternative to always resetting all services would be to track all dependencies for each service. Dependencies would have to be explicitly defined in the wiring file (since we don't want to use auto-wiring). That way, when we want to reset service X, we can find all services that depend on X directly or indirectly, and also reset them.
If we want to go down that route, we should probably use a 3rd party library that supports this (does PHP-DI?), and wrap that in MedaWikiServices. But declaring all dependencies in the wiring file is a maintenance nuisance, and this wouldn't work at all for legacy services that are not managed within MediaWikiServices.
So the crude-but-simple solution seems to be fine for now. Especially since nothing keeps us from introducing more fine grained resets based on dependency tracking later on.
Accessing Services During Initialization
An issue arises when trying to access services before all configuration is known and all extensions are loaded. Ideally, there would be no access to any service until initialization is complete. However, at least the extension loader needs access to the cache manager, and the cache manager needs access to the config, which comes from the config factory service. Also, existing extensions may be accessing service singletons during initialization. We don't want to break them without need.
The proposed implementation solves this by allowing access to all services during initialization, and then resetting (dropping) all service instances created during that time. If only a hand full of simple service objects were needed, this reset does not have a big performance impact. However, if some of those services need to open network connections (e.g. to the database or memcached), this can become problematic.
Some options we could explore:
- Allow services to declare that they do not need to be re-created when configuration changes, perhaps by using a marker interface. Such services would need to use an entire Config object, or some kind of "promise" object, to keep track of changing configuration.
- Blacklist some services from being accessed during initialization. One way to do this is to have a separate service locator for "primordial" services, with a separate wiring file. This assumes that services that were not blacklisted can be kept alive through config changes. This is tricky, since it's already not true for the ConfigFactory (extensions can register new configurations), and pretty much everything depends on using configuration.
To access the need and impact of access to services during initialization, such access should be tracked. The easiest way to do this is to log the list of instantiated services at the point where the current implementation resets the service locator.
Forcing Services During Testing
During testing, we need more control over service instances. For instance:
- for the database mocking mechanism to work, all tests need to use a single special database connection to which the temporary table clones are bound. When resetting services, the new service instances have to know and use the old database connection, and that connection must not be closed when the old service locator instance is destroyed.
- we need a quick way to reset caches between tests.
- we need a way to swap in an entire service locator for a single test, then restore the old one.
- we need a way to put a specific service instance into place for a single test, and undo this later.
Doing these things has potential for creating inconsistencies. However, that seems ok in the context of integration tests, where we have to care about setting up and taring down fixtures in global scope or the database anyway.
To allow the level of control described above, the proposed implementation provides some methods that are only usable during testing, such as forceGlobalInstance and resetServiceForTesting (these should probably be moved to a subclass of MediaWikiServices). Iab2b7746 does some extensive re-engineering of MediaWikiTestCase to make this work smoothly, and allow most existing tests to continue working without modification. This however is work in progress.
Notes
Why not use a 3rd Party DI Framework
One of the recurring discussion topics in this context is whether we should use a 3rd party DI framework. A number of decent DI frameworks available for PHP, like Symfony DI, PHP-DI, and the more light weight Pimple. So why not use them?
There are several reasons for using our own, instead of relying on a 3rd party framework:
- Overhead. Using a 3rd party library means overhead for maintaining the dependency. The overhead isn't huge, but it's clearly non-zero. It should only be done if the benefits are clear.
- Framework isolation. We should avoid multiple components (and extensions) depending on a 3rd party framework. Instead, we should at least provide a thin layer to insulate the framework we use from MediaWiki code, so we can easily change the framework later. So even if we end up using a 3rd party lib, we still need to write and maintain the wrapper code anyway.
- The lib does too much. The basic functionality of the DI container is rather simple. Some of the available frameworks do way more than we need or want (auto-wiring, for instance). We would drag in weight, and would be open to new code using aspects of DI we don't want to use (such as auto-wiring, or the framework's built-in configuration mechanism instead of MediaWiki's).
- The lib does too little. If we use a 3rd party framework but end up implementing half of the features we need on top of it, it may not be wroth the extra dependency. For example, at least during the migration period, we need a mechanism to replace existing services, and run cleanup when an existing service is replaced. Also, we need integration with MediaWiki's configuration mechanism. Pimple allows us to do this somehow, but doesn't really help with doing it.
- Performance. MediaWiki has pretty unique requirements for scalability and performance, in particular for startup time. For MW, the tradeoff point between speed and memory consumption leans a lot further towards using more memory than is the case for most other projects. Having control over these aspects of the implementation of the DI container is helpful.
So, in essence, if we only save a few hundred lines of trivial code by using a 3rd party library, we probably shouldn't. And if we take care to apply framework isolation, we can start or stop using a 3rd party framework internally at any time, without breaking anything. But if we only use it internally, how useful is it?