Page MenuHomePhabricator

Add a MergeHistoryFactory and convert MergeHistory to DI
Closed, ResolvedPublic

Description

@daniel suggested at https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/584247/ that a MergeHistoryFactory injecting dependencies to the MergeHistory class would be a good idea

The class currently uses:

  • A load balancer / database access (currently done via wfGetDB)
  • The ContentHandlerFactory service
  • The PermissionManager service
  • The WatchedItemStore service

And after reducing use of Revision objects in the patch linked above, the RevisionStore service

Event Timeline

DannyS712 created this task.Apr 5 2020, 1:30 PM
Restricted Application added a project: User-DannyS712. · View Herald TranscriptApr 5 2020, 1:30 PM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
DannyS712 moved this task from Unsorted to Next on the User-DannyS712 board.Apr 5 2020, 1:30 PM
Anomie added a subscriber: Anomie.Apr 6 2020, 1:29 PM

MergeHistory seems reasonably likely to only ever be used in two or three places: SpecialMergeHistory, ApiMergeHistory, and maybe someday a MW REST API endpoint. Does it really need a factory service, rather than being a helper intended for composition with the dependencies passed through from the special page, Action API module, and REST API handler?

daniel added a comment.Apr 6 2020, 2:09 PM

MergeHistory seems reasonably likely to only ever be used in two or three places: SpecialMergeHistory, ApiMergeHistory, and maybe someday a MW REST API endpoint. Does it really need a factory service, rather than being a helper intended for composition with the dependencies passed through from the special page, Action API module, and REST API handler?

"Static" composition, with the constructor being called directly by another constructor, is an option, but not great for testability.

Another option is for SpecialMergeHistory, ApiMergeHistory, etc to expect an instance as a constructor parameter, and leave it to the wiring code to directly instantiate it. However, this only works if the instantiation is done programmatically by a factory function, not from a JSON structure. This would be possible for SpecialMergeHistory and ApiMergeHistory, since their wiring is defined in PHP. For a REST endpoint, construction would have to be delegated to a static factory method, which would ideally have the service container injected. Not sure we presently support that.

Anomie added a comment.Apr 6 2020, 2:26 PM

"Static" composition, with the constructor being called directly by another constructor, is an option, but not great for testability.

I can think of a few ways to work around that:

  1. Have a protected "createMergeHistory" method that wraps the new MergeHistory call for the constructor. Then you can mock that method in your tests to supply a custom instance.
  2. Use TestingAccessWrapper to overwrite $obj->mergeHistory after constructing the class.
  3. Have a test-only "setMergeHistory". It could even be private if you don't mind using TestingAccessWrapper to make the call to it from the tests.
  4. Have the constructor take an optional argument (intended for testing only) to supply the MergeHistory.

For a REST endpoint, construction would have to be delegated to a static factory method, which would ideally have the service container injected. Not sure we presently support that.

We support a static factory method. You could inject the ObjectFactory into it to get indirect access to any needed services from the container.

{
    "path": "/whatever",
    "factory": "MediaWiki\\Rest\\Handler\\WhateverHandler::factory",
    "services": [
        "ObjectFactory"
    ]
}

Conceptually MergeHistory looks exactly the same as MovePage - it's a class lying one level up from storage layer, made to do a complex task consisting of a collection of individual storage operations.

We already have a MovePageFactory, so adding a MergeHistoryFactory we're going to be setting up the pattern for classes like this. I donno how many more classes like this we have. If that's a lot, we should probably invent something more complex then adding a factory per class, if so far it's not a lot, I think we can live with interface-per-utility class and share an implementation or merge them all together into a ComplexStorageOperationFactory (name is horrible) interface. At least MovePage and MergeHistory seem to have very similar dependencies.

Anomie added a comment.Apr 6 2020, 4:37 PM

We should have a lot more than we currently do. Probably the majority of special pages and Action API modules should be sharing more code than they do, or should have a backend class instead of one calling into the other.

DannyS712 added a comment.EditedApr 6 2020, 5:03 PM

Conceptually MergeHistory looks exactly the same as MovePage - it's a class lying one level up from storage layer, made to do a complex task consisting of a collection of individual storage operations.

We already have a MovePageFactory, so adding a MergeHistoryFactory we're going to be setting up the pattern for classes like this. I donno how many more classes like this we have. If that's a lot, we should probably invent something more complex then adding a factory per class, if so far it's not a lot, I think we can live with interface-per-utility class and share an implementation or merge them all together into a ComplexStorageOperationFactory (name is horrible) interface. At least MovePage and MergeHistory seem to have very similar dependencies.

MovePageFactory includes all of the dependencies needed for a MergeHistory as well as a MovePage. Perhaps something like:

interface MovePageFactory {
	public function newMovePage();
}
interface MergeHistoryFactory {
	public function newMergeHistory();
}
class ComplexStorageOperationFactory implements MovePageFactory, MergeHistoryFactory {
	...
}

Some key points in favor

  • MediaWikiServices::getMovePageFactory would still return a MovePageFactory by returning ComplexStorageOperationFactory, so this would not be a breaking change
  • MediaWikiServices::getMergeHistoryFactory would be added, and return a MergeHistoryFactory by returning ComplexStorageOperationFactory
  • Easily scaled: add a new interface with a few lines, add an implements, add a method
  • Shared use of dependencies - all are injected into the base factory and then passed to the classes, without needing a new factory for each

Name is up for discussion - something like CommonSharedFactory to make it clear that it is a factory for multiple classes?
This would follow the same pattern as MediaWikiTitleCodec implementing both TitleFormatter and TitleParser

I made a proof of concept at https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/587314/ that adds a ComplexStorageOperationFactory implementing a MergeHistoryFactory interface and a MovePageFactory interface. Some issues:

  • If MergeHistory ever needs config values injected, there will be issues with the standard $options->assertRequiredOptions( self::CONSTRUCTOR_OPTIONS );, because the different interfaces will need different options, and there isn't an easy way to check which one is desired
  • The tests using FactoryArgTestTrait only cover half of the combined factory, and it only works because one of the objects being created uses all of the available dependencies. If MergeHistory needs a service that MovePage doesn't, it will no longer be possible to use the FactoryArgTestTrait test
  • Since MergeHistory accepts an option parameter in its constructor, its awkward to require that the services be passed

In short, I don't think a shared factory is going to work

...I tried again following @daniel's suggestion to consider injecting the whole MediaWikiServices object:
https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/587602/ adds a SharedFactory service, and converts 2 existing factories as a proof of concept

  • MovePageFactory and LinkBatchFactory become interfaces
  • SharedFactory implements both interfaces
  • SharedFactory::newMovePage and SharedFactory::newLinkBatch handle all dependency inject, as well as accepting the needed parameters to pass
  • To add a new factory (or convert an existing one) (here the ClassNameFactory creates ClassName objects with the newClassName method):
    • Copy the newClassName function from the old factory to SharedFactory
    • Convert ClassNameFactory from a class to an interface
    • Have SharedFactory implement ClassNameFactory
    • In ServiceWiring, for ClassNameFactory simply return $services->getService( '_SharedFactory' ) rather than needing to handle dependencies
    • In the SharedFactory::newClassName method, replace uses of stored services $this->foo with retrieval from the stored instance of MediaWikiServices $services->getFoo()
    • Replace the use of a stored ServiceOptions object with the construction of one using $services->getMainConfig() and ClassNameFactory::CONSTRUCTOR_OPTIONS

Thoughts on creating a shared object factory to allow for such creations, rather than needing to add a new factory each time?

daniel added a comment.Apr 9 2020, 9:56 PM

I have lately been thinking about structuring our codebase not just into layers, but also by domain. It just occurred to me that we may want to have one shared factory for the "verbs" belonging to one "domain". So in the "page management" domain, you may have a factory for things like MovePage and MergeHistory, while in the "user management" domain you would have a factory for things like BlockUser and RenameUser, etc. This way, we'd have shared factories that avoid boiler plate, but we also avoid ending up with a giant uber-factory that knows about everything.

We have discussed it a bit more, and having the command classes (MovePage, MergeHistory, etc) is a good thing to have going forward.

Whether to have a shared factory for all of the command classes or a factory class per command: the downside of having a factory per command is the amount of boilerplate. However, this makes it possible for each factory to only depend on the specific dependencies of the particular command class. Shared factory however, might eventually require too many dependencies.

However, we don't need to decide on this right now. Instead, we can specify a separate factory interface per command class, and have a shared factory implementing all of those interfaces. So far we would have two - MergeHistoryFactory and MovePageFactory. As we expand this pattern to more command classes, we would be free to split implementation into more and more per-domain shared factories.

So, for now, here's what needs to be done:

  1. Convert MovePageFactory into an interface. Given that it's not instantiated out of service wiring, we can do that without deprecation.
  2. Introduce MergeHistoryFactory interface.
  3. Create CommandClassFactory (name TBD) that would implement both interfaces. Inject all dependencies explicitly, not by injecting the service container.
  4. In ServiceWiring - define 'MovePageFactory' and 'MergeHistoryFactory' services, but provide a shared implementation.

We have discussed it a bit more, and having the command classes (MovePage, MergeHistory, etc) is a good thing to have going forward.

Whether to have a shared factory for all of the command classes or a factory class per command: the downside of having a factory per command is the amount of boilerplate. However, this makes it possible for each factory to only depend on the specific dependencies of the particular command class. Shared factory however, might eventually require too many dependencies.

However, we don't need to decide on this right now. Instead, we can specify a separate factory interface per command class, and have a shared factory implementing all of those interfaces. So far we would have two - MergeHistoryFactory and MovePageFactory. As we expand this pattern to more command classes, we would be free to split implementation into more and more per-domain shared factories.

So, for now, here's what needs to be done:

  1. Convert MovePageFactory into an interface. Given that it's not instantiated out of service wiring, we can do that without deprecation.
  2. Introduce MergeHistoryFactory interface.
  3. Create CommandClassFactory (name TBD) that would implement both interfaces. Inject all dependencies explicitly, not by injecting the service container.
  4. In ServiceWiring - define 'MovePageFactory' and 'MergeHistoryFactory' services, but provide a shared implementation.

What should be done with CONSTRUCTOR_OPTIONS constants - where should they be defined?

They are not a part of the interface, so they shouldn't be a public part of the interface. Perhaps they could be protected consts in the interfaces?

The shared implementation can expose a static method that would merge the lists together so that we could inject ServiceOptions instead of injecting a config.

daniel added a comment.EditedApr 16 2020, 10:12 PM

They are not a part of the interface, so they shouldn't be a public part of the interface. Perhaps they could be protected consts in the interfaces?

I don't think they should be in the interface at all. They are specific to an implementation. Wherever you define a constructor, you can defined CONSTRUCTOR_OPTIONS. And only there.

To avoid making this too complicated, I'll split the patches

  • Part 1 - convert MovePageFactory to an interface, add the new factory implementing the interface
  • Part 2 - add MergeHistoryFactory interface, add it to the new factory

Change 589455 had a related patch set uploaded (by DannyS712; owner: DannyS712):
[mediawiki/core@master] Convert MovePageFactory to interface, implement in PageHandlerFactory

https://gerrit.wikimedia.org/r/589455

DannyS712 moved this task from Next to In progress on the User-DannyS712 board.Apr 17 2020, 1:38 AM

To avoid making this too complicated, I'll split the patches

  • Part 1 - convert MovePageFactory to an interface, add the new factory implementing the interface
  • Part 2 - add MergeHistoryFactory interface, add it to the new factory

@daniel @Pchelolo should I start work on part 2 or wait until part 1 is merged and verified to be working properly (i.e. do you want to see a preliminary part 2 before merging part 1)?

Also, I named the common factory PageHandlerFactory, because it constructs things that handle pages (MovePage and MergeHistory) but if others prefer a different name its pretty easy to change

@daniel @Pchelolo should I start work on part 2 or wait until part 1 is merged and verified to be working properly (i.e. do you want to see a preliminary part 2 before merging part 1)?

You can stack commits on top of each other if you want

Change 589455 merged by jenkins-bot:
[mediawiki/core@master] Convert MovePageFactory to interface, implement in PageHandlerFactory

https://gerrit.wikimedia.org/r/589455

Change 593080 had a related patch set uploaded (by DannyS712; owner: DannyS712):
[mediawiki/core@master] Add MergeHistoryFactory

https://gerrit.wikimedia.org/r/593080

{{ping}} @daniel I responded to your last review - can you take another look? Ideally this would merge before the replacements in T251015: Move spam regex checks out of EditPage, to avoid adding another dependency that is accessed via MediaWikiServices::getInstance(), those as noted on that task if this merges after that it won't cause any issues from a technical standpoint.

Change 593080 merged by jenkins-bot:
[mediawiki/core@master] Add MergeHistoryFactory interface, implemented by PageCommandFactory

https://gerrit.wikimedia.org/r/593080

DannyS712 closed this task as Resolved.May 13 2020, 9:29 PM
DannyS712 removed a project: Patch-For-Review.
Aklapper removed a subscriber: Anomie.Oct 16 2020, 5:38 PM