Problem
There is existing and new code we currently have no good place for. Examples:
- EntityFactory. Like several other classes in Wikibase Lib, it does not belong in either Repo or Client, and is not part of a cohesive whole. We want to get rid of Lib, but have no good place either to move this class to or to put a replacement.
- ReferencedEntitiesFinder (lib)
- EntitySearchTextGenerator (repo)
- Often when refactoring away from problematic code, we need new services.
We have such code in Lib, Client and Repo. This creates problems for the development of Wikibase itself, and for reuse in tools.
For lack of a better place, some of this code was also put in DataModel. A few examples of such services in DataModel:
- PropertyDataTypeLookup (introduced because it was needed in QueryEngine, with an implementation that had to be in Wikibase.git)
- ItemLookup (introduced by Tpt)
- EntityDiffer
These are not part of the "data model", and are generally also not used at all by the data model implementation.
Suggested solution
We can create a new component to hold this kind of service. A good name for it has not yet been agreed upon. Suggestions so far: Wikibase Services, Wikibase DataModel Services, Wikibase DataModel Toolkit.
How would this be different from Wikibase Lib?
We want to get rid of Lib, so what would make this new component any better?
Wikibase Lib suffers from problems in (at least) two different categories:
- Technical problems such as defining global state, lack of proper boundaries and depending on things in Client or Repo
- Lack of clear responsibilities and contracts. When Lib was created, its implicit contract was something like "everything that is used by repo and client, might be used by both of them at some point, or might be usable by something else".
The latter point is one of the main causes of Lib becoming an uncohesive mess, which is still there after years of agreement that it should die. The new component would clearly define what should go in it and what should not, and would additionally make this more explicit for Wikibase DataModel.
Classes and interfaces need to satisfy these rules to be in the component:
(We can of course make pragmatic exceptions when we want.)
- They need to be services that do something with the DataModel. If an interface does not have DataModel types in it, then it should go elsewhere. "Infrastructure code" thus does not qualify. Example of "infrastructure code": the "Reporting" classes and interfaces in Wikibase Lib.
- They do not belong to a more specific component. For instance, serializers should go into the serialization component. This also applies to cohesive groups of behaviour for which there is no component yet, such as entity storage.
- They do not have dependencies on services (e.g. a database) or additional components. This avoids pulling in tons of dependencies for everyone using the new component. One exception to this rule is already clear: the entity diffing and patching code, which depends on Diff. A dedicated component could be created for it, though that does not seem justified. (And if this code is moved out of DataModel, we decrease coupling, since DataModel would no longer depend on Diff.)
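To make the first and third rules concrete, here is a minimal sketch in Python (used purely for illustration; Wikibase itself is PHP). All names are hypothetical stand-ins loosely modelled on the services mentioned above, not the real Wikibase classes or signatures. It shows one interface that would qualify, one that would not, and a dependency-free implementation.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, get_type_hints


# Hypothetical stand-ins for DataModel types (not the real Wikibase classes).
@dataclass(frozen=True)
class ItemId:
    serialization: str


@dataclass
class Item:
    id: ItemId
    label: str


DATA_MODEL_TYPES = (ItemId, Item)


class ItemLookup(Protocol):
    """Would qualify: the signature uses DataModel types."""

    def get_item_for_id(self, item_id: ItemId) -> Optional[Item]: ...


class MessageReporter(Protocol):
    """Would not qualify: infrastructure code without DataModel types."""

    def report_message(self, message: str) -> None: ...


class InMemoryItemLookup:
    """An ItemLookup with no database or other external dependency."""

    def __init__(self, items):
        self._items = {item.id: item for item in items}

    def get_item_for_id(self, item_id: ItemId) -> Optional[Item]:
        return self._items.get(item_id)


def mentions_data_model(interface) -> bool:
    """Return True if any public method signature references a DataModel type."""
    for name, member in vars(interface).items():
        if name.startswith("_") or not callable(member):
            continue
        for hint in get_type_hints(member).values():
            # Look at the hint itself and at the members of Optional/Union hints.
            candidates = (hint,) + tuple(getattr(hint, "__args__", ()))
            if any(candidate in DATA_MODEL_TYPES for candidate in candidates):
                return True
    return False
```

The mentions_data_model helper is only a rough mechanical reading of the first rule; whether a class belongs to a more specific component (the second rule) remains a judgment call.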
What does creating such a component entail?
Once the empty foundation for the component is created, we can add it as a dependency for Wikibase.git. After this, we can move out those classes from Lib, Repo and Client that satisfy the conditions. This can be done incrementally or in big chunks, depending on what we prefer. At this stage we are already able to put new services of this kind in a proper location.
Moving out the things that are already in DataModel is trickier. This will be a big breaking change that will affect multiple users of the component. We already have a big compatibility break in DataModel 3.x, so it might be confusing to bundle the two. Hence waiting with such a change until DataModel 4.x is probably better.