Implement a per-repo dispatching version of PropertyInfoLookup, similar to DispatchingTermBuffer or DispatchingEntityRevisionLookup.
Description
Details
Subject | Repo | Branch | Lines +/- | |
---|---|---|---|---|
Add DispatchingPropertyInfoLookup | mediawiki/extensions/Wikibase | master | +215 -0 |
Event Timeline
Change 324739 had a related patch set uploaded (by Jakob):
Add DispatchingPropertyInfoLookup
One concern that I had while working on this was that the keys of the returned arrays of getAllPropertyInfo and getPropertyInfoForDataType of the previous implementations of PropertyInfoStore are numeric IDs of properties. Is that something that would need to be addressed (right away)?
@Jakob_WMDE sorry, somehow I've managed to miss your comment.
One concern that I had while working on this was that the keys of the returned arrays of getAllPropertyInfo and getPropertyInfoForDataType of the previous implementations of PropertyInfoStore are numeric IDs of properties. Is that something that would need to be addressed (right away)?
Good point, I haven't realized that while first looking at your patch. I don't know if this should be done right away (as in in the patch adding DispatchingPropertyInfoLookup) but this should be probably be done rather soon. As I get, what the dispatching would currently do would be overriding properties from one repo if the property from the other repo happens to have the same number part of id. This is quite bad.
One option could be to index those result arrays with id serialization (including repo prefix) instead of just using numeric keys. I had a brief look on usage of getAllPropertyInfo and getPropertyInfoForDataType and if I am not mistaken it such change should be not that complicated. I am not sure how sorting should be applied (in Special:ListProperties) but this is definitely something we could figure out.
@Jakob_WMDE Good catch!
I agree with @WMDE-leszek that we should switch to full serialized IDs instead of numeric IDs for the array index. That should not be done in the same patch, but perhaps it should be done before the patch introducing the dispatching implementation. Doing it after is still OK because we will not have properties from multiple repos at once any time soon, btu we should still address this quickly.
As Leszek said, it doesn't look like it would be hard to fix.
A word about Special:ListProperties, though: I'm not sure whether it should list all available properties, or just the ones from the local repo. Or maybe there should be an option for selecting the repo (or all repos)? I suppose we need to think about the use cases for that special page again.
If we show properties from multiple different repos on Special:ListProperties, it would be nice if we could sort by repo first, and then by numeric ID second. Sorting by the serialized ID would be bad - if we can't sort by numeric ID, we should sort by name.
Btw, note that Special:ListProperties is the only actual use of getPropertiesByType. We could remove that from the service interface, and just filter directly in the special page implementation. Even with properties from multiple repos, we can (and do) still assume that we will have no more than a few thousand properties, and can easily handle them all at once in memory.
Another thing to consider is the caching layer. Do we have a caching PropertyInfoLookup for each repo, or only one cache, on top of the dispatching implementation? I'm tending towards having one cache per repo...