Create a dispatching PropertyInfoLookup
Closed, ResolvedPublic

Description

Implement a per-repo dispatching version of PropertyInfoLookup, similar to DispatchingTermBuffer or DispatchingEntityRevisionLookup.

Restricted Application added a subscriber: Aklapper. · View Herald TranscriptDec 1 2016, 3:55 PM

Change 324739 had a related patch set uploaded (by Jakob):
Add DispatchingPropertyInfoLookup

https://gerrit.wikimedia.org/r/324739

One concern that I had while working on this was that the keys of the returned arrays of getAllPropertyInfo and getPropertyInfoForDataType of the previous implementations of PropertyInfoStore are numeric IDs of properties. Is that something that would need to be addressed (right away)?

WMDE-leszek triaged this task as High priority.Dec 5 2016, 8:51 AM

@Jakob_WMDE sorry, somehow I've managed to miss your comment.

One concern that I had while working on this was that the keys of the returned arrays of getAllPropertyInfo and getPropertyInfoForDataType of the previous implementations of PropertyInfoStore are numeric IDs of properties. Is that something that would need to be addressed (right away)?

Good point, I haven't realized that while first looking at your patch. I don't know if this should be done right away (as in in the patch adding DispatchingPropertyInfoLookup) but this should be probably be done rather soon. As I get, what the dispatching would currently do would be overriding properties from one repo if the property from the other repo happens to have the same number part of id. This is quite bad.

One option could be to index those result arrays with id serialization (including repo prefix) instead of just using numeric keys. I had a brief look on usage of getAllPropertyInfo and getPropertyInfoForDataType and if I am not mistaken it such change should be not that complicated. I am not sure how sorting should be applied (in Special:ListProperties) but this is definitely something we could figure out.

@Jakob_WMDE Good catch!

I agree with @WMDE-leszek that we should switch to full serialized IDs instead of numeric IDs for the array index. That should not be done in the same patch, but perhaps it should be done before the patch introducing the dispatching implementation. Doing it after is still OK because we will not have properties from multiple repos at once any time soon, btu we should still address this quickly.

As Leszek said, it doesn't look like it would be hard to fix.

A word about Special:ListProperties, though: I'm not sure whether it should list all available properties, or just the ones from the local repo. Or maybe there should be an option for selecting the repo (or all repos)? I suppose we need to think about the use cases for that special page again.

If we show properties from multiple different repos on Special:ListProperties, it would be nice if we could sort by repo first, and then by numeric ID second. Sorting by the serialized ID would be bad - if we can't sort by numeric ID, we should sort by name.

Btw, note that Special:ListProperties is the only actual use of getPropertiesByType. We could remove that from the service interface, and just filter directly in the special page implementation. Even with properties from multiple repos, we can (and do) still assume that we will have no more than a few thousand properties, and can easily handle them all at once in memory.

Another thing to consider is the caching layer. Do we have a caching PropertyInfoLookup for each repo, or only one cache, on top of the dispatching implementation? I'm tending towards having one cache per repo...

Change 324739 merged by jenkins-bot:
Add DispatchingPropertyInfoLookup

https://gerrit.wikimedia.org/r/324739

WMDE-leszek closed this task as Resolved.Jan 24 2017, 9:01 AM