
[INVESTIGATION] Explore feasibility of exposing Wikibase traits individually to companion extensions
Closed, Resolved (Public)

Description

How feasible would it be to expose various traits individually (such as denotable and addressable), to companion extensions to Wikibase (i.e. EntitySchema)?

Event Timeline

[WIP] List of ways extensions could integrate with Wikibase functionality in principle

TODO:

  • provide examples, more details, and docs for each

Overall, addressable seems rather straightforward: implement the datatype callbacks and hook into the Wikibase*DataTypes hooks. This would still leave queryable open, but I will have to spend dedicated time on that, because so far I have no clue how triple storage and RDF dumps work.

The really tricky part seems to be what we call denotable, in large part for performance reasons. Wikibase has a lot of logic dedicated to prefetching and caching of Terms. This logic is (sometimes?) integrated into the implementation of an entity; see, for example, EntityContent::getTextForSummary().

[WIP]
"user abilities" related to denotable

  • shows up with a label (not just ID) in Watchlists etc.
    • seems to be implemented in Wikibase by handling the HtmlPageLinkRendererEndHook
  • Terms show up by falling back along a pre-defined chain
    • soon this chain will include mul
    • This is partly done by the WikibaseRepo.LanguageFallbackChainFactory service which is pretty independent from other Wikibase functionality
  • labels show up when entities are used with Lua (TODO: figure out if that uses essentially the same hook)
  • ...
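The fallback behaviour listed above can be illustrated with a minimal sketch. This is not the WikibaseRepo.LanguageFallbackChainFactory implementation; the chains, the position of mul within them, and the function name resolve_label are assumptions made up for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical fallback chains. Real chains come from MediaWiki/Wikibase
# configuration; where "mul" sits in the chain is an assumption here.
FALLBACK_CHAINS: Dict[str, List[str]] = {
    "de-at": ["de-at", "de", "mul", "en"],
    "en": ["en", "mul"],
}


def resolve_label(labels: Dict[str, str], language: str) -> Optional[Tuple[str, str]]:
    """Return (language code, label) for the first chain entry that has a label."""
    # Languages without a configured chain fall back to themselves only.
    chain = FALLBACK_CHAINS.get(language, [language])
    for code in chain:
        if code in labels:
            return code, labels[code]
    # No label in any language of the chain: the caller would show the bare ID.
    return None


labels = {"de": "Beispielschema", "en": "example schema"}
print(resolve_label(labels, "de-at"))
```

Returning the language code alongside the label lets a caller mark the label as a fallback rather than an exact match.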

"user abilities" related to editable

  • Can be edited reasonably well on mobile
    • implemented via Termbox v2, which gets its data via Special:EntityData/Q123.json (I think?) and some SSR

There are some user abilities that seem to touch multiple of the aspects that we identified.

For example, T304070: API Endpoint to search for Schemas is related to:

  • denotable, because they expect to search for and receive Terms
  • searchable, because this is about searching on a wiki UI or via an API integration
  • queryable, because in practice we might need this data in some kind of dump that we can then feed into Elasticsearch?

Inconsistencies around denotable:

  • EntityIdSearchHelper, created in the context of \Wikibase\Lib\EntityTypeDefinitions::ENTITY_SEARCH_CALLBACK, depends internally on Terms even though the concept of Entity itself does not.
    • same with HistoryEntityAction

Considerations around queryable:

Currently, the RDF dump with EntitySchema enabled as a Statement value looks something like this:

:Q1 a wikibase:Item ;
    t:P1 "E10" ;
    t:P2 :L1 ;
    p:P1 s:Q1-ced30776-48cf-2c07-57dd-ac279dbfb5ac .

s:Q1-ced30776-48cf-2c07-57dd-ac279dbfb5ac a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P1 "E10" .

Specifically, the EntitySchema ID being a literal string in t:P1 "E10" ; and ps:P1 "E10" . seems to be a problem, based on T214884#5394245.
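To make the affected triples easy to spot, here is a minimal sketch that scans a Turtle excerpt for objects that are quoted literals shaped like an EntitySchema ID. It is a naive regular-expression scan over the sample above, not a Turtle parser, and the function name find_literal_schema_ids is hypothetical.

```python
import re

# Excerpt of the dump shown above.
DUMP = '''\
:Q1 a wikibase:Item ;
    t:P1 "E10" ;
    t:P2 :L1 ;
    p:P1 s:Q1-ced30776-48cf-2c07-57dd-ac279dbfb5ac .

s:Q1-ced30776-48cf-2c07-57dd-ac279dbfb5ac a wikibase:Statement,
        wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P1 "E10" .
'''

# A predicate followed by a quoted string that looks like an EntitySchema ID.
# Assumption: EntitySchema IDs have the form "E" followed by digits.
LITERAL_SCHEMA_ID = re.compile(r'(\S+)\s+"(E\d+)"')


def find_literal_schema_ids(turtle):
    """Return (predicate, id) pairs where the ID appears as a plain string literal."""
    return LITERAL_SCHEMA_ID.findall(turtle)


print(find_literal_schema_ids(DUMP))
```

By contrast, t:P2 :L1 is not reported, because its object is a prefixed name (a resource) rather than a quoted literal.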

One step toward fixing this would be to change the EntitySchema value type in WikibaseLib.datatypes.php#L26 from string to something else. We may not want to change it to wikibase-entity, so would we have to come up with something new? On the other hand, commonsMedia, which is also related to a Wikibase entity, does have the value type string; see WikibaseLib.datatypes.php#L19.

Summary of this investigation

The traits addressable and queryable are already exposed in a compact way, by registering a new DataType and ValueType in the EntitySchema extension.
See for reference WikibaseRepo.datatypes.php and WikibaseLexeme.datatypes.php. There is also documentation for this functionality.

However, the denotable aspect is much more complex and integrated into Wikibase. Also, there is no other Entity yet besides Items and Properties that has Terms. So the next step should probably be to try to answer the more specific question:

Can we “extract”/disentangle concepts like “denotable” without having to come up with new API endpoints?