How feasible would it be to expose various traits individually (such as denotable and addressable) to companion extensions to Wikibase (e.g. EntitySchema)?
Description
Status | Assigned | Task
---|---|---
Resolved | ItamarWMDE | T312786 [DISCOVERY] Explore various approaches to further expand EntitySchema features
Resolved | Michael | T312837 [INVESTIGATION] Explore feasibility of exposing Wikibase traits indivdually to companion extensions
[WIP] List of ways in which extensions could, in principle, integrate with Wikibase functionality:
- make use of hooks
  - WikibaseRepo does not seem to provide hooks via the new hook system
    - it seems to have only a single HookRunner class, which runs a core hook: EditFilterMergedContent
      - (it is not fully clear to me why Wikibase runs that core hook manually)
    - it does, however, provide a number of hooks the legacy way: https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_hooks_php.html
  - WikibaseClient does provide some hooks of its own (see the document linked above)
  - There are also some JS hooks provided by Wikibase: https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_hooks_js.html
  - hooks are also used to provide the callbacks for entity types and datatypes
    - providing the hooks for datatypes might (more or less) cover the addressable requirement?
      - see https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_datatypes.html
      - neither Lexeme nor https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/523980/5/repo/WikibaseRepo.datatypes.php seems to list all the keys mentioned in the docs above!?
      - it is still unclear how/when a dump into RDF happens
- reuse services
  - The service WikibaseRepo.LanguageFallbackChainFactory is largely independent of the concept of entities and could be used by extensions for their own needs.
TODO:
- provide examples, more details, and docs for each
Overall, addressable seems rather straightforward: implement the datatype callbacks and hook into the Wikibase*DataTypes hooks. This still leaves queryable open, which will need dedicated time, because so far I have no idea how triple storage and RDF dumps work.
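To make the hook-based approach more concrete, here is a minimal Python model of it. Wikibase itself is PHP, so this is an illustration, not Wikibase code. It assumes that a legacy hook such as WikibaseRepoDataTypes hands the data type definitions array to each handler, which adds an entry keyed `PT:<type>`. The handler, the `PT:entity-schema` entry, and the subset of documented keys being checked are all assumptions for the example; `missing_keys` mirrors the observation that existing definitions do not set every documented key.

```python
# Illustration only: a Python model of how legacy PHP hooks (e.g. WikibaseRepoDataTypes)
# let an extension add entries to the data type definitions array. Not Wikibase code.
from typing import Any, Callable, Dict, List

DataTypeDefinitions = Dict[str, Dict[str, Any]]
Handler = Callable[[DataTypeDefinitions], None]

# Assumption: a subset of the keys described in the datatypes documentation.
DOCUMENTED_KEYS = [
    "value-type",
    "validator-factory-callback",
    "parser-factory-callback",
    "formatter-factory-callback",
    "rdf-builder-factory-callback",
]


class HookRegistry:
    """Minimal stand-in for legacy hooks: handlers mutate a shared structure."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = {}

    def register(self, hook_name: str, handler: Handler) -> None:
        self._handlers.setdefault(hook_name, []).append(handler)

    def run(self, hook_name: str, definitions: DataTypeDefinitions) -> DataTypeDefinitions:
        for handler in self._handlers.get(hook_name, []):
            handler(definitions)  # mirrors PHP passing the array by reference
        return definitions


def on_repo_data_types(definitions: DataTypeDefinitions) -> None:
    """Hypothetical handler a companion extension such as EntitySchema might register."""
    definitions["PT:entity-schema"] = {
        "value-type": "string",
        # Placeholder callbacks; real ones would build validators and formatters.
        "validator-factory-callback": lambda: [],
        "formatter-factory-callback": lambda *args: None,
    }


def missing_keys(definitions: DataTypeDefinitions) -> Dict[str, List[str]]:
    """List, per data type, which of the documented keys it does not set."""
    return {
        name: [key for key in DOCUMENTED_KEYS if key not in spec]
        for name, spec in definitions.items()
    }


hooks = HookRegistry()
hooks.register("WikibaseRepoDataTypes", on_repo_data_types)
definitions = hooks.run("WikibaseRepoDataTypes", {"PT:string": {"value-type": "string"}})
print(missing_keys(definitions)["PT:entity-schema"])
# ['parser-factory-callback', 'rdf-builder-factory-callback']
```

Modeling the handler as "mutate the shared definitions" is what makes this pattern composable: several extensions can each add their own data types without knowing about one another.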
The really tricky part seems to be what we call denotable, in large part for performance reasons. Wikibase has a lot of logic dedicated to prefetching and caching Terms, and this logic is (sometimes?) woven into the implementation of an entity; see, for example, EntityContent::getTextForSummary().
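Why prefetching matters can be sketched as follows: a page with many entity links should load their labels in one batch instead of issuing one query per link. This is a hypothetical Python illustration, not Wikibase's actual lookup classes; `PrefetchingLabelLookup` and the batch loader are invented names, and the in-memory store stands in for the real term storage.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

TermKey = Tuple[str, str]  # (entity ID, language code)
BatchLoader = Callable[[List[str], List[str]], Dict[TermKey, str]]


class PrefetchingLabelLookup:
    """Hypothetical sketch: load labels for many entities in one batch, then serve from memory."""

    def __init__(self, load_batch: BatchLoader) -> None:
        self._load_batch = load_batch  # assumed to cost one storage round trip per call
        self._cache: Dict[TermKey, Optional[str]] = {}
        self.batch_calls = 0  # counts round trips, for illustration

    def prefetch(self, entity_ids: Iterable[str], languages: Iterable[str]) -> None:
        langs = list(languages)
        todo = [e for e in entity_ids if any((e, lang) not in self._cache for lang in langs)]
        if not todo:
            return  # everything requested is already cached
        self.batch_calls += 1
        found = self._load_batch(todo, langs)
        for entity_id in todo:
            for lang in langs:
                # Cache misses as None too, so absent labels do not trigger new queries.
                self._cache[(entity_id, lang)] = found.get((entity_id, lang))

    def get_label(self, entity_id: str, language: str) -> Optional[str]:
        if (entity_id, language) not in self._cache:
            self.prefetch([entity_id], [language])  # not prefetched: one round trip per call
        return self._cache[(entity_id, language)]


STORE = {("Q1", "en"): "universe", ("Q2", "en"): "Earth"}


def fake_load_batch(ids: List[str], langs: List[str]) -> Dict[TermKey, str]:
    return {(e, l): STORE[(e, l)] for e in ids for l in langs if (e, l) in STORE}


lookup = PrefetchingLabelLookup(fake_load_batch)
lookup.prefetch(["Q1", "Q2", "Q3"], ["en"])  # e.g. all entity links on one page
labels = [lookup.get_label(q, "en") for q in ["Q1", "Q2", "Q3"]]
print(labels, lookup.batch_calls)  # ['universe', 'Earth', None] 1
```

Without the `prefetch` call, the same three labels would cost three round trips; that per-link cost is what makes it hard to bolt denotable onto an entity type after the fact.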
[WIP]
"user abilities" related to denotable
- shows up with a label (not just its ID) in watchlists etc.
  - this seems to be implemented in Wikibase by handling the HtmlPageLinkRendererEndHook
- Terms show up by falling back along a predefined chain
  - soon this chain will include mul
  - this is partly done by the WikibaseRepo.LanguageFallbackChainFactory service, which is largely independent of other Wikibase functionality
- labels show up when entities are used from Lua (TODO: figure out whether that uses essentially the same hook)
- ...
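A minimal sketch of the fallback behaviour and the resulting link text, assuming a fallback chain can be modeled as an ordered list of language codes. In Wikibase the chain comes from WikibaseRepo.LanguageFallbackChainFactory; the concrete chain below, including where `mul` sits, is an assumption, and `link_text` only imitates a "label (ID)" style rather than reproducing the actual HtmlPageLinkRendererEndHook handler.

```python
from typing import Dict, List, Optional, Tuple


def resolve_with_fallback(terms: Dict[str, str], chain: List[str]) -> Optional[Tuple[str, str]]:
    """Return (language used, text) for the first language in the chain that has a term."""
    for language in chain:
        if language in terms:
            return language, terms[language]
    return None


def link_text(entity_id: str, labels: Dict[str, str], chain: List[str]) -> str:
    """Hypothetical link text: "label (ID)" if a label resolves, else just the ID."""
    resolved = resolve_with_fallback(labels, chain)
    if resolved is None:
        return entity_id
    _language, label = resolved
    return f"{label} ({entity_id})"


# Assumed chain for a Swiss German reader, with "mul" placed just before "en".
chain = ["de-ch", "de", "mul", "en"]
print(link_text("E10", {"mul": "Example", "en": "example schema"}, chain))  # Example (E10)
print(link_text("E11", {}, chain))  # E11
```

Returning the language actually used, not just the text, matters in practice: a UI may want to mark a label that came from a fallback language.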
"user abilities" related to editable
- can be edited reasonably well on mobile
  - implemented via Termbox v2, which gets its data from Special:EntityData/Q123.json (I think?) plus some server-side rendering (SSR)
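For orientation, the labels in a Special:EntityData JSON response can be read as in this sketch. It assumes the usual Wikibase JSON shape (an `entities` object keyed by ID, each entity holding `labels` keyed by language, each label being `{language, value}`) and that the requested ID is the key under `entities`, which may not hold for redirected entities. The sample document is hand-written, and whether Termbox consumes exactly this data remains the open question above.

```python
import json
from typing import Dict


def labels_from_entity_data(raw_json: str, entity_id: str) -> Dict[str, str]:
    """Map language code -> label text from a Special:EntityData/<ID>.json response body."""
    data = json.loads(raw_json)
    entity = data["entities"][entity_id]
    return {language: term["value"] for language, term in entity.get("labels", {}).items()}


# Hand-written sample in the shape of Wikibase's JSON serialization (not a real response).
sample = json.dumps({
    "entities": {
        "Q123": {
            "id": "Q123",
            "type": "item",
            "labels": {
                "en": {"language": "en", "value": "example item"},
                "de": {"language": "de", "value": "Beispielobjekt"},
            },
        }
    }
})
print(labels_from_entity_data(sample, "Q123"))  # {'en': 'example item', 'de': 'Beispielobjekt'}
```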
Some user abilities seem to touch several of the aspects we identified.
For example, T304070: API Endpoint to search for Schemas is related to:
- denotable, because users expect to search for, and receive, Terms
- searchable, because this is about searching in the wiki UI or via an API integration
- queryable, because in practice we might need this data in some kind of dump that we can then feed into Elasticsearch?
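To illustrate where denotable and searchable meet in such an endpoint, here is a hypothetical in-memory prefix search over Terms. It is not the proposed API: real Wikibase installations such as Wikidata typically search through Elasticsearch (for example via the WikibaseCirrusSearch extension), and `TermIndex`, `SearchHit`, and `search_by_terms` are names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

# entity ID -> language code -> terms (label first, then aliases)
TermIndex = Dict[str, Dict[str, List[str]]]


@dataclass
class SearchHit:
    entity_id: str
    language: str
    matched_term: str


def search_by_terms(query: str, index: TermIndex, languages: List[str], limit: int = 10) -> List[SearchHit]:
    """Case-insensitive prefix match; at most one hit per entity, preferring earlier languages."""
    needle = query.casefold()
    hits: List[SearchHit] = []
    for entity_id, terms_by_language in index.items():
        for language in languages:
            match = next(
                (t for t in terms_by_language.get(language, []) if t.casefold().startswith(needle)),
                None,
            )
            if match is not None:
                hits.append(SearchHit(entity_id, language, match))
                break  # one hit per entity is enough
        if len(hits) >= limit:
            break
    return hits


index: TermIndex = {
    "E1": {"en": ["human", "person"]},
    "E2": {"en": ["book"], "de": ["Buch"]},
}
print(search_by_terms("bu", index, ["de", "en"]))
# [SearchHit(entity_id='E2', language='de', matched_term='Buch')]
```

Even this toy needs the Terms of every candidate entity up front, which is why such an endpoint pulls in the queryable aspect too: at scale, that index has to be fed from some kind of dump or update stream.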
Inconsistencies around denotable:
- EntityIdSearchHelper, created in the context of \Wikibase\Lib\EntityTypeDefinitions::ENTITY_SEARCH_CALLBACK, internally depends on Terms, even though the concept of an Entity itself does not.
- The same is true of HistoryEntityAction.
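One way to picture disentangling denotable is to treat Terms as an optional capability that a search helper checks for, rather than something every entity is assumed to have. This Python sketch uses `typing.Protocol` for that; `Entity`, `Denotable`, `Schema`, `TermlessEntity`, and `search_result` are hypothetical and do not reflect how EntityIdSearchHelper is actually structured.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Protocol, runtime_checkable


class Entity(Protocol):
    """The minimal contract: an entity only needs an ID."""

    def get_id(self) -> str: ...


@runtime_checkable
class Denotable(Protocol):
    """Optional capability: the entity can be referred to by Terms."""

    def get_labels(self) -> Dict[str, str]: ...


@dataclass
class Schema:
    id: str
    labels: Dict[str, str] = field(default_factory=dict)

    def get_id(self) -> str:
        return self.id

    def get_labels(self) -> Dict[str, str]:
        return self.labels


@dataclass
class TermlessEntity:
    id: str

    def get_id(self) -> str:
        return self.id


def search_result(entity: Entity, language: str) -> Dict[str, Optional[str]]:
    """Hypothetical ID-search result that uses Terms only when the entity offers them."""
    label = entity.get_labels().get(language) if isinstance(entity, Denotable) else None
    return {"id": entity.get_id(), "label": label}


print(search_result(Schema("E10", {"en": "example"}), "en"))  # {'id': 'E10', 'label': 'example'}
print(search_result(TermlessEntity("X1"), "en"))  # {'id': 'X1', 'label': None}
```

Note that a runtime `isinstance` check against a Protocol only verifies that the method exists, not its signature; a PHP version would more naturally use an explicit interface.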
Considerations around queryable:
Currently, the RDF dump with EntitySchema enabled as a statement value looks something like this:
:Q1 a wikibase:Item ;
    t:P1 "E10" ;
    t:P2 :L1 ;
    p:P1 s:Q1-ced30776-48cf-2c07-57dd-ac279dbfb5ac .
s:Q1-ced30776-48cf-2c07-57dd-ac279dbfb5ac a wikibase:Statement, wikibase:BestRank ;
    wikibase:rank wikibase:NormalRank ;
    ps:P1 "E10" .
Specifically, the EntitySchema ID being a literal string (`t:P1 "E10"` and `ps:P1 "E10"`) seems to be a problem, based on T214884#5394245.
One step towards fixing this would be to change the EntitySchema value type in WikibaseLib.datatypes.php#L26 from `string` to something else. We may not want to change it to `wikibase-entity`, so would we have to come up with something new? On the other hand, `commonsMedia`, which also refers to another page (a file on Commons) rather than being free text, has the value type `string` as well (see WikibaseLib.datatypes.php#L19), yet its values are exported to RDF as IRIs rather than as plain literals.
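The difference between the current literal and a possible IRI-based value can be sketched as a small Turtle emitter. This is an illustration under assumptions: the base URI is made up (which IRI EntitySchemas should get is exactly the open question), the ID pattern "`E` plus a positive integer" is assumed, and the literal escaping covers only backslashes, quotes, and newlines.

```python
import re
from typing import Optional

# Assumption: EntitySchema IDs are "E" followed by a positive integer, e.g. "E10".
ENTITY_SCHEMA_ID = re.compile(r"E[1-9][0-9]*")


def turtle_literal(text: str) -> str:
    """Quote text as a Turtle string literal (escapes backslashes, quotes, newlines only)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def schema_value_triple(subject: str, predicate: str, schema_id: str,
                        base_uri: Optional[str] = None) -> str:
    """Emit one triple whose object is the schema ID as a literal (no base_uri) or an IRI."""
    if base_uri is None:
        obj = turtle_literal(schema_id)  # what the current dump does: ps:P1 "E10"
    else:
        if not ENTITY_SCHEMA_ID.fullmatch(schema_id):
            raise ValueError(f"not an EntitySchema ID: {schema_id!r}")
        obj = f"<{base_uri}{schema_id}>"  # hypothetical IRI form
    return f"{subject} {predicate} {obj} ."


statement = "s:Q1-ced30776-48cf-2c07-57dd-ac279dbfb5ac"
print(schema_value_triple(statement, "ps:P1", "E10"))
print(schema_value_triple(statement, "ps:P1", "E10", "https://wiki.example.org/entity/"))
```

With an IRI, a SPARQL query can join the statement value directly to triples about the schema itself; with a literal, it has to rebuild the IRI by string manipulation first.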
Summary of this investigation
The traits addressable and queryable are already exposed, in a compact way, by registering a new DataType and ValueType in the EntitySchema extension.
For reference, see WikibaseRepo.datatypes.php and WikibaseLexeme.datatypes.php; there is also documentation for this functionality.
However, the denotable aspect is much more complex and more deeply integrated into Wikibase. Also, besides Items and Properties, no entity type has Terms yet. So the next step should probably be to try to answer a more specific question:
Can we “extract”/disentangle concepts like “denotable” without having to come up with new API endpoints?