Task to collect some preliminary work on T212843: [EPIC] Access to Wikidata's lexicographical data from Wiktionaries and other WMF sites. This initial implementation will likely not feature fine-grained usage tracking yet, and parser functions are out of scope for now.
| Status | Assigned | Task |
|---|---|---|
| Open | None | T212843 [EPIC] Access to Wikidata's lexicographical data from Wiktionaries and other WMF sites |
| Open | None | T235901 Implement Lua access to Lexemes, Senses and Forms |
The patches linked above add support for code of the following sort:
mw.wikibase.lexeme.getLanguage( 'L1' )
mw.wikibase.getEntity( 'L2' ):getLexicalCategory()
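As a rough illustration of how the two calls above might be combined, here is a minimal sketch. The helper name `describeLexeme`, the stubbed `mw` table, and the item IDs the stub returns are all assumptions made so that the snippet runs in a plain Lua interpreter; on a wiki, the real `mw` global provided by Scribunto would be used instead, and the actual return values may look different.

```lua
-- Stand-in for the Scribunto environment (assumption, for local runs only).
-- The IDs returned by the stub are invented for illustration.
local mw = mw or {
    wikibase = {
        lexeme = {
            getLanguage = function( id ) return 'Q1860' end,
        },
        getEntity = function( id )
            return {
                getLexicalCategory = function( self ) return 'Q1084' end,
            }
        end,
    },
}

-- Hypothetical helper: summarise a lexeme using both access styles,
-- the module-level function and the method on the loaded entity.
local function describeLexeme( lexemeId )
    local language = mw.wikibase.lexeme.getLanguage( lexemeId )
    local category = mw.wikibase.getEntity( lexemeId ):getLexicalCategory()
    return lexemeId .. ': language ' .. language .. ', lexical category ' .. category
end

print( describeLexeme( 'L1' ) )
```
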
Still missing:
- Lua modules for Senses and Forms, likewise wired up with mw.wikibase.getEntity()
- getSenses() and getForms() functions/methods in the Lexeme modules, returning “instances” of the corresponding modules
Also, lots of cleanup and testing is probably still needed.
Usage tracking is also going to be interesting. Currently, as far as I can see, it’s strictly entity-based (as opposed to page-based), both on the repo (wb_changes_subscription) and on the client (wbc_entity_usage). Does this mean that a Wiktionary page for one lexeme may end up with dozens, if not hundreds, of wbc_entity_usage rows, one per form (and aspect)? Or should we say that entity usage stops at subentities, so that any usage of a lexeme implies usage of all of its forms? Or do we somehow group usages together, similar to how other aspects are handled, and turn form usages into one “all forms of this lexeme” usage once they exceed a certain threshold?
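To make the third option more concrete, here is a minimal sketch of the threshold idea. It is not how Wikibase tracks usage today: the function name `collapseFormUsages`, the limit value, and the `-F*` marker for “all forms of this lexeme” are hypothetical, and aspects are ignored entirely to keep the example short.

```lua
-- Hypothetical threshold; the task does not propose a concrete value.
local FORM_USAGE_LIMIT = 3

-- Takes a list of used entity IDs (e.g. { 'L1-F1', 'L1-F2', 'L2' }) and
-- returns a deduplicated list, in order of first appearance, in which the
-- form usages of any lexeme with more than `limit` distinct used forms are
-- replaced by a single hypothetical '<lexeme>-F*' marker.
local function collapseFormUsages( usedIds, limit )
    local seenForms = {}   -- form ID -> true, to count each form only once
    local formCounts = {}  -- lexeme ID -> number of distinct used forms

    -- First pass: count distinct forms per lexeme.
    for _, id in ipairs( usedIds ) do
        local lexemeId = id:match( '^(L%d+)%-F%d+$' )
        if lexemeId and not seenForms[id] then
            seenForms[id] = true
            formCounts[lexemeId] = ( formCounts[lexemeId] or 0 ) + 1
        end
    end

    -- Second pass: emit either the original ID or the collapsed marker.
    local result, emitted = {}, {}
    for _, id in ipairs( usedIds ) do
        local lexemeId = id:match( '^(L%d+)%-F%d+$' )
        local out = id
        if lexemeId and formCounts[lexemeId] > limit then
            out = lexemeId .. '-F*'
        end
        if not emitted[out] then
            emitted[out] = true
            result[#result + 1] = out
        end
    end
    return result
end

local usages = collapseFormUsages(
    { 'L1-F1', 'L1-F2', 'L1-F3', 'L1-F4', 'L2-F1', 'L2' },
    FORM_USAGE_LIMIT
)
print( table.concat( usages, ', ' ) )
```

With a limit of 3, the four form usages of L1 collapse into one row, while the single form usage of L2 and the usage of L2 itself are kept as they are. The trade-off is that a collapsed row would cause the page to be treated as affected by a change to any form of that lexeme, including forms the page never used.
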