Implement Lua access to Lexemes, Senses and Forms
Closed, Resolved, Public
Assigned To

Authored By

	Lucas_Werkmeister_WMDE
	Oct 18 2019, 5:10 PM

Description

Task to collect some preliminary work on T212843: [EPIC] Access to Wikidata's lexicographical data from Wiktionaries and other WMF sites. This initial implementation will likely not feature fine-grained usage tracking yet, and parser functions are out of scope for now.

Details

Subject	Repo	Branch	Lines +/-
Declare Lexeme Lua interface stable	mediawiki/extensions/WikibaseLexeme	master	+1 -4
Track “all” usage for whole Lexeme instead of Sense/Form	mediawiki/extensions/WikibaseLexeme	master	+21 -6
Add Lua module for Senses	mediawiki/extensions/WikibaseLexeme	master	+244 -0
Change function declarations to Lua style	mediawiki/extensions/WikibaseLexeme	master	+14 -14
Add Lua module for Forms	mediawiki/extensions/WikibaseLexeme	master	+271 -8
Add mw.wikibase.lexeme.splitLexemeId function	mediawiki/extensions/WikibaseLexeme	master	+73 -0
Capitalize Lexeme more consistently	mediawiki/extensions/WikibaseLexeme	master	+27 -27
Make mw.wikibase.lexeme.entity.lexeme inherit mw.wikibase.entity	mediawiki/extensions/WikibaseLexeme	master	+24 -10
Add getLemmas function to Lua modules	mediawiki/extensions/WikibaseLexeme	master	+81 -0
Add all-usage for all subentities	mediawiki/extensions/WikibaseLexeme	master	+8 -1
Specify Lua module to be used for Lexeme entities	mediawiki/extensions/WikibaseLexeme	master	+2 -3
Add documentation for rudimentary Lua modules	mediawiki/extensions/WikibaseLexeme	master	+73 -0
Add rudimentary mw.wikibase.lexeme.entity.lexeme Lua module	mediawiki/extensions/WikibaseLexeme	master	+177 -1
Add rudimentary mw.wikibase.lexeme Lua module	mediawiki/extensions/WikibaseLexeme	master	+306 -0

Related Objects

Status	Assigned	Task
Open	None	T212843 [EPIC] Access to Wikidata's lexicographical data from Wiktionaries and other WMF sites
Resolved	Lydia_Pintscher	T235901 Implement Lua access to Lexemes, Senses and Forms
Resolved	Lucas_Werkmeister_WMDE	T294224 mw.wikibase.lexeme is nil on Beta Wikidata, and mw.wikibase.mediainfo is nil on Commons (beta+prod)
Resolved	Lucas_Werkmeister_WMDE	T294637 Improvements to the WikibaseLexeme Lua interface (before full rollout)
Resolved	Lucas_Werkmeister_WMDE	T297024 Add methods to get lemma, representation, gloss by language code
Resolved	Lucas_Werkmeister_WMDE	T297404 Remove most of mw.wikibase.lexeme module (remove getLemmas, getLanguage, getLexicalCategory; keep splitLexemeId)
Resolved	Lucas_Werkmeister_WMDE	T297478 Add form:hasGrammaticalFeature( itemId ) Lua method

Event Timeline

Lucas_Werkmeister_WMDE created this task. Oct 18 2019, 5:10 PM

Change 544205 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add rudimentary mw.wikibase.lexeme Lua module

https://gerrit.wikimedia.org/r/544205

Change 544206 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add rudimentary mw.wikibase.lexeme.entity.lexeme Lua module

https://gerrit.wikimedia.org/r/544206

Change 544207 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Make mw.wikibase.lexeme.entity.lexeme inherit mw.wikibase.entity

https://gerrit.wikimedia.org/r/544207

Change 544208 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Specify Lua module to be used for Lexeme entities

https://gerrit.wikimedia.org/r/544208

Change 544234 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add documentation for rudimentary Lua modules

https://gerrit.wikimedia.org/r/544234

The patches linked above add support for code of the following sort:

mw.wikibase.lexeme.getLanguage( 'L1' )
mw.wikibase.getEntity( 'L2' ):getLexicalCategory()

Missing features:

Lua modules for Senses and Forms, likewise wired up with mw.wikibase.getEntity()
getSenses() and getForms() functions/methods in the Lexeme modules, returning “instances” of the corresponding modules

Also, lots of cleanup and testing is probably still needed.

Usage tracking is also going to be interesting. Currently, it’s strictly entity-based, as far as I can see (as opposed to page-based), both on the repo (wb_changes_subscription) and on the client (wbc_entity_usage). Does this mean that a Wiktionary page for one lexeme may end up with dozens, if not hundreds of wbc_entity_usage rows, one per form (and aspect)? Or should we say that entity usage stops at subentities, and any usage of a lexeme implies usage of all of its forms? Or do we somehow group usages together, similar as for other aspects, and turn form usages into one “all forms of this lexeme” usage once they exceed a certain threshold?
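
A minimal sketch of the last option (grouping form usages once they exceed a threshold), written as plain Lua for illustration. The function name collapseUsages, the threshold parameter and the simplified ID-only usage list (aspects omitted) are all assumptions, not part of Wikibase or WikibaseLexeme:

```lua
-- Hypothetical sketch, not WikibaseLexeme code: if a page uses more than
-- `threshold` sub-entities (Forms/Senses) of one Lexeme, replace those
-- usages with a single usage of the parent Lexeme.
local function collapseUsages( usages, threshold )
	-- First pass: count sub-entity usages per parent Lexeme.
	local perLexeme = {}
	for _, id in ipairs( usages ) do
		local lexemeId = id:match( '^(L%d+)%-[FS]%d+$' )
		if lexemeId then
			perLexeme[lexemeId] = ( perLexeme[lexemeId] or 0 ) + 1
		end
	end

	-- Second pass: emit each usage once, collapsed where over the threshold.
	local result, seen = {}, {}
	for _, id in ipairs( usages ) do
		local lexemeId = id:match( '^(L%d+)%-[FS]%d+$' )
		local key = id
		if lexemeId and perLexeme[lexemeId] > threshold then
			key = lexemeId
		end
		if not seen[key] then
			seen[key] = true
			result[#result + 1] = key
		end
	end
	return result
end

-- Three forms of L1 exceed a threshold of 2 and collapse to 'L1';
-- the single form of L2 is kept as-is.
local collapsed = collapseUsages( { 'L1-F1', 'L1-F2', 'L1-F3', 'L2-F1' }, 2 )
```

A real implementation would also have to decide what happens to the aspect of each usage when rows are merged; this sketch ignores that.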

Change 545377 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add all-usage for all subentities

https://gerrit.wikimedia.org/r/545377

Change 545378 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add getLemmas function to Lua modules

https://gerrit.wikimedia.org/r/545378

Change 545379 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add Lua module for Forms

https://gerrit.wikimedia.org/r/545379

Change 545537 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add Lua module for Senses

https://gerrit.wikimedia.org/r/545537

Change 544205 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add rudimentary mw.wikibase.lexeme Lua module

https://gerrit.wikimedia.org/r/544205

ReleaseTaggerBot added a project: MW-1.35-notes (1.35.0-wmf.4; 2019-10-29). Oct 24 2019, 11:00 AM

Change 544206 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add rudimentary mw.wikibase.lexeme.entity.lexeme Lua module

https://gerrit.wikimedia.org/r/544206

ReleaseTaggerBot edited projects, added MW-1.35-notes (1.35.0-wmf.8; 2019-11-26); removed MW-1.35-notes (1.35.0-wmf.4; 2019-10-29). Nov 7 2019, 12:00 PM

Change 544207 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Make mw.wikibase.lexeme.entity.lexeme inherit mw.wikibase.entity

https://gerrit.wikimedia.org/r/544207

Change 544208 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Specify Lua module to be used for Lexeme entities

https://gerrit.wikimedia.org/r/544208

Change 544234 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add documentation for rudimentary Lua modules

https://gerrit.wikimedia.org/r/544234

Change 545377 abandoned by Lucas Werkmeister (WMDE):
Add all-usage for all subentities

Reason:
not necessary after all

https://gerrit.wikimedia.org/r/545377

Change 545378 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add getLemmas function to Lua modules

https://gerrit.wikimedia.org/r/545378

Marsupium subscribed. Nov 19 2019, 1:55 PM

Lucas_Werkmeister_WMDE mentioned this in T239633: Enable mw.wikibase.getEntity() to load forms and senses. Dec 2 2019, 5:31 PM

Change 550662 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Change function declarations to Lua style

https://gerrit.wikimedia.org/r/550662

Change 554116 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Capitalize Lexeme more consistently

https://gerrit.wikimedia.org/r/554116

Change 554117 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add mw.wikibase.lexeme.splitLexemeId function

https://gerrit.wikimedia.org/r/554117

Change 554116 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Capitalize Lexeme more consistently

https://gerrit.wikimedia.org/r/554116

ReleaseTaggerBot edited projects, added MW-1.35-notes (1.35.0-wmf.11; 2019-12-17); removed MW-1.35-notes (1.35.0-wmf.8; 2019-11-26). Dec 16 2019, 2:00 PM

Change 554117 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add mw.wikibase.lexeme.splitLexemeId function

https://gerrit.wikimedia.org/r/554117

Alicia_Fagerving_WMSE subscribed. Jan 28 2020, 7:34 AM

Premeditated subscribed. May 11 2020, 2:28 PM

Infovarius awarded a token. Mar 26 2021, 9:16 PM

Infovarius subscribed.

Nikki subscribed. Apr 15 2021, 11:27 AM

Tagging as a potential PET Code Jam activity.

Addshore added a project: [DEPRECATED] wdwb-tech. Jul 29 2021, 8:51 AM

Addshore moved this task from Inbox to Product Realm on the [DEPRECATED] wdwb-tech board.

daniel added a project: Platform Engineering Code Jam-2021. Jul 29 2021, 12:26 PM

daniel moved this task from Inbox to Ideas Q1 21/22 on the Platform Engineering Code Jam-2021 board. Jul 29 2021, 12:30 PM

So9q subscribed. Sep 30 2021, 9:17 PM

One request: could we guard the code behind a per-project feature flag, so that we can deploy it but switch it on and off through configuration?

It already is behind a feature flag, $wgLexemeEnableDataTransclusion (after all, the first changes were already merged).

Change 545379 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Add Lua module for Forms

https://gerrit.wikimedia.org/r/545379

Change 545537 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Add Lua module for Senses

https://gerrit.wikimedia.org/r/545537

Change 550662 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Change function declarations to Lua style

https://gerrit.wikimedia.org/r/550662

This is now merged and will ship (behind a feature-flag) in 1.38.0-wmf.6.

ReleaseTaggerBot added a project: MW-1.38-notes (1.38.0-wmf.6; 2021-10-26). Oct 21 2021, 8:00 PM

In T235901#5587912, @Lucas_Werkmeister_WMDE wrote:

Usage tracking is also going to be interesting. Currently, it’s strictly entity-based, as far as I can see (as opposed to page-based), both on the repo (wb_changes_subscription) and on the client (wbc_entity_usage). Does this mean that a Wiktionary page for one lexeme may end up with dozens, if not hundreds of wbc_entity_usage rows, one per form (and aspect)? Or should we say that entity usage stops at subentities, and any usage of a lexeme implies usage of all of its forms? Or do we somehow group usages together, similar as for other aspects, and turn form usages into one “all forms of this lexeme” usage once they exceed a certain threshold?

The currently merged code tracks lots of ‘X’ (“all”) usages, but it still doesn’t track enough usage. Specifically, if you use mw.wikibase.getEntity( 'L1-S1' ), then the page will get a usage for L1-S1#X, but not for L1; and because we only look for pages using L1 when dispatching changes, the page won’t be notified when the lexeme is edited, and may continue to show outdated data.

I think fixing this is a hard requirement before we enable lexeme data transclusion in production. The easiest solution would be to make sure that mw.wikibase.getEntity( 'L1-S1' ) also tracks an L1#X usage; I’ll see if I can make that work.
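
A rough sketch of that idea in plain Lua, assuming usages are written as "entity ID, #, aspect" as in the examples above. The helper name usagesFor is hypothetical and this is not the code from the actual patch:

```lua
-- Hypothetical helper, not the real implementation: list the "all" (X)
-- usages to record when an entity is loaded. For a Sense or Form ID the
-- parent Lexeme is added, so that edits to the Lexeme reach the page.
local function usagesFor( entityId )
	local usages = { entityId .. '#X' }
	-- Sub-entity IDs look like L1-S1 (Sense) or L1-F1 (Form).
	local lexemeId = entityId:match( '^(L%d+)%-[FS]%d+$' )
	if lexemeId then
		usages[#usages + 1] = lexemeId .. '#X'
	end
	return usages
end

-- usagesFor( 'L1-S1' ) yields { 'L1-S1#X', 'L1#X' };
-- usagesFor( 'L1' ) yields only { 'L1#X' }.
local senseUsages = usagesFor( 'L1-S1' )
```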

Change 732998 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseLexeme@master] Track “all” usage for whole Lexeme instead of Sense/Form

https://gerrit.wikimedia.org/r/732998

Lucas_Werkmeister_WMDE mentioned this in T290933: GerritBot doesn't unescape unicode characters. Oct 22 2021, 3:42 PM

Hm, there’s another thing that I forgot wasn’t done yet: the senses (and probably forms) of a returned lexeme entity aren’t entities themselves, they’re ordinary tables. Only the custom getForms() and getSenses() methods take care of properly creating entities.

mw.wikibase.getEntity('L1').senses[1]:getGlosses()
-- error: attempt to call method 'getGlosses' (a nil value).
mw.wikibase.getEntity('L1'):getSenses()[1]:getGlosses()
-- works

This isn’t as serious as the other issue – by the time getEntity('L1') returns, we’ve already tracked an “all” usage on L1, so being able to get the senses/forms without proper metatables doesn’t constitute a bypass of usage tracking or anything – but it’s still kind of strange, I guess…
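
The difference comes down to metatables, which can be illustrated outside of MediaWiki with a few lines of plain Lua. The Sense table below is a hypothetical stand-in for the real Sense module, and the gloss data is made up:

```lua
-- Hypothetical stand-in for the Sense entity module.
local Sense = {}
Sense.__index = Sense

function Sense:getGlosses()
	return self.glosses
end

-- An ordinary table, like the entries of .senses: it has data but no methods.
local plain = { glosses = { en = 'example gloss' } }
-- Calling plain:getGlosses() raises an error, so wrap the call in pcall.
local ok = pcall( function()
	return plain:getGlosses()
end )

-- The same kind of data with the metatable attached, roughly what a
-- method like getSenses() has to do for each element it returns.
local wrapped = setmetatable( { glosses = { en = 'example gloss' } }, Sense )
local glosses = wrapped:getGlosses()
```

Here ok is false for the plain table, while the wrapped one finds getGlosses through __index.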

Change 732998 merged by jenkins-bot:

[mediawiki/extensions/WikibaseLexeme@master] Track “all” usage for whole Lexeme instead of Sense/Form

https://gerrit.wikimedia.org/r/732998

The senses and forms of a returned lexeme entity aren’t entities themselves, they’re ordinary tables. Only the custom getForms() and getSenses() methods take care of properly creating entities.

I think we can leave this open for feedback after the initial Beta rollout. Should getForms() and getSenses() exist at all? Or should .forms and .senses contain entity objects already? And in either case, should they be indexed numerically (1, 2, …) or by ID (L1-F1, L1-F2, … – or just F1, F2, …?)? Maybe the initial testers have some feedback on this.
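
To make the indexing question concrete, here is a small plain-Lua sketch of the two shapes. The sample forms and the helper indexById are assumptions for illustration only, not part of the Lexeme modules:

```lua
-- Hypothetical sample data: forms indexed numerically (1, 2, ...).
local forms = {
	{ id = 'L1-F1' },
	{ id = 'L1-F2' },
}

-- Hypothetical helper: build an ID-indexed view of the same list.
local function indexById( list )
	local byId = {}
	for _, entity in ipairs( list ) do
		byId[entity.id] = entity
	end
	return byId
end

local formsById = indexById( forms )
-- Numeric indexing keeps order and works with ipairs and #forms;
-- ID indexing allows direct lookup such as formsById['L1-F2'],
-- but has no defined iteration order.
```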

Lucas_Werkmeister_WMDE closed subtask T294224: mw.wikibase.lexeme is nil on Beta Wikidata, and mw.wikibase.mediainfo is nil on Commons (beta+prod) as Resolved. Nov 11 2021, 1:19 PM

Lucas_Werkmeister_WMDE mentioned this in T188730: [C-DIS][SW] Enable statement usage tracking on Commons and Co. Nov 24 2021, 12:43 PM

In T235901#7465832, @Lucas_Werkmeister_WMDE wrote:

The senses and forms of a returned lexeme entity aren’t entities themselves, they’re ordinary tables. Only the custom getForms() and getSenses() methods take care of properly creating entities.

I think we can leave this open for feedback after the initial Beta rollout. Should getForms() and getSenses() exist at all? Or should .forms and .senses contain entity objects already? And in either case, should they be indexed numerically (1, 2, …) or by ID (L1-F1, L1-F2, … – or just F1, F2, …?)? Maybe the initial testers have some feedback on this.

I think we can leave .forms and .senses as they are at the moment – not documented as part of the stable interface, but not particularly hidden either. Similar to the .claims on all entities (I suppose they’re .statements on MediaInfo?), where we expect users to use :getAllStatements() and other functions instead.

Lucas_Werkmeister_WMDE closed subtask T294637: Improvements to the WikibaseLexeme Lua interface (before full rollout) as Resolved. Jun 15 2022, 10:47 AM

Change 805771 had a related patch set uploaded (by Lucas Werkmeister (WMDE); author: Lucas Werkmeister (WMDE)):

[mediawiki/extensions/WikibaseLexeme@master] Declare Lexeme Lua interface stable

https://gerrit.wikimedia.org/r/805771

Change 805771 merged by jenkins-bot: