
Improvements to the WikibaseLexeme Lua interface (before full rollout)
Status: Closed, Resolved (Public)

Description

As of 2021-10-29, Lua access to Lexemes is enabled on the Beta cluster, but not in production. The Lua interface (i.e. which functions exist, what they’re called, which arguments they take and what they return, etc.), documented on doc.wikimedia.org, is mostly just a first draft – things that seemed like a good idea when they were first implemented. However, we should consider making some improvements before the full production rollout.

If you have any suggestions for improvements, or comments on things that aren’t so great in the current model, feel free to leave them here, either as comments or as subtasks.

Event Timeline

For example: form entities currently have a method form:getGrammaticalFeatures(), which is identical to form.grammaticalFeatures. Is that really something we need?

On the other hand, a function form:hasGrammaticalFeature( itemId ) could be a useful addition.
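To illustrate the distinction, here is a minimal Python sketch that models a form as plain data. It is not the Scribunto Lua API. It assumes grammatical features are stored as a list of item IDs. The method names are hypothetical Python counterparts of form:getGrammaticalFeatures() and the proposed form:hasGrammaticalFeature( itemId ), and "Q_SINGULAR" / "Q_PLURAL" are placeholders rather than real item IDs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Form:
    """Plain-data stand-in for a Lexeme form (assumed shape, for illustration only)."""
    id: str
    representations: dict[str, str]                                # spelling variant -> text
    grammatical_features: list[str] = field(default_factory=list)  # item IDs

    def get_grammatical_features(self) -> list[str]:
        # Adds nothing over reading the field directly (it just returns a copy).
        return list(self.grammatical_features)

    def has_grammatical_feature(self, item_id: str) -> bool:
        # Saves every caller from writing the same membership loop.
        return item_id in self.grammatical_features
```

For example, `Form("L348865-F1", {"ca": "basalt"}, ["Q_SINGULAR"]).has_grammatical_feature("Q_PLURAL")` returns `False`. In Lua a helper like this arguably pays off even more than in Python, since Lua tables have no built-in membership test and each module would otherwise loop over the feature list itself.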

Hello, I'm not sure whether I've described this use case somewhere before, but a future use case for this would be c:Module:Technique. The module translates concepts for materials, (artistic) techniques, colors and similar things from a controlled English-language vocabulary of roughly 1000 terms into grammatically correct descriptions in (currently) 35 languages.
With Lua access to lexicographical data, we could get rid of local data on Commons in templates like this one, new languages could easily be added, and internationalization on Commons would improve.
For that, we would need a performant way to:

  1. start with a Wikidata Q-ID;
  2. find a lexeme in language B (e.g. Catalan) that has a sense with item for this sense (P5137) set to item A (e.g. basalt (Q43338)); here that would be basalt (L348865);
  3. get the specific form needed with the given grammatical features (e.g. singular), here "basalt" (L348865-F1), and get its representation.
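The three steps can be sketched in Python over an in-memory list of lexemes. This is a minimal sketch, not the WikibaseLexeme Lua interface. The data shapes, the function names, and the placeholder IDs "Q_CATALAN" and "Q_SINGULAR" are all hypothetical. Step 2 is a linear scan here, which only works for a toy data set. Doing it efficiently on a wiki is the part that needs server-side support.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Form:
    id: str
    representations: dict[str, str]      # spelling variant -> representation
    grammatical_features: frozenset[str]  # item IDs (placeholders below)


@dataclass
class Lexeme:
    id: str
    language: str                         # item ID of the lexeme's language
    sense_items: list[str]                # "item for this sense" (P5137) values of its senses
    forms: list[Form]


def find_lexeme(lexemes: list[Lexeme], item_id: str, language: str) -> Optional[Lexeme]:
    """Step 2: first lexeme in `language` with a sense linked to `item_id` (linear scan)."""
    for lexeme in lexemes:
        if lexeme.language == language and item_id in lexeme.sense_items:
            return lexeme
    return None


def find_representation(lexeme: Lexeme, features: frozenset[str], variant: str) -> Optional[str]:
    """Step 3: representation of the first form that carries all requested features."""
    for form in lexeme.forms:
        if features <= form.grammatical_features:
            return form.representations.get(variant)
    return None


def describe(lexemes: list[Lexeme], item_id: str, language: str,
             features: frozenset[str], variant: str) -> Optional[str]:
    """Steps 1 to 3: Q-ID in, representation (or None) out."""
    lexeme = find_lexeme(lexemes, item_id, language)
    return None if lexeme is None else find_representation(lexeme, features, variant)


# Sample data modeled on the basalt example; "Q_CATALAN" and "Q_SINGULAR" are placeholders.
LEXEMES = [
    Lexeme("L348865", "Q_CATALAN", ["Q43338"],
           [Form("L348865-F1", {"ca": "basalt"}, frozenset({"Q_SINGULAR"}))]),
]
```

With this sample data, `describe(LEXEMES, "Q43338", "Q_CATALAN", frozenset({"Q_SINGULAR"}), "ca")` returns `"basalt"`. If no lexeme or no matching form exists, it returns `None`, so a calling module can fall back to its local data.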

The second step will probably be the difficult one, since its equivalent isn't possible for item and property data objects either. If it were possible, though, it could give internationalization on Commons, and maybe on other projects, a great boost.
Sorry for the long text, feel free to adapt it and move it to any potentially more appropriate place!

Yes, I imagine step 2 would require something like T185313 or T199887. You could probably put the lexeme IDs into data pages, but I’m not sure if that still offers a meaningful benefit over the current data pages.


Yes, I think putting the lexeme IDs into data pages would be a benefit. Automatic updating of the data pages via SPARQL wouldn't be too complicated and this would relieve Commons from the burden of maintaining "lexeme forms" locally. So for now Lua support for lexemes on Commons is the main missing part … :-)
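A data page of lexeme IDs would essentially be an index from item to language to lexeme. The following is a minimal Python sketch of building such an index from (item ID, language code, lexeme ID) rows. The rows are assumed to come from a SPARQL query on P5137, which is not shown. The output layout is hypothetical and is not the tabular JSON schema used by existing Commons `.tab` pages.

```python
from __future__ import annotations

from collections import defaultdict


def build_lexeme_index(rows: list[tuple[str, str, str]]) -> dict[str, dict[str, list[str]]]:
    """Group (item ID, language code, lexeme ID) rows into item -> language -> lexeme IDs.

    Uses lists rather than single IDs because one item can be linked from several
    lexemes in the same language. IDs are de-duplicated and sorted (as strings) so
    that regenerating the page from a fresh query produces stable diffs.
    """
    index: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for item_id, lang, lexeme_id in rows:
        index[item_id][lang].add(lexeme_id)
    return {
        item_id: {lang: sorted(ids) for lang, ids in sorted(by_lang.items())}
        for item_id, by_lang in sorted(index.items())
    }
```

For example, `build_lexeme_index([("Q43338", "ca", "L348865")])` returns `{"Q43338": {"ca": ["L348865"]}}`. A bot could then serialize the result to the data page on a schedule.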

I am still trying to wrap my head around how to use lexeme data, but I also have a use case similar to Marsupium's. I would like to replace tables like c:Data:I18n/MonthCases.tab, where we store declensions of month names in different languages, so that a function like c:Module:DateI18n's MonthCase(month, case, lang) (line 146) can be backed by lexemes: MonthCase(1, "ins", "pl") would give you L1872-F5, the Polish instrumental singular of January. I would love to be able to point to Q108 and ask for any language and any grammatical case.
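MonthCase's arguments map fairly directly onto lexeme concepts. The month number maps to an item (January is Q108), and the case code maps to a set of grammatical features. Below is a minimal Python sketch of that translation. It assumes the relevant forms have already been fetched into a plain table. The feature IDs are placeholders, only January is filled in, and `month_case` is a hypothetical stand-in for the Lua function, not its actual implementation.

```python
from __future__ import annotations

from typing import Optional

# Placeholders: real code would use the Wikidata item IDs for these cases and for singular.
CASE_FEATURES = {
    "nom": frozenset({"Q_NOMINATIVE", "Q_SINGULAR"}),
    "gen": frozenset({"Q_GENITIVE", "Q_SINGULAR"}),
    "ins": frozenset({"Q_INSTRUMENTAL", "Q_SINGULAR"}),
}

MONTH_ITEMS = {1: "Q108"}  # January, as cited above; other months omitted

# (item ID, language code) -> [(form ID, grammatical features)]; sample data only.
FORMS = {
    ("Q108", "pl"): [
        ("L1872-F5", frozenset({"Q_INSTRUMENTAL", "Q_SINGULAR"})),
    ],
}


def month_case(month: int, case: str, lang: str) -> Optional[str]:
    """Return the ID of the form of `month` in `lang` that carries all features of `case`."""
    item_id = MONTH_ITEMS.get(month)
    wanted = CASE_FEATURES.get(case)
    if item_id is None or wanted is None:
        return None
    for form_id, features in FORMS.get((item_id, lang), []):
        if wanted <= features:
            return form_id
    return None
```

Here `month_case(1, "ins", "pl")` returns `"L1872-F5"`. Unknown months, cases, or languages return `None` instead of raising an error, which leaves room for a fallback to the existing table.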

Similarly, c:Module:I18n/complex_date has a bunch of declensions of various words in various languages, which ideally would not be hardcoded there but would come from lexeme data. So I would second Marsupium's desire for a Lua function that takes an item's Q-ID and a language code and returns a lexeme's L-ID, which can then be queried for available cases and declensions.

Thanks for the feedback folks :)
It looks like we need to push T185313 or T199887 up the priority list. I'll take note of that.

Then let’s close this task, and I’ll upload a Gerrit change to declare the Lua interface stable in its current form. Once the “Lua haswbstatements” feature has been implemented, we’ll announce it as a significant (but hopefully not breaking) change to the interface, but there’s no need to block the initial Lexeme Lua rollout on it.