User Details
- User Since: May 4 2022, 1:11 PM
- Availability: Available
- LDAP User: Ariel Gutman
- MediaWiki User: Ariel Gutman
Nov 17 2022
There is a discussion of my stop-gap solution on the item where I added the literal translation property: https://www.wikidata.org/wiki/Talk:Q467#Lexemes
Oct 25 2022
As a stop-gap solution, I'm suggesting we use the literal translation property to link items to senses. As an example of its usage, I've linked Q467 to Hebrew L63925. This seems to work well for cases where the item corresponds to a single lexeme (sense). A minimal sketch of how such a link could be read back from Lua follows below.
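For illustration only, here is a minimal Scribunto sketch of reading such item-to-lexeme links back from Lua. The property ID 'P0000' and the function name are placeholders, not the actual identifiers used on Wikidata.

```lua
-- Minimal Scribunto sketch; 'P0000' is a placeholder, not the real
-- "literal translation" property ID.
local p = {}

local literalTranslationProp = 'P0000'  -- placeholder property ID

-- Return the lexeme (or sense) IDs linked from an item via that property.
function p.linkedLexemes(frame)
	local itemId = frame.args[1] or 'Q467'
	local ids = {}
	for _, statement in ipairs(mw.wikibase.getBestStatements(itemId, literalTranslationProp)) do
		local snak = statement.mainsnak
		if snak.snaktype == 'value' then
			table.insert(ids, snak.datavalue.value.id)
		end
	end
	return table.concat(ids, ', ')
end

return p
```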
Oct 13 2022
- Yes, you're right that it makes more sense to link to a sense.
Oct 11 2022
As said, both issues can be solved. The issue is that, as currently construed, the labels/descriptions are not really machine-readable: they are usable mostly for human consumption.
Do you mean that this will clutter the UI or the database itself? If the former, this can be solved by selectively showing these links in the UI. If you mean cluttering the database itself, I agree this would require extra capacity, but I don't think it is unmanageable.
Oct 4 2022
Ok, who is responsible for this approval? Could we ping them?
Oct 3 2022
Can we go forward with nd & nr codes?
Sep 7 2022
I'm taking care of the Ndebele language codes (nd & nr) in https://gerrit.wikimedia.org/r/828887.
A prototype has now been created in https://meta.wikimedia.org/wiki/Module:Sandbox/AbstractWikipedia.
Sep 1 2022
I would like to support the idea of adding all ISO 639-3 language codes to Wikidata (and Abstract Wikipedia). Notwithstanding @mrephabricator's comments, this standard is the de facto standard for enumerating all the world's languages, and the Ethnologue, on which the standard is based, is generally accepted as a scientifically solid resource (even though it may contain some errors). The ideological background of SIL International is in my opinion irrelevant, but I must note that the claim that they have no linguistic background is completely false. In fact, this organization has conducted extensive linguistic fieldwork in numerous parts of the world, and many of its members are trained linguists, the most famous being Kenneth Pike.
Aug 10 2022
I agree with @ori that it's worthwhile to attempt a Lua prototype of this.
However, this raises some design questions:
- Where would the NLG templates be stored? Would they exist as special pages within Wikipedia (as Wikitext templates do, AFAIU)?
- Would the NLG templates be compiled into Lua code at authoring time, or would they be interpreted by a Lua parser on the fly? This affects the question of how calls to sub-templates should be handled: as normal function calls, or as templates which need special parsing.
- In general, how would one execute the functions embedded within template slots? The most straightforward possibility is to use Lua's loadstring function; however, this is currently disabled in Scribunto, and it would allow running arbitrary Lua code in a template slot, which is arguably too much. Another option is to parse the functional expressions in the slots and call the functions through the global environment table _G, as sketched below.
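To illustrate that second option, here is a minimal Lua sketch that parses a slot expression and dispatches it through a table of allowed functions rather than loadstring. The slot syntax, the function names, and the whitelist table are assumptions for illustration, not part of the existing prototype.

```lua
-- Hedged sketch: parse a slot expression of the form "name(argument)" and
-- dispatch it through a table of allowed functions instead of loadstring.
local slotFunctions = {}

function slotFunctions.pluralize(noun)
	-- toy English pluralization, for illustration only
	return noun .. 's'
end

-- Evaluate a single template slot such as "pluralize(cat)".
local function evaluateSlot(expression)
	local name, arg = string.match(expression, '^(%w+)%(([^)]*)%)$')
	local func = name and slotFunctions[name]  -- or _G[name], if plain globals are used
	if not func then
		error('unknown slot function: ' .. tostring(name))
	end
	return func(arg)
end

-- evaluateSlot('pluralize(cat)') --> 'cats'
```

Dispatching through an explicit table (or a vetted subset of _G) keeps arbitrary Lua out of template slots while still letting authors call named functions.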
Jul 25 2022
@Asaf Insofar as two forms are considered distinct lexemes, it is probably the case that not all statements hold for both forms (e.g. the pronunciation may be different, and possibly other details such as the etymology). If the two forms are close enough (e.g. just minor dialectal pronunciation details), then we may indeed lump them together in one lexeme as if they were spelling variants (and then my suggested patch may become relevant). Even if we decide to split them, we may of course link the two lexemes to each other, using various properties such as "synonym of" or "derived from" etc. Anyhow, my suggested patch would make it easier to lump together such variants, as it allows re-using the same base language code for several spelling variants.
Jul 22 2022
@LucasWerkmeister I agree with you that if two variants have two different pronunciations, they should probably be split into two different lexemes (in general, I think we should avoid having multiple forms with the same grammatical features within one lexeme). There is some leeway, however, in this rule, since different dialects may have slightly different pronunciations which we still want to group into a single lexeme/form. For instance, American English "color" and British English "colour" are in fact pronounced slightly differently, but it would be overkill to split them, since the difference in pronunciation is systematic between the dialects.
Jul 12 2022
I believe the current situation, where multiple forms are added to account for spelling variations, goes against the spirit of the lexicographical data model, and in particular the idea that there should be exactly one form for each combination of grammatical features. Therefore I think it is important to unblock this situation, and I believe my proposal is a simple way forward.
Jun 30 2022
I've now created a patch that does allow associating several spelling variants with the same private language code.
Jun 24 2022
@Fnielsen as far as I see, each variant spelling forms its own set of inflected forms, so you have a paradigm related to mørklægge and another paradigm related to the variant spelling mørkelægge. So conceptually you don't have a single list of forms, but rather two distinct lists of forms. For this reason (and since the pronunciation slightly differs) it may make sense to separate them into two distinct lexemes.
I'm working on a patch to allow multiple forms associated with the same private language code.
@Fnielsen given that the pronunciation of these forms is in fact different (according to the X-SAMPA notation), and each has its own distinct inflection set, I would treat these as two distinct (synonymous) lexemes. I don't see the advantage of lumping all these forms into one entry. Of course, in a dictionary intended for human consumption it is convenient to list them together, but in a machine-readable dictionary, such as Wikidata, these should really be treated as two distinct lexemes.
@mxn If these are purely orthographic variants (i.e. the pronunciation is the same) I would list them under a single lexeme. And in that case, the most natural way would be to list them as spelling variants rather than distinct forms.
Jun 21 2022
The ideal solution would be to allow (in the language code validator) arbitrary language codes including a rank identifier. For instance, for Vietnamese one should be able to use codes such as vi-x-Q8201-1, vi-x-Q8201-2, etc. Currently this doesn't pass validation, as one gets the error Invalid Item ID "Q8201-1".
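For illustration, here is a minimal Lua sketch of the looser check suggested above. The function name and patterns are assumptions made for the example; the actual validator lives in Wikibase's code, not in Lua.

```lua
-- Hedged sketch: accept private-use codes of the form <base>-x-<item ID>,
-- optionally followed by a numeric rank suffix (e.g. "vi-x-Q8201-1").
local function isValidVariantCode(code)
	-- plain private-use code, e.g. "vi-x-Q8201"
	if string.match(code, '^%l+%-x%-Q%d+$') then
		return true
	end
	-- code with a rank suffix, e.g. "vi-x-Q8201-1"
	return string.match(code, '^%l+%-x%-Q%d+%-%d+$') ~= nil
end

-- isValidVariantCode('vi-x-Q8201')    --> true
-- isValidVariantCode('vi-x-Q8201-1')  --> true
-- isValidVariantCode('vi-x-Q8201-a')  --> false
```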
May 5 2022
@jhathaway to be honest, I'm not sure. Maybe @cmassaro would know.