Maniphest T152019

Decide whether a Lexeme's lemma can have multiple representations for the same language code.
Closed, Resolved, Public

Assigned To

Authored By

	daniel
	Nov 30 2016, 5:34 PM

Description

We decided to have multi-variant lemmas on Lexemes, see T151582. That is, we support multiple representations (spellings, scripts) of the lemma.

This again raises two options:

allow only one representation per language (in PHP, that would be a TermList; in JSON, a simple object with language codes as the keys and terms as the values)
allow any number of representations per language (in PHP, that would be an AliasGroupList; in JSON, an object with language codes as the keys but lists of terms as the values)
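As a rough illustration of the two JSON shapes, here is a minimal Python sketch. The lemma values and the `single_valued` / `multi_valued` names are made up for this example, and terms are shown as plain strings for brevity; the actual serialization may wrap each term in a richer structure.

```python
import json

# Option 1: at most one representation per language code.
# Regional variants need their own language codes.
single_valued = {
    "en": "color",
    "en-gb": "colour",
}

# Option 2: any number of representations per language code.
# Each value is a list, even when it holds a single term.
multi_valued = {
    "en": ["color", "colour"],
}

print(json.dumps(single_valued, indent=2))
print(json.dumps(multi_valued, indent=2))
```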

The advantage of one-per-language is that it is easier to use: we can apply the same language fallback we use for Item labels, and get a single string. The disadvantage is that we may have to invent language codes to cover regional differences, dialects, and changes over time. We may want to use Item qids instead of ISO codes to overcome this, but we would have to map these to ISO codes at least for use in HTML and RDF. We could also go with a hybrid approach, ISO language codes suffixed by qids, e.g. de-au.Q131964. The suffixes could just be stripped for use in HTML and RDF, but we'd need a rather complex widget for picking and editing the language code.
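Stripping the suffix from a hybrid code could look roughly like the sketch below. The function name `strip_qid_suffix` is hypothetical, and the sketch assumes the qid is always separated from the language code by the first "." in the string, as in the example above.

```python
def strip_qid_suffix(code: str) -> str:
    """Return the plain language code from a hybrid code like 'de-au.Q131964'.

    Assumes (for this sketch only) that everything after the first '.'
    is the qid suffix. Codes without a suffix are returned unchanged.
    """
    base, _sep, _qid = code.partition(".")
    return base


print(strip_qid_suffix("de-au.Q131964"))  # de-au
print(strip_qid_suffix("de"))             # de
```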

Alternatively, we may allow any number of representations with the same language code. This is what the Lemon model does: it allows a set of arbitrary representations, with no restrictions on the language markers. This adds complexity for consumers that need a single value: even after finding the correct group by applying language fallback, they would have to pick one member of the group at random, or concatenate them. The advantage of this approach is that we can rely on a closed set of language codes, for which we can assume support by clients.
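The difference for consumers can be sketched as follows. This is not the fallback logic used for Item labels; `pick_single`, `pick_group`, the fallback chain, and the lemma values are all hypothetical and only illustrate the extra step a multi-value consumer has to take.

```python
from typing import Dict, List, Optional


def pick_single(lemmas: Dict[str, str], chain: List[str]) -> Optional[str]:
    """One-per-language: the first matching code yields a single string."""
    for code in chain:
        if code in lemmas:
            return lemmas[code]
    return None


def pick_group(lemmas: Dict[str, List[str]], chain: List[str]) -> List[str]:
    """Many-per-language: the first matching code yields a whole group."""
    for code in chain:
        if lemmas.get(code):
            return lemmas[code]
    return []


chain = ["de-at", "de", "en"]  # hypothetical fallback chain

# Fallback alone is enough to get one string.
print(pick_single({"de": "Tomate", "en": "tomato"}, chain))

# Fallback only finds the group; the consumer still has to pick one
# member or concatenate them.
group = pick_group({"de": ["Tomate", "Paradeiser"]}, chain)
print(" / ".join(group))
```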

NOTE: This needs to be decided for the canonical representation of Lexemes before going live. The multi-value JSON representation is forward-compatible, while the single-value JSON structure is not.
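One way to read the forward-compatibility point: if the JSON already uses lists as values, a one-per-language rule can be enforced as a validation limit and relaxed later without changing the shape consumers see. The sketch below illustrates that idea under this assumption; `validate_lemmas` and `max_per_language` are hypothetical names, not part of any existing code.

```python
from typing import Dict, List


def validate_lemmas(lemmas: Dict[str, List[str]], max_per_language: int = 1) -> None:
    """Reject lemma groups that are empty or exceed the per-language limit.

    The data shape (language code -> list of terms) stays the same
    whatever the limit is; only this check would change.
    """
    for code, terms in lemmas.items():
        if not 1 <= len(terms) <= max_per_language:
            raise ValueError(
                f"{code}: expected 1 to {max_per_language} terms, got {len(terms)}"
            )


# Accepted under a one-per-language limit.
validate_lemmas({"en": ["colour"]})

# The same structure remains valid once the limit is raised.
validate_lemmas({"en": ["colour", "color"]}, max_per_language=2)
```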

Related Objects

Status	Subtype	Assigned	Task
Open	Feature	None	T13996 A way to select which parts of Wiktionary articles to show
Open	Feature	None	T14213 Following a link to a language entry in Wiktionary should display only that entry
Open	Feature	None	T13998 A way to show only those languages on Wiktionary that the user is interested in
Open	Feature	None	T38881 Wiktionary needs usable API
Open		None	T31229 Extension to provide access via the dict protocol
Open		None	T109579 [Epic] Give more sister projects access to Wikidata
Resolved		Lydia_Pintscher	T986 Use structured data on Wiktionary
Resolved		Lydia_Pintscher	T988 Phase 1: Represent Wiktionary lexicon using structured data
Resolved		Lydia_Pintscher	T146637 Wikidata 2016 Q4 goals
Resolved		Lydia_Pintscher	T150179 Wikidata 2017 Q1 goals
Resolved		Lydia_Pintscher	T146662 [Story] new entity type for Lexemes (baseline)
Resolved		Lydia_Pintscher	T148139 Implement a base version of the Lexeme Entity type
Resolved		Ladsgroup	T148827 Implement Lemma in Lexeme
Resolved		Lydia_Pintscher	T152019 Decide whether a Lexeme's lemma can have multiple representations for the same language code.

Event Timeline

daniel created this task. Nov 30 2016, 5:34 PM

Restricted Application added a subscriber: Aklapper. Nov 30 2016, 5:34 PM

daniel triaged this task as High priority. Nov 30 2016, 5:35 PM

daniel added a parent task: T148827: Implement Lemma in Lexeme.

daniel added projects: Wikidata, Wikidata Lexicographical data.

daniel added subscribers: WMDE-leszek, thiemowmde, Ladsgroup and 3 others.

daniel added a project: Wikidata-Former-Sprint-Board. Nov 30 2016, 5:37 PM

daniel mentioned this in T151582: Decide whether a Lexeme's lemma is a single Term, or a TermList (multi-variant). Nov 30 2016, 5:44 PM

WMDE-leszek moved this task from Proposed to Backlog on the Wikidata-Former-Sprint-Board board. Dec 6 2016, 2:49 PM

daniel added a project: User-Daniel. Dec 6 2016, 4:42 PM

Yair_rand subscribed. Dec 12 2016, 8:07 PM

daniel mentioned this in T153850: Include "special" languages in language selector for monolingual text. Dec 21 2016, 12:38 PM

Is a dialect a language for the purposes of this discussion?

daniel moved this task from Inbox to Push on the User-Daniel board. Jan 5 2017, 4:11 PM

Decision: Yes in the future, but for now we only allow one representation per language code.

@ChristianKl: Yes. Sorry for answering only now. It seems the previous reply didn't get sent.
