
Unify separator between language and lexical category
Status: Closed, Resolved (Public)


We are not consistent in how we separate language and lexical category.

In the Lexeme tooltip, a comma separates the language and the lexical category:

[Screenshot: comma1.png]

In search results, there is no separator between the language and the lexical category:

[Screenshot: comma2.png]

The entity selector has no separator either:

[Screenshot: image.png]

Acceptance criteria:

  • We consistently separate the language and the lexical category with a localizable comma in all three places.

Event Timeline

Would this go through a function that renders the word/language combination, or would the separator be defined in CSS?
In any case, I have no strong preferences, except that consistency would be great.

Editors: preferences please :) Then we can pick this up.

I would prefer a comma separator, but I only work with Latin scripts, so I don't know whether it works for all languages (Chinese, Hebrew, Tamil, Japanese, etc.).

@Lydia_Pintscher, thanks a lot for asking! :)

The first example with the comma certainly looks better than the other ones with no separator.

As far as I can see, and as @Bugreporter has already written above, the presentation with the comma is implemented using the optional message wikibaselexeme-presentation-lexeme-secondary-label. Its default value is, as expected, $1, $2. That is a good default. I could imagine other, cleverer and more generic schemes, for example using | as a separator, but that isn't really needed, and a comma is good enough. If anybody thinks a comma is not good enough, I'd be very interested in seeing an example. Languages that need something other than a comma and a space can easily customize it by editing the translation at translatewiki.

The presentation without the comma is implemented using the message wikibaselexeme-description. Its value is $1 $2, and it's marked as "ignored" in translatewiki, which means that it's a message for internal technical use and cannot be translated. This designation is probably incorrect.
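To make the inconsistency concrete, here is a minimal Python sketch of positional placeholder substitution applied to the two message values quoted above. It assumes, hypothetically, that $1 is the language and $2 the lexical category; the `MESSAGES` dict and the `format_message` helper are invented for illustration and are not MediaWiki's own message API.

```python
import re

# Hypothetical stand-ins for the two message values discussed above.
# This dict is an illustration only, not MediaWiki's message store.
MESSAGES = {
    "wikibaselexeme-presentation-lexeme-secondary-label": "$1, $2",
    "wikibaselexeme-description": "$1 $2",
}


def format_message(template, *params):
    """Replace $1, $2, ... with the matching positional parameter.

    Placeholders without a matching parameter are left untouched.
    """
    def substitute(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(params):
            return params[index]
        return match.group(0)

    return re.sub(r"\$(\d+)", substitute, template)


if __name__ == "__main__":
    # Assumed parameter order: $1 = language, $2 = lexical category.
    for key, template in MESSAGES.items():
        print(f"{key}: {format_message(template, 'Polish', 'noun')}")
```

With the same inputs, the first message yields "Polish, noun" and the second "Polish noun", which is exactly the difference visible in the tooltip versus the search results and entity selector.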

My immediate intuition is to do the following:

  • Use wikibaselexeme-presentation-lexeme-secondary-label consistently wherever the lexeme and its part of speech need to be shown.
  • Examine the usage of wikibaselexeme-description. Perhaps it can be removed entirely and replaced with wikibaselexeme-presentation-lexeme-secondary-label. If it's still needed, perhaps it can be changed to $1, $2, but there should be a proper justification for having an identical message. If this is done, the message should be defined as optional, not as ignored, in the translatewiki configuration repo.

As MediaWiki's Localisation guidelines note, having identical messages is not necessarily bad. I can think of at least one good justification for having two messages: one can be presented as plain text, which would suit tooltips, and the other can be parsed as wiki syntax for contexts where HTML is available. (The messages don't contain markup at the moment, but some languages may want it, for example to fix RTL issues.) There may be other justifications. But this is really a question for people who are familiar with the code.

I'll be happy to give more L10n advice if needed. I could also go deeper into lexicographical and dictionary design, but I don't think that's needed here, at least for now.

Actually, there is already a separator: a space.

Ideally, the separator should be localizable. For example, Chinese and Japanese might prefer 、 (or ·) rather than a Latin comma. But this is a very technical usage (i.e. not dictated by natural-language grammar), so I can't speak for the preferences of other users.
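A localizable separator simply means each language may supply its own template, with everything else falling back to the default "$1, $2". The Python sketch below illustrates that idea. The `TRANSLATIONS` table and the `secondary_label` helper are hypothetical; the "zh" entry uses the ideographic comma suggested above purely as an example and is not taken from translatewiki. MediaWiki's real fallback (language fallback chains) is more involved than this single dictionary lookup.

```python
import re

# Default template for the secondary label, as quoted in this thread.
DEFAULT_TEMPLATE = "$1, $2"

# Hypothetical per-language overrides; real translations live on translatewiki.
TRANSLATIONS = {
    "zh": "$1、$2",  # ideographic comma, illustration only
}


def secondary_label(lang_code, language, category):
    """Render language and category with a per-language separator.

    Languages without an override fall back to DEFAULT_TEMPLATE.
    Assumes $1 = language and $2 = lexical category.
    """
    template = TRANSLATIONS.get(lang_code, DEFAULT_TEMPLATE)
    params = (language, category)
    # Substitute both placeholders in one pass so that parameter values
    # containing "$1" or "$2" are never re-substituted.
    return re.sub(r"\$([12])", lambda m: params[int(m.group(1)) - 1], template)


if __name__ == "__main__":
    print(secondary_label("en", "Polish", "noun"))
    print(secondary_label("zh", "波兰语", "名词"))
```

Here "en" has no override and renders "Polish, noun", while "zh" renders "波兰语、名词"; no code change is needed to switch separators, only a different translation.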

In the short term, either a comma or a space would be fine. I think speakers of non-Latin languages can cope with a Latin comma as a separator, much as we have permanently put up with the utterly foreign and non-localizable Latin colon for namespaces.

It's already localizable, as my comment above says. But I'm not sure why there are two messages and not one.

In English, the space separator can work well if you treat the label as a phrase in which an adjective (the language name) describes a part of speech, e.g. "Polish noun". But that won't work in many other languages. For example, in Russian, part-of-speech names have grammatical gender, so the language adjective has to agree with them in gender. Generating correct phrases of this kind is not impossible, but it's not trivial either, and it probably cannot be done with simple messages. So it's probably better not to assume the label is a phrase. "Noun, Polish" looks more generic and is probably easier to localize.

Alright. Then let's go with a localizable comma. Thanks for the input everyone!

Change 487346 had a related patch set uploaded (by Greta WMDE; owner: Greta Doçi):
[mediawiki/extensions/WikibaseLexeme@master] One comma was added to wikibaselexeme-description to separate language and lexical category

Change 487346 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] One comma was added to wikibaselexeme-description and respective test files, to separate language and lexical category.