add monolingual code "und-latn"
Open, Stalled, Needs Triage, Public

Assigned To

None

Authored By

	Lea_Lacroix_WMDE
	Nov 10 2020, 12:13 PM

Description

Per request from the community, we should consider adding the monolingual code "und-latn".

Request: https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team#add_monolingual_code_%22und-latn%22
Community discussion and use cases: https://www.wikidata.org/wiki/Property_talk:P969#%22und%22_or_%22und-latn%22

Event Timeline

Lea_Lacroix_WMDE created this task. · Nov 10 2020, 12:13 PM

Restricted Application added a subscriber: Aklapper. · Nov 10 2020, 12:13 PM

@Amire80 @jhsoby Before we move forward, could you have a look and let us know if it looks relevant from your side? Thanks!

Lea_Lacroix_WMDE updated the task description. · Nov 10 2020, 12:18 PM

Mohammed_Sadat_WMDE updated the task description. · Nov 10 2020, 12:21 PM

Mbch331 moved this task from Backlog to Wikidata (monolingual text) on the Language codes board. · Nov 10 2020, 8:01 PM

It's been two weeks and we didn't hear any veto from @Amire80 or @jhsoby, so I think we can move forward with it. @Mbch331 will you prepare a patch that we can merge?

Oh, I missed it, sorry! It's not so usual. Give me a day to check it. If you don't hear more in 24 hours, go ahead.

I think we can go for it :)

Actually, please, no.

At least not without better examples.

None of the examples in the community discussion justify a different code. All the examples are just not-so-correct values of a property that is marked as "deprecated". Are there any other examples?

Waiting for the result of discussions here and positive feedback from Amir or another LangCom person.

I think stuff was added to "und" for now.

In T267636#7129188, @Esc3300 wrote:

I think stuff was added to "und" for now.

And again, are there any direct examples? Like with "mul", I don't understand what it is for.

The linked page lists several samples now archived at Wikidata. These have been converted to P6375 statements:

So, actually, not only "und" was used.

In T267636#7195695, @Esc3300 wrote:

The linked page lists several samples now archived at Wikidata. These have been converted to P6375 statements:

using "uk", using "uk",

using "und", using "und",

using "ru",

So, actually, not only "und" was used.

All of these are just wrong, and "und" is not necessary in any of them. It's supposed to be Ukrainian, Russian, Tajik, Japanese. In all these cases, a dedicated code would just perpetuate data that is sloppy and easily fixable.

In T267636#7196372, @Amire80 wrote:

In T267636#7195695, @Esc3300 wrote:

The linked page lists several samples now archived at Wikidata. These have been converted to P6375 statements:

using "uk", using "uk",

using "und", using "und",

using "ru",

So, actually, not only "und" was used.

All of these are just wrong, and "und" is not necessary in any of them. It's supposed to be Ukrainian, Russian, Tajik, Japanese. In all these cases, a dedicated code would just perpetuate data that is sloppy and easily fixable.

I think the sloppiness was caused by the lack of adequate language codes. "und-latn" would have been that and still could be (but it's now harder to apply). As @Lydia_Pintscher mentioned, it's not a life-or-death situation, but inaction and delays in the addition of the IETF language tags to Wikidata can lead to a deterioration of data quality at Wikidata.

Not sure where you want to go with "It's supposed to be Ukrainian, Russian, Tajik, Japanese":

technically it would be correct to use "ru" or "uk" for Latin script text in these languages, but I don't think this is desirable at Wikidata. AFAIK, it's generally not being used that way in Wikidata.
if you think that Wikidata shouldn't store structured data for the samples given above, that is something you should propose and discuss as a Wikidata contributor in the adequate forum (e.g. Project chat). Here we try to determine the appropriate language code for the sample texts with help of a review by langcom.
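
For readers unfamiliar with the tag structure being debated: an IETF (BCP 47) language tag such as "und-latn" combines a primary language subtag ("und", undetermined) with a script subtag ("Latn", Latin), while "ru" or "uk" alone leave the script unstated. The following is only a minimal sketch of that split, not Wikibase's validation logic; the names `split_tag` and `is_undetermined` are hypothetical, and the pattern deliberately ignores region, variant, and extension subtags.

```python
import re
from typing import NamedTuple, Optional

# Simplified pattern (an assumption for illustration): a 2-3 letter primary
# language subtag, optionally followed by a 4-letter script subtag.
# Full BCP 47 tags can carry more subtags, which this sketch rejects.
_TAG = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{4}))?$")


class LangTag(NamedTuple):
    language: str
    script: Optional[str]


def split_tag(tag: str) -> LangTag:
    """Split a tag like 'und-latn' into (language, script)."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported tag for this sketch: {tag!r}")
    language, script = match.group(1), match.group(2)
    # Tags are case-insensitive; by convention the language subtag is
    # written in lowercase and the script subtag in title case.
    return LangTag(language.lower(), script.title() if script else None)


def is_undetermined(tag: str) -> bool:
    """True when the primary language subtag is 'und'."""
    return split_tag(tag).language == "und"
```

Under this reading, "uk-Latn" and "und-Latn" both record the Latin script; they differ only in whether the language itself is asserted, which is the point of disagreement in this task.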

"und" is for undetermined languages. These languages are determined. This was discussed on Wikidata pages and in Phabricator, and I haven't yet seen a single example of a value in an undetermined language.

Any text is in an undetermined language until the actual code is set.

Can you list the codes you deem appropriate for the 5 samples given?

In T267636#7196480, @Esc3300 wrote:

Any text is in an undetermined language until the actual code is set.

Can you list the codes you deem appropriate for the 5 samples given?

I already said this today, and in the past on Wikidata: Ukrainian, Russian, Tajik, Japanese. These addresses can be written in the respective scripts of the respective languages. If someone wants to write them in transliteration, then it's a transliteration to a certain language, probably English. None of these are undetermined. I'm not going to repeat this yet again.

I don't think you are answering the question directly. The question is merely about the text at hand.

If you think it's "probably English" and we should be using "en", at least that's an answer to this task.

This would be useful for quoting mistranscribed or poorly transcribed text, as well as the titles of works that consist of invented words.
