
Remove the Wikidata support
Open, Needs Triage, Public

Description

The Wikidata feature in Phonos (T319270) is built upon a distorted idea of what the IPA is: that there is such a thing as "the IPA for German", "the IPA for Hindi", etc. In fact the IPA is simply a tool, a set of shorthands, for linguists to efficiently communicate whatever it is that they want to convey in the given context. As the Handbook of the IPA (1999: 30) puts it:

There can be many systems of phonemic transcription for the same variety of a language, all of which conform fully to the principles of the IPA. ... In English, for example, the contrast between the words bead and bid has phonetic correlates in both vowel quality and vowel duration. A phonemic representation which explicitly notes this might use the symbols /iː/ and /ɪ/ ... But it is equally possible unambiguously to represent these phonemes as /iː/ and /i/ ..., or as /i/ and /ɪ/ ... All three pairs of symbols are in accord with the principles of the IPA ... The IPA does not provide a phonological analysis for a particular language, let alone a single 'correct' transcription, but rather the resources to express any analysis so that it is widely understood.

Notice that /i/ represents the vowel in bid in the second pair, but the one in bead in the third. This means you can't know whether the transcription /bid/ is supposed to be pronounced like bead or bid just by looking at it; you have to know the conventions underlying it.
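The ambiguity can be made concrete with a small Python sketch. It encodes only the three symbol pairs from the Handbook quotation; the labels "A", "B" and "C" and the function name are invented for illustration.

```python
# The three phonemic conventions for the bead/bid contrast quoted from the
# Handbook. The labels "A", "B", "C" are made up for this sketch.
CONVENTIONS = {
    "A": {"bead": "iː", "bid": "ɪ"},
    "B": {"bead": "iː", "bid": "i"},
    "C": {"bead": "i", "bid": "ɪ"},
}


def possible_words(symbol):
    """Return {convention label: word} for every convention that uses `symbol`."""
    return {
        label: word
        for label, vowels in CONVENTIONS.items()
        for word, vowel in vowels.items()
        if vowel == symbol
    }


# The bare symbol /i/ points at a different word depending on the convention.
print(possible_words("i"))  # {'B': 'bid', 'C': 'bead'}
```

Without knowing which convention a transcriber followed, there is no way to choose between the two readings.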

So the idea that you can extract a transcription from Wikidata and expect a TTS to render it the way the transcriber intended is fundamentally bankrupt. If users still want it, they can do it by putting {{#statements:P898}} inside {{#tag:phonos}} themselves. But it shouldn't be in the Phonos code. It is not even clear what the use cases are or what problem it is trying to solve to begin with.

Event Timeline


I don't see how this relates to Wikidata. Wikidata does not use IPA differently from Wikipedia or Wiktionary.

@Nikki This is about Phonos, an extension under development.

Nardog updated the task description.

> If users still want it, they can do it by putting {{#statements:P898}} inside {{#tag:phonos}} themselves.

I think this is the crux of why Phonos should handle fetching IPA from Wikidata: it's not as simple as fetching the best statement and showing that. It's necessary to loop through the IPA transcription claims and select the appropriate one based on the passed lang="" attribute.

That can of course be done on a wiki in a Lua module, but then it'd have to be repeated on every wiki that wants to do it. If Phonos does it, then everyone gets the functionality without any extra work.
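As a rough illustration of the selection step described above, here is a Python sketch over a trimmed-down entity in the shape of Wikidata's JSON export. It is not Phonos's actual code. It assumes the language is recorded as a P407 qualifier on each P898 statement; the code-to-item mapping, the function name and the sample data are all hypothetical.

```python
# Hypothetical mapping from the tag's lang="" value to a Wikidata language item.
LANG_TO_QID = {"en": "Q1860", "cs": "Q9056"}


def select_ipa(entity, lang):
    """Return the first non-deprecated P898 value whose P407 qualifier matches
    `lang`, or None if nothing matches."""
    qid = LANG_TO_QID.get(lang)
    if qid is None:
        return None
    for statement in entity.get("claims", {}).get("P898", []):
        if statement.get("rank") == "deprecated":
            continue
        for snak in statement.get("qualifiers", {}).get("P407", []):
            value = snak.get("datavalue", {}).get("value", {})
            if value.get("id") == qid:
                return statement.get("mainsnak", {}).get("datavalue", {}).get("value")
    return None


# Illustrative sample only, not a copy of the real Wikidata item.
SAMPLE_ENTITY = {
    "claims": {
        "P898": [
            {
                "rank": "normal",
                "mainsnak": {"datavalue": {"value": "prɑːɡ"}},
                "qualifiers": {"P407": [{"datavalue": {"value": {"id": "Q1860"}}}]},
            },
            {
                "rank": "normal",
                "mainsnak": {"datavalue": {"value": "ˈpraɦa"}},
                "qualifiers": {"P407": [{"datavalue": {"value": {"id": "Q9056"}}}]},
            },
        ]
    }
}

print(select_ipa(SAMPLE_ENTITY, "cs"))  # ˈpraɦa
```

The same loop would have to be written once per wiki if it lived in local modules, which is the duplication being argued against here.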

The main reason for wanting to retrieve IPA transcriptions from Wikidata (as I understand it) is to make it easier for wikis without local IPA skills to provide pronunciations. For example, many Wikipedias want to provide IPA in both the wiki's language and the appropriate language of the subject of the article (e.g. Prague: English: prɑːɡ; Czech: ˈpraɦa), and it's easier if they can source the latter from Wikidata instead of copying and pasting it from wherever.

The context for the IPA is the same in both places, I think. In both places we're asking e.g. "for an average Spanish speaker, what is the pronunciation of the English word xylophone?" in much the same way that our human-recorded audio pronunciation files are catalogued as being in a particular language. I don't understand why the approach on Wikidata is different to that on Wikipedia.

That said, I do see the argument for not adding this sort of logic to an extension, because it can be done quite satisfactorily in Lua. Maybe that's enough of a reason to remove it from Phonos. If that's the reasoning, then I think I'm weakly in favour; if it's because it's "fundamentally bankrupt" to store IPA in Wikidata, then I don't agree.

My point isn't that Phonos shouldn't provide the feature because it can be achieved by other means. The point is that no one should seek to achieve it because it doesn't make sense, for the reasons I've thoroughly explained. To repeat, 1) a transcription by itself is meaningless without reference to a key and 2) there's no guarantee a transcription on Wikidata is compatible with Phonos's TTS.

I'm not saying it's a bad idea to document pronunciation on Wikidata per se. The current situation on the site indeed leaves much to be desired, but even if it was improved to the point where you could rely on it, you wouldn't want to use a transcription on Wikidata as-is; you'd likely want to convert it so it conforms to the conventions on your site or to the TTS. So while there might be a demand for extracting transcriptions from Wikidata, a tool that only allows putting them into a TTS verbatim is useless.
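To show why a conversion step needs to know the source conventions, here is a deliberately tiny Python sketch. It uses only the three bead/bid symbol pairs from the Handbook quotation in the task description; the key names and the `convert` function are hypothetical, and a real converter would need far more than longest-match symbol substitution.

```python
# Keys for the three conventions in the Handbook quotation (names are made up).
KEY_FIRST = {"bead_vowel": "iː", "bid_vowel": "ɪ"}
KEY_SECOND = {"bead_vowel": "iː", "bid_vowel": "i"}
KEY_THIRD = {"bead_vowel": "i", "bid_vowel": "ɪ"}


def convert(transcription, source_key, target_key):
    """Rewrite `transcription` from one convention to another, symbol by symbol."""
    # Build a symbol-to-symbol table through the shared phoneme labels.
    table = {source_key[p]: target_key[p] for p in source_key}
    # Try longer symbols first so "iː" is not mistaken for "i" plus a length mark.
    symbols = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(transcription):
        for symbol in symbols:
            if transcription.startswith(symbol, i):
                out.append(table[symbol])
                i += len(symbol)
                break
        else:
            # Not covered by the key: copy the character through unchanged.
            out.append(transcription[i])
            i += 1
    return "".join(out)


# The same input string converts differently depending on the assumed source key.
print(convert("bid", KEY_SECOND, KEY_FIRST))  # bɪd
print(convert("bid", KEY_THIRD, KEY_FIRST))   # biːd
```

Passing the string through verbatim amounts to guessing the source key, and this example shows that the guess changes the result.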

Since T320523, the feature has been barely functional, as Wikidata and the TTS rarely share the same language codes. You could of course make the tag accept separate codes for Wikidata and the TTS, but again, at that point you'd also want to manipulate the string you got from Wikidata, so there's hardly a point in hard-coding extraction in Phonos itself.