IPA or SAMPA module
Open, Public

Description

Author: xmlizer

Description:
The goal is to give MediaWiki, and especially Wiktionary, an IPA/SAMPA
module that works like the ISBN module in Wikipedia.

Just type:
IPA : [toto]
or
SAMPA : [toto]

and it will create a link (and maybe an icon) to hear the phonetic transcription as MIDI audio.


Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=31221

bzimport added a subscriber: wikibugs-l.
bzimport set Reference to bz224.
bzimport created this task. Via Legacy, Aug 26 2004, 11:07 AM
bzimport added a comment. Via Conduit, Aug 26 2004, 2:13 PM

wiki wrote:

Very good idea! Of course it should read: IPA: [toto] or SAMPA: [toto]. The
square brackets shouldn't be obligatory, though, since phonological
transcriptions are sometimes used, in which case it would read: IPA: /toto/.
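
The detection step discussed so far could be sketched roughly as follows. This is only an illustration in Python, not MediaWiki code: the pattern, the function name `find_transcriptions`, and the "phonetic"/"phonemic" labels are assumptions made for the example.

```python
import re

# Hypothetical pattern: "IPA" or "SAMPA", optional spaces around the colon,
# then a transcription in [...] (phonetic) or /.../ (phonemic) delimiters.
MAGIC = re.compile(r"\b(IPA|SAMPA)\s*:\s*(?:\[([^\]\n]+)\]|/([^/\n]+)/)")

def find_transcriptions(text):
    """Return (system, kind, transcription) tuples found in text."""
    results = []
    for match in MAGIC.finditer(text):
        system, bracketed, slashed = match.groups()
        if bracketed is not None:
            results.append((system, "phonetic", bracketed))
        else:
            results.append((system, "phonemic", slashed))
    return results
```

A real implementation would then turn each match into a link, the way ISBN numbers are linked.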

bzimport added a comment. Via Conduit, Sep 11 2004, 9:41 AM

ser_canof wrote:

I'm with you. However, IMHO a form like <ipa>roman_transcription</ipa> is better.
It could be done using the LaTeX extension named TIPA and the software made
for <math></math>. As I imagine it, the enhancement would generate an image from
the code in <ipa></ipa>, exactly like <math>.

bzimport added a comment. Via Conduit, Oct 4 2004, 8:56 PM

david wrote:

I am working on this right now. It's a generalized module that can input and
output phonetic representations in a variety of formats:

Unicode IPA (UTF-8)
Unicode IPA (HTML entities)
X-SAMPA

as well as a few more obscure ones:
Kirshenbaum
tipa (the TeX IPA package)
a modified version of the system used in _Big Book of Beastly
Mispronunciations_, which gives things like <small>KAL</small>-i-FOR-nyuh

I'm still working on coding each of the modules--I currently have a Unicode IPA
(UTF-8) reader, and a writer for Unicode IPA (UTF-8), Unicode IPA (HTML
entities), X-SAMPA, and Big_Beastly. I'm working on the X-SAMPA reader
currently. I'm not sure readers will be needed for Kirshenbaum, tipa, and
Big_Beastly.

Also, I haven't yet worked out the syntax for denoting phonetic strings.
Thankfully, I've designed it so the syntax is not integral and the modules
should be compatible with any syntax.

A few other notes:

  • Generating audio pronunciations will require the installation and use of a TTS
system. Unfortunately, current Free TTS systems sound like crap. I don't think
there is much point at this juncture to invest development time in automatically
generating audio pronunciations.

  • Generating an image of the IPA _could_ be done by connecting the phonetics
module to texvc and installing the LaTeX TIPA package. I'm not going to invest
development time in this right now other than generating TIPA-compatible output.

  • The syntax should be able to specify what format the input is in and what
format(s) the output should be in. Further, there should be a reasonable default
for both of these. I would advocate Unicode IPA -> Unicode IPA as the defaults.
Further, there should be a set of standard templates for generating phonetic
outputs in various formats. We can add some IE-specific CSS to explicitly
specify the font to a set of fonts known to contain IPA symbols (this isn't
necessary with other browsers because they substitute in Unicode characters from
other fonts when the current font doesn't contain the requested character).
Finally, there should be a user preference to set the preferred output format
for phonetic data that would override anything that uses the defaults.

<phon input=xsampa>"hE.loU</phon> -> IPA output
<phon input=xsampa output=bb>"hE.loU</phon> -> HE-loh

but there would be templates for this

{{xsampa_to_ipa|"hE.loU}}
{{ipa_to_xsampa_and_ipa|toto}}

This needs more thought.
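
To make the proposed defaults concrete, here is a minimal sketch of how the <phon> attributes might be resolved. It is written in Python purely for illustration; the regular expressions, the function name `parse_phon_tags`, and the `user_output` parameter are assumptions, not part of any patch on this bug.

```python
import re

# Defaults advocated above: Unicode IPA in, Unicode IPA out.
DEFAULT_INPUT = "ipa"
DEFAULT_OUTPUT = "ipa"

# Hypothetical, simplified tag syntax: unquoted name=value attributes only.
PHON_TAG = re.compile(r"<phon((?:\s+\w+=\w+)*)\s*>(.*?)</phon>", re.DOTALL)
ATTRIBUTE = re.compile(r"(\w+)=(\w+)")

def parse_phon_tags(text, user_output=None):
    """Return (input_format, output_format, body) for each <phon> tag.

    A user preference replaces the output format only where the tag
    itself did not specify one, i.e. where the default would apply.
    """
    parsed = []
    for match in PHON_TAG.finditer(text):
        attrs = dict(ATTRIBUTE.findall(match.group(1)))
        in_fmt = attrs.get("input", DEFAULT_INPUT)
        out_fmt = attrs.get("output") or user_output or DEFAULT_OUTPUT
        parsed.append((in_fmt, out_fmt, match.group(2)))
    return parsed
```

With this reading, `<phon input=xsampa output=bb>` always produces the "bb" format, while a bare `<phon>` follows the user preference if one is set.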

bzimport added a comment. Via Conduit, Oct 10 2004, 10:55 PM

david wrote:

Patch to Setup.php in support of files to be uploaded subsequently

This is a diff for Setup.php that includes "Phonetics.php" (to be uploaded)

attachment Setup.php.diff ignored as obsolete

bzimport added a comment. Via Conduit, Oct 10 2004, 10:57 PM

david wrote:

Phonetics.php file to go into includes/

This is the Phonetics.php file which supports the phonetics extensions. To be
described more fully in a forthcoming comment.

attachment Phonetics.php ignored as obsolete

bzimport added a comment. Via Conduit, Oct 10 2004, 10:59 PM

david wrote:

archive containing files used to generate Phonetics.php

This is an archive containing the files that are used to generate
Phonetics.php, including a Makefile. To be described in a forthcoming comment.

attachment phonetics.tar.gz ignored as obsolete

bzimport added a comment. Via Conduit, Oct 10 2004, 11:07 PM

david wrote:

OK. I have uploaded 3 attachments that implement the IPA/SAMPA solution I have created.

Overview of what it does: it supports the following new tags: <ipa> <ipa-en> <xsampa> <xsampa-en>. The <ipa> tag takes IPA
Unicode input (either UTF-8 or numeric entities) and returns 2 <span>s: one containing the IPA Unicode in all numeric entities, and the
other containing the equivalent X-SAMPA. The <xsampa> tag takes X-SAMPA input and returns the same <span>s as <ipa>. The -en
versions of the tags are identical, except they also return a third <span> containing the phonetics in a "simple English" phonetic format.
This option is in a separate tag because this only works with English phonemes.
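
As a rough illustration of what the <xsampa> tag is described as returning, the following Python sketch converts a small subset of X-SAMPA to IPA and wraps both forms in <span>s. It is not the code from the attachments: the table covers only a handful of symbols, the span class names are invented, and only non-ASCII characters are turned into numeric entities (an assumption about what "all numeric entities" means).

```python
import html

# Tiny, illustrative subset of the X-SAMPA -> IPA correspondences.
XSAMPA_TO_IPA = {
    '"': "\u02c8",    # primary stress
    "E": "\u025b",    # open-mid front unrounded vowel
    "U": "\u028a",    # near-close near-back rounded vowel
    "@": "\u0259",    # schwa
    "S": "\u0283",    # voiceless postalveolar fricative
    "N": "\u014b",    # velar nasal
    ":": "\u02d0",    # length mark
    "r\\": "\u0279",  # alveolar approximant (two-character symbol)
}
# Try longer symbols first so "r\" is not read as a plain "r".
_SYMBOLS = sorted(XSAMPA_TO_IPA, key=len, reverse=True)

def xsampa_to_ipa(text):
    """Convert X-SAMPA to IPA; unknown characters pass through unchanged."""
    out = []
    i = 0
    while i < len(text):
        for symbol in _SYMBOLS:
            if text.startswith(symbol, i):
                out.append(XSAMPA_TO_IPA[symbol])
                i += len(symbol)
                break
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

def to_entities(text):
    """Replace every non-ASCII character with a decimal numeric entity."""
    return "".join(c if ord(c) < 128 else "&#%d;" % ord(c) for c in text)

def render_xsampa(text):
    """Return two spans: IPA as numeric entities, then the X-SAMPA source."""
    ipa = to_entities(html.escape(xsampa_to_ipa(text)))
    return ('<span class="phon-ipa">%s</span>'
            '<span class="phon-xsampa">%s</span>' % (ipa, html.escape(text)))
```

The longest-match ordering matters for any scheme with multi-character symbols; a plain character-by-character lookup would mistranslate them.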

Overview of how it works: Phonetics.php is auto-generated from some files in the phonetics.tar.gz archive. The translation tables are
generated via a perl script from a tab-separated text file containing all the correspondences between phonetic systems. The translation
tables are then #included via cpp into the php source (Phonetics.phpi). PHP functions include() or require() won't work for this because
they can't be called from within a class definition.
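
The table-generation idea can be sketched as below. Note that this swaps in a different technique from the one described: the archive uses a Perl script plus cpp to bake the tables into the PHP source, whereas this Python sketch builds a lookup table at run time. The column names and sample rows are hypothetical and do not reflect the actual file in the archive.

```python
import csv
import io

# Hypothetical sample of the tab-separated correspondence file:
# one column per phonetic system, one row per symbol.
SAMPLE_TSV = (
    "ipa\txsampa\n"
    "\u025b\tE\n"
    "\u028a\tU\n"
    "\u0259\t@\n"
    "\u02c8\t\"\n"
)

def build_table(tsv_text, source, target):
    """Build a {source symbol: target symbol} dict from the TSV text."""
    # QUOTE_NONE keeps the X-SAMPA stress mark (") from being read as a quote.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    table = {}
    for row in reader:
        src, dst = row.get(source), row.get(target)
        if src and dst:  # skip symbols missing from either system
            table[src] = dst
    return table
```

Keeping all correspondences in one file means each direction (for example X-SAMPA to IPA and IPA to X-SAMPA) is derived from the same rows, so the tables cannot drift apart.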

bzimport added a comment. Via Conduit, Oct 11 2004, 6:28 AM

david wrote:

The newest versions of this depend on a Parser that supports parameters in tags.

bzimport added a comment. Via Conduit, Oct 11 2004, 8:20 AM

david wrote:

archive containing files used to generate Phonetics.php

Newer version that eliminates the previous tags and now supports just the
<phon> tag, which takes the attributes "encoding" and "display".

attachment phonetics.tar.gz ignored as obsolete

bzimport added a comment. Via Conduit, Oct 11 2004, 8:22 AM

david wrote:

Revised version of Phonetics.php

New version of Phonetics.php, generated by files in attachment 88

attachment Phonetics.php ignored as obsolete

bzimport added a comment. Via Conduit, Apr 2 2005, 1:00 PM

xmlizer wrote:

Does the patch work correctly?

MarkAHershberger added a comment. Via Conduit, Dec 2 2010, 5:57 PM

This is handled with templates now. See http://en.wiktionary.org/wiki/sententious#Pronunciation for an example.

MarkAHershberger added a comment. Via Conduit, Dec 2 2010, 6:13 PM

Reopening after discussion on IRC. I would suggest that this be done as an extension instead of a patch to MediaWiki proper.

brion added a comment. Via Conduit, May 26 2011, 5:51 PM

I'm removing the blocker on bug 26207; these days we'd want this implemented as a parser function, so no new syntax extension system is needed.

The old patch above should be looked over to see if it can be adapted or used to inspire a modern version.

bzimport added a comment. Via Conduit, Aug 24 2011, 2:53 PM

sumanah wrote:

John, looks like you looked at the patch and found it obsolete enough that we cannot adapt it into a modern version?

bzimport added a comment. Via Conduit, Jan 17 2013, 2:49 PM

xmlizer wrote:

Any news on this bug? It would also be a good candidate for Wikidata: it could extract the pronunciation of a word in many languages and generate the associated sound.

mxn added a subscriber: mxn. Via Web, Nov 24 2014, 8:55 PM