
IPA or SAMPA module
Closed, Duplicate (Public)


Author: xmlizer

The goal is to give MediaWiki, and Wiktionary in particular, an IPA/SAMPA
module that works like the ISBN module in Wikipedia.

Just type:
IPA : [toto]
SAMPA : [toto]

and it will make a link (and maybe an icon) to hear the phonetic transcription played as MIDI

Version: unspecified
Severity: enhancement
See Also:



Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 6:44 PM)
bzimport set Reference to bz224.
bzimport added a subscriber: Unknown Object (MLST).

wiki wrote:

Very good idea! Of course it should read: IPA: [toto] or SAMPA: [toto], the
square brackets shouldn't be obligatory, since sometimes phonological
transcriptions may be used, so it would read: IPA: /toto/.
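A minimal sketch of how such plain-text markers could be recognized, accepting both square brackets (phonetic) and slashes (phonemic). The pattern and the function name `find_transcriptions` are hypothetical, written in Python only for illustration, and are not taken from any MediaWiki code:

```python
import re

# Hypothetical pattern: a label (IPA or SAMPA), a colon, then a
# transcription delimited by [...] (phonetic) or /.../ (phonemic).
PHONETIC_RE = re.compile(
    r"\b(?P<system>IPA|SAMPA)\s*:\s*"
    r"(?:\[(?P<phonetic>[^\]\n]+)\]|/(?P<phonemic>[^/\n]+)/)"
)


def find_transcriptions(text):
    """Return (system, kind, transcription) tuples found in text."""
    results = []
    for m in PHONETIC_RE.finditer(text):
        if m.group("phonetic") is not None:
            results.append((m.group("system"), "phonetic", m.group("phonetic")))
        else:
            results.append((m.group("system"), "phonemic", m.group("phonemic")))
    return results
```

A real implementation would also have to decide how to avoid false positives in running text, which is one argument for the explicit tag syntax discussed in the following comments.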

ser_canof wrote:

I'm with you. However, IMHO a form like <ipa>roman_transcription</ipa> would be
better. It might be possible using the LaTeX package named TIPA and the software
made for <math></math>. As I imagine it, the enhancement would render an image
from the code in <ipa></ipa>, exactly as <math> does.

david wrote:

I am working on this right now. It's a generalized module that can input and
output phonetic representations in a variety of formats:

Unicode IPA (UTF-8)
Unicode IPA (HTML entities)

as well as a few more obscure ones:
tipa (the TeX IPA package)
a modified version of the system used in _Big Book of Beastly
Mispronunciations_, which gives things like <small>KAL</small>-i-FOR-nyuh

I'm still working on coding each of the modules--I currently have a Unicode IPA
(UTF-8) reader, and a writer for Unicode IPA (UTF-8), Unicode IPA (HTML
entities), X-SAMPA, and Big_Beastly. I'm working on the X-SAMPA reader
currently. I'm not sure readers will be needed for Kirshenbaum, tipa, and

Also, I haven't yet worked out the syntax for denoting phonetic strings.
Thankfully, I've designed it so the syntax is not integral and the modules
should be compatible with any syntax.

A few other notes:

  • Generating audio pronunciations will require the installation and use of a TTS
    system. Unfortunately, current Free TTS systems sound like crap. I don't think
    there is much point at this juncture to invest development time in automatically
    generating audio pronunciations.

  • Generating an image of the IPA _could_ be done by connecting the phonetics
    module to texvc and installing the LaTeX TIPA package. I'm not going to invest
    development time in this right now other than generating TIPA-compatible output.

  • The syntax should be able to specify what format the input is in and what
    format(s) the output should be in. Further, there should be a reasonable default
    for both of these. I would advocate Unicode IPA -> Unicode IPA as the defaults.
    Further, there should be a set of standard templates for generating phonetic
    outputs in various formats. We can add some IE-specific CSS to explicitly
    specify the font to a set of fonts known to contain IPA symbols (this isn't
    necessary with other browsers because they substitute in Unicode characters from
    other fonts when the current font doesn't contain the requested character).
    Finally, there should be a user preference to set the preferred output format
    for phonetic data that would override anything that uses the defaults.

<phon input=xsampa>"hE.loU</phon> -> IPA output
<phon input=xsampa output=bb>"hE.loU</phon> -> HE-loh

but there would be templates for this


This needs more thought.
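
To make the proposed tag syntax concrete, here is a rough sketch in Python (not the module's actual PHP) of expanding a <phon> tag with unquoted `input` and `output` attributes, defaulting both to Unicode IPA as suggested above. The names `XSAMPA_TO_IPA`, `render_phon` and `expand_phon_tags` are hypothetical. The table covers only the symbols in the example, and converting character by character is a simplification, since X-SAMPA also has multi-character symbols. The "bb" output is left untouched because its mapping is not given in this thread.

```python
import re

# Deliberately tiny X-SAMPA -> IPA table: only the symbols needed for the
# example. A real table would be far larger.
XSAMPA_TO_IPA = {
    '"': "\u02c8",  # primary stress mark
    "E": "\u025b",  # open-mid front unrounded vowel
    "U": "\u028a",  # near-close near-back rounded vowel
}

# Hypothetical tag grammar: <phon name=value ...>body</phon>, unquoted values.
PHON_RE = re.compile(
    r"<phon(?P<attrs>(?:\s+\w+=\w+)*)\s*>(?P<body>.*?)</phon>", re.S
)
ATTR_RE = re.compile(r"(\w+)=(\w+)")


def xsampa_to_ipa(text):
    """Convert X-SAMPA to IPA one character at a time (simplified)."""
    return "".join(XSAMPA_TO_IPA.get(ch, ch) for ch in text)


def render_phon(match):
    """Render one <phon> match; unsupported conversions are left as-is."""
    attrs = dict(ATTR_RE.findall(match.group("attrs")))
    source = attrs.get("input", "ipa")    # default input: Unicode IPA
    target = attrs.get("output", "ipa")   # default output: Unicode IPA
    body = match.group("body")
    if source == target:
        return body
    if source == "xsampa" and target == "ipa":
        return xsampa_to_ipa(body)
    return match.group(0)  # e.g. output=bb: no table available in this sketch


def expand_phon_tags(text):
    """Replace every <phon> tag in text with its rendered form."""
    return PHON_RE.sub(render_phon, text)
```

Because the conversion is keyed on an (input, output) pair, the surface syntax stays independent of the converters, which matches the point above that the modules should work with any syntax.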

david wrote:

Patch to Setup.php in support of files to be uploaded subsequently

This is a diff for Setup.php that includes "Phonetics.php" (to be uploaded)

attachment Setup.php.diff ignored as obsolete

david wrote:

Phonetics.php file to go into includes/

This is the Phonetics.php file which supports the phonetics extensions. To be
described more fully in a forthcoming comment.

attachment Phonetics.php ignored as obsolete

david wrote:

archive containing files used to generate Phonetics.php

This is an archive containing the files that are used to generate
Phonetics.php, including a Makefile. To be described in a forthcoming comment.

attachment phonetics.tar.gz ignored as obsolete

david wrote:

OK. I have uploaded 3 attachments that implement the IPA/SAMPA solution I have created.

Overview of what it does: it supports the following new tags: <ipa> <ipa-en> <xsampa> <xsampa-en>. The <ipa> tag takes IPA
Unicode input (either UTF-8 or numeric entities) and returns 2 <span>s: one containing the IPA Unicode in all numeric entities, and the
other containing the equivalent X-SAMPA. The <xsampa> tag takes X-SAMPA input and returns the same <span>s as <ipa>. The -en
versions of the tags are identical, except they also return a third <span> containing the phonetics in a "simple English" phonetic format.
This option is in a separate tag because this only works with English phonemes.
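
A minimal sketch of the two-<span> output described above, in Python rather than the PHP of the actual attachment. The class names `ipa` and `xsampa` and the function names are hypothetical, and the input is assumed to be already-decoded Unicode text (the described module also accepts numeric entities as input, which this sketch does not handle):

```python
import html


def to_numeric_entities(text):
    """Encode every non-ASCII character as a decimal numeric entity."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Escape &, < and > so ASCII text is safe inside HTML.
            out.append(html.escape(ch, quote=False))
        else:
            out.append("&#%d;" % ord(ch))
    return "".join(out)


def render_spans(ipa, xsampa):
    """Return one span with IPA as numeric entities and one with X-SAMPA."""
    return '<span class="ipa">%s</span> <span class="xsampa">%s</span>' % (
        to_numeric_entities(ipa),
        html.escape(xsampa, quote=False),
    )
```

Escaping the X-SAMPA span matters because X-SAMPA itself uses characters such as `&` and `<` that are special in HTML.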

Overview of how it works: Phonetics.php is auto-generated from some files in the phonetics.tar.gz archive. The translation tables are
generated via a Perl script from a tab-separated text file containing all the correspondences between phonetic systems. The translation
tables are then #included via cpp into the PHP source (Phonetics.phpi). PHP functions include() or require() won't work for this because
they can't be called from within a class definition.
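
To illustrate the table-generation idea, here is a sketch that builds pairwise translation tables from tab-separated text. It is written in Python and builds the tables at run time, unlike the Perl plus cpp build step described above. The column layout (a header row naming each system, one symbol per row) and the name `build_tables` are assumptions, since the actual file format is not shown in this thread.

```python
def build_tables(tsv_text):
    """Build {(source, target): {symbol: symbol}} from tab-separated text.

    Assumed layout: the first line names the phonetic systems, and each
    following line holds one symbol per system, in the same column order.
    """
    lines = [ln for ln in tsv_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    rows = [ln.split("\t") for ln in lines[1:]]

    tables = {}
    for i, source in enumerate(header):
        for j, target in enumerate(header):
            if i == j:
                continue
            table = {}
            for row in rows:
                # Skip rows that are short or have no symbol in either system.
                if len(row) > max(i, j) and row[i] and row[j]:
                    table[row[i]] = row[j]
            tables[(source, target)] = table
    return tables


# Tiny hypothetical sample with two systems and three symbols.
SAMPLE = "ipa\txsampa\n\u02c8\t\"\n\u025b\tE\n\u028a\tU\n"
TABLES = build_tables(SAMPLE)
```

The lines are split by hand rather than with a CSV parser so that the X-SAMPA stress mark `"` is treated as ordinary data and not as a quote character.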

david wrote:

The newest versions of this depend on a Parser that supports parameters in tags

david wrote:

archive containing files used to generate Phonetics.php

Newer version that eliminates the previous tags and now supports just the
<phon> tag, which takes the attributes "encoding" and "display"

attachment phonetics.tar.gz ignored as obsolete

david wrote:

Revised version of Phonetics.php

New version of Phonetics.php, generated by files in attachment 88

attachment Phonetics.php ignored as obsolete

xmlizer wrote:

Does the patch work correctly?

Reopening after discussion on IRC. I would suggest that this be done as an extension instead of a patch to MediaWiki proper.

I'm removing the blocker on bug 26207; these days we'd want this implemented as a parser function, so no new syntax extension system is needed.

The old patch above should be looked over to see if it can be adapted or used to inspire a modern version.

sumanah wrote:

John, looks like you looked at the patch and found it obsolete enough that we cannot adapt it into a modern version?

xmlizer wrote:

Any news on this bug? It would also be a good candidate for Wikidata: it could extract the pronunciation of a word in many languages and generate the associated sound.