Pronunciation recording tool (tracking)
Closed, Invalid, Public

Description

Several people (e.g. https://bugzilla.wikimedia.org/show_bug.cgi?id=31221, though the original report asks for computer text-to-speech, and http://comments.gmane.org/gmane.org.wikimedia.wiktionary/1265) have requested a tool to simplify the workflow of recording the pronunciation of a word.

The basic idea is to provide a wizard flow for picking a word (which may be the page you're on), recording it, choosing a free license, then uploading it to Wikimedia Commons with the appropriate metadata.


Version: master
Severity: enhancement
URL: http://thread.gmane.org/gmane.org.wikimedia.wiktionary/1265
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=31221
https://bugzilla.wikimedia.org/show_bug.cgi?id=20252
https://bugzilla.wikimedia.org/show_bug.cgi?id=53074

bzimport added a project: PronunciationRecording. Via Conduit, Nov 22 2014, 1:22 AM
bzimport set Reference to bz46610.
Mattflaschen created this task. Via Legacy, Mar 27 2013, 5:40 PM
bzimport added a comment. Via Conduit, Mar 27 2013, 6:39 PM

wmf.amgine3691 wrote:

Note: This will also need to take into account the L2 (level-2 heading) sections, which are used to indicate the language. For example, https://en.wiktionary.org/wiki/chance#English, https://en.wiktionary.org/wiki/chance#French, etc.
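A tool therefore has to know which L2 section it is recording for. As a minimal sketch, assuming raw wikitext with simple ==Language== headings (the name l2_sections is hypothetical, and Python is used only for brevity), the language sections of a page could be listed like this:

```python
import re

# A level-2 heading is "==Name==" on its own line; the lookarounds reject
# deeper headings such as "===Etymology===".
L2_HEADING = re.compile(r"^==(?!=)\s*(.+?)\s*(?<!=)==\s*$", re.MULTILINE)


def l2_sections(wikitext: str) -> list:
    """Return the language names of the page's L2 sections, in page order."""
    return [m.group(1) for m in L2_HEADING.finditer(wikitext)]


page = "==English==\n===Noun===\nchance\n==French==\n===Noun===\nchance\n"
print(l2_sections(page))  # ['English', 'French']
```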

Nemo_bis added a comment. Via Conduit, Mar 28 2013, 9:59 PM

Well, the tool could simply be added by a parser function called with both the word and the language (and possibly something else for homographs); this seems the least of the problems. :)

Qgil added a comment. Via Conduit, Mar 28 2013, 10:33 PM

Just as the "Edit" link appears next to each section, a [Record] button could be placed next to any word missing a voice-recorded pronunciation, right?

Question: which page describes the current workflow? It is not obvious how a user can contribute a pronunciation now.

Also: what happens if an audio file already exists, but I think I can contribute a better one, e.g. because of audio quality or some other defect?

Needless to say, this is a feature that is calling for a mobile UI sooner or later... Think of all those languages spoken in countries with a high penetration of mobile devices.

Mattflaschen added a comment. Via Conduit, Mar 28 2013, 10:44 PM

The current procedure on English Wiktionary is https://en.wiktionary.org/wiki/Help:Audio_pronunciations . Other projects probably have somewhat different procedures.

I suggest the tool initially only show on pages without existing recordings. It would be good to handle pages with existing recordings eventually, but that is more likely to require discussion (should we keep both because they have slightly different accents? etc.).

Also, I skipped the final part of the flow: adding the template (e.g. Template:audio on English Wiktionary) to the Wiktionary page.
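Adding the template is mostly a matter of putting one line in the right place. A minimal sketch, assuming the usual en.wiktionary layout (a Pronunciation subsection under each ==Language== heading); the helper names are hypothetical, and because the exact Template:audio parameters are not given here, the caller supplies the finished line:

```python
import re

# Matches "==Name==", "===Name===", etc.; groups 1 and 3 are the "=" runs.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*(=+)\s*$")


def heading(line):
    """Return (level, title) for a balanced heading line, or None."""
    m = HEADING.match(line)
    if m is None or len(m.group(1)) != len(m.group(3)):
        return None
    return len(m.group(1)), m.group(2)


def insert_in_pronunciation(wikitext, language, new_line):
    """Insert new_line after the last non-blank line of the Pronunciation
    subsection under the given language's L2 heading (if there are several,
    the last one wins). Raises ValueError if there is none."""
    lines = wikitext.split("\n")
    current_language = None
    in_pronunciation = False
    insert_at = None
    for i, line in enumerate(lines):
        h = heading(line)
        if h is not None:
            level, title = h
            if level == 2:
                current_language = title
            # Any heading ends the previous subsection.
            in_pronunciation = (
                level > 2 and title == "Pronunciation" and current_language == language
            )
            if in_pronunciation:
                insert_at = i + 1
        elif in_pronunciation and line.strip():
            insert_at = i + 1
    if insert_at is None:
        raise ValueError("no Pronunciation section for " + language)
    lines.insert(insert_at, new_line)
    return "\n".join(lines)
```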

bzimport added a comment. Via Conduit, Mar 29 2013, 12:30 PM

wmf.amgine3691 wrote:

> I suggest the tool initially only show on pages without existing recordings.

Not sure I would agree. The many dialects of English, for example, can be dramatically different. 'Schedule' springs to mind[1].

Although I'd love to get into a discussion about collecting metadata with recordings (geoIP location of the author, self-identified dialect origins, etc.), I think at this point we should focus on the basic mechanics: a user button to record a brief audio snippet, which is automatically uploaded to Commons with author/license templates, and the local Wiktionary page updated.

[1] https://en.wiktionary.org/wiki/schedule#Pronunciation

Mattflaschen added a comment. Via Conduit, Mar 29 2013, 7:09 PM

I agree. I wasn't proposing complicated metadata, just the basics (license template of course, Category:$LANGUAGE pronunciation, maybe a hidden category to mark recordings from the tool).
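For illustration, these basics could be assembled into a Commons file description page. This is a minimal sketch, assuming the Commons templates {{Information}}, {{own}} and {{self|cc-by-sa-3.0}} and a category of the form "<Language> pronunciation"; the function name and any tracking-category name are hypothetical:

```python
from typing import Optional


def description_page(word: str, language: str, author: str, date: str,
                     license_template: str = "self|cc-by-sa-3.0",
                     tracking_category: Optional[str] = None) -> str:
    """Build wikitext for a pronunciation file's description page."""
    lines = [
        "=={{int:filedesc}}==",
        "{{Information",
        "|description=Pronunciation of the " + language + ' word "' + word + '"',
        "|date=" + date,
        "|source={{own}}",
        "|author=" + author,
        "}}",
        "",
        "=={{int:license-header}}==",
        "{{" + license_template + "}}",
        "",
        # Category:$LANGUAGE pronunciation, as suggested above.
        "[[Category:" + language + " pronunciation]]",
    ]
    if tracking_category:
        # Optional hidden category marking uploads made with the tool.
        lines.append("[[Category:" + tracking_category + "]]")
    return "\n".join(lines) + "\n"
```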

The reason I suggested keeping it simple, by showing the tool only on pages without recordings, is to avoid collisions. E.g. what happens if I live in the U.S. but have a different pronunciation from https://commons.wikimedia.org/wiki/File:En-us-associate.ogg ? But it looks like Commons resolves collisions by just adding a number, https://commons.wikimedia.org/wiki/File:En-us-associate-2.ogg, which is easy enough for a tool to do.
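That numbering rule is simple to automate. A minimal sketch, assuming the caller has already fetched the set of existing file names (for example through an API query, not shown); the function name is hypothetical, and real Commons titles also normalize spaces, underscores and first-letter case, which this ignores:

```python
def next_free_name(base: str, existing: set) -> str:
    """Return base if unused, else the first "stem-N.ext" (N >= 2) not taken,
    e.g. En-us-associate.ogg -> En-us-associate-2.ogg."""
    if base not in existing:
        return base
    stem, dot, ext = base.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = base, ""
    n = 2
    while True:
        candidate = f"{stem}-{n}{dot}{ext}"
        if candidate not in existing:
            return candidate
        n += 1
```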

bzimport added a comment. Via Conduit, Apr 5 2013, 8:22 PM

rahul14m93 wrote:

I have prepared a rough project proposal. Please give me your feedback and suggestions so that I can improve it:
https://www.mediawiki.org/wiki/User:Rahul21/Gsoc

Qgil added a comment. Via Conduit, Apr 9 2013, 4:03 PM

Hi Rahul,

Through the different discussions so far, we have seen that this project might be trickier than it looked initially. And the main problem is still that no mentor is stepping in.

I recommend that you wait a couple more days and then make a decision: either bet blindly on this proposal in the hope that things will be resolved in the next few weeks, or put it aside and bet on some other idea for GSoC.

You can still work on a voice recording tool as a pet project, but from my point of view it still lacks some essential factors for this GSoC: no mentor, and no enthusiastic response from the Wiktionary community.

Mattflaschen added a comment. Via Conduit, Apr 9 2013, 7:46 PM

On the Wiktionary front, has anyone reached out to them in a prominent place on-wiki?

As for the technology, there seems to be at least one workable approach using the HTML5 Media Capture API (http://www.w3.org/2009/dap/wiki/ImplementationStatus#HTML_Media_Capture and https://news.ycombinator.com/item?id=4001140). I haven't tested this myself yet; the supporting browsers are mostly mobile. See http://mobilehtml5.org/ts/?id=23 for the syntax and a simple test page.

As getUserMedia matures, that could become an alternative approach.

So it might be workable if a mentor becomes available.

bzimport added a comment. Via Conduit, Apr 9 2013, 8:02 PM

rahul14m93 wrote:

I will certainly have a look at them. Michael Dale is ready to mentor :)

bzimport added a comment. Via Conduit, Apr 9 2013, 9:35 PM

mdale wrote:

Confirmed. As mentioned on IRC, it would be nice to also support recording an article, or a paragraph, out loud for the spoken articles project.

Nemo_bis added a comment. Via Conduit, Apr 9 2013, 10:04 PM

(In reply to comment #9)

> On the Wiktionary front, has anyone reached out to them on a prominent place on-wiki?

The central discussion place for Wiktionary is Wiktionary-l. It's not Wikipedia or Wikisource. Anyway, I'll send more notifications to all languages.

Qgil added a comment. Via Conduit, Apr 9 2013, 10:19 PM

Sometimes you help by doing something and sometimes you help by NOT doing something. :)

I'm happy to have helped indirectly in finding a mentor for this project. The GSoC process continues. Thank you Rahul, thank you Michael, and thank you to everyone else helping to move this feature forward.

Qgil added a comment. Via Conduit, Apr 11 2013, 9:46 PM

Adding a dependency on WAV support to document the discussion of the past few hours on wikitech-l. Feel free to change this if the plan changes.

bzimport added a comment. Via Conduit, Apr 28 2013, 4:13 AM

rsurratt wrote:

(In reply to comment #13)

> I'm happy to have helped indirectly finding a mentor for this project. The GSoC process continues. Thank you Rahul, thank you Michael and thank you to the rest of people helping to move this feature forward.

Hello all, I have developed a MediaWiki extension (not released yet) that plays an Ogg file when you hover over any word it knows about on any page (so far just English, about 2,500 words). It also inserts a play button on hover that keeps playing and highlighting all the words it knows. In addition, a very short definition is displayed, with its words also hoverable and playable. The words are from Simple English Wiktionary.

I have also made a JavaScript sound recorder that uses WAMI (https://code.google.com/p/wami-recorder/) to record and SoundManager (http://www.schillmania.com/projects/soundmanager2/) for playback. I believe both use HTML5 if it is available, with a fallback to Flash if it is not. I also use ffmpeg and SoX for some server-side processing (WAV to Ogg conversion, and trimming silence at the start and end of the word). This is part of another project I have been working on to help someone learn a new language. I could make this available to anyone who wants it, or I could take a shot at implementing Rahul's suggestions. I am totally new to MediaWiki, so any guidance would be helpful.
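The thread doesn't show the exact commands, so as an assumption only: a common way to do this server-side step is SoX's silence effect applied at the start and then, via reverse, at the end, followed by an ffmpeg encode to Ogg Vorbis. A sketch that builds those command lines (function names hypothetical; the 0.1 s and 1% values are illustrative, not tuned):

```python
import subprocess


def trim_and_encode_cmds(src_wav, trimmed_wav, out_ogg, threshold="1%"):
    """Return two argv lists: SoX trims leading and trailing silence,
    then ffmpeg encodes the trimmed WAV as Ogg Vorbis."""
    sox = [
        "sox", src_wav, trimmed_wav,
        # Trim from the start until 0.1 s of audio is above the threshold.
        "silence", "1", "0.1", threshold,
        # Reverse, trim the (former) end the same way, then reverse back.
        "reverse", "silence", "1", "0.1", threshold, "reverse",
    ]
    ffmpeg = ["ffmpeg", "-y", "-i", trimmed_wav, "-c:a", "libvorbis", out_ogg]
    return [sox, ffmpeg]


def run_all(cmds):
    """Run each command in order; check=True raises if one fails."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Actually calling run_all requires sox and ffmpeg on the server's PATH, which is the deployment burden raised later in this thread.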

Mattflaschen added a comment. Via Conduit, Apr 28 2013, 4:51 AM

(In reply to comment #15)

> I also have made a javascript sound recorder that uses wami (https://code.google.com/p/wami-recorder/) to record and soundmanager (http://www.schillmania.com/projects/soundmanager2/) for playback. I believe both use HTML5 if it is available with a fallback to Flash if it is not available.

As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.

bzimport added a comment. Via Conduit, Apr 28 2013, 5:37 AM

rsurratt wrote:

> As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.

I have no idea. Is Flash a no-no for MediaWiki?

Bawolff added a comment. Via Conduit, Apr 28 2013, 5:47 AM

(In reply to comment #17)

> > As far as I can tell from the source code (https://code.google.com/p/wami-recorder/source/browse/), WAMI requires Flash.

> I no idea, is flash a no-no for mediawiki?

Flash is very controversial for political reasons. It may be acceptable as a fallback mechanism on old browsers that don't support the latest and greatest HTML features; however, it's a no-no to require Flash (that might be considered acceptable if the particular Flash component works with Gnash). Java is considered much more OK, but it is still not exactly loved either.

bzimport added a comment. Via Conduit, Apr 29 2013, 3:05 PM

mdale wrote:

(In reply to comment #15)

> an ogg file on hover for any word (so far just English... abt 2500 words) that it knows about in any page.

Sounds like fun; it would work best as a gadget that queries the real Wiktionary.

> I also have made a javascript sound recorder that uses wami (https://code.google.com/p/wami-recorder/) to record and soundmanager (http://www.schillmania.com/projects/soundmanager2/) for playback.

I like Flash fallbacks. I understand we should not require Flash, but as a fallback it's great. It's much less of a patent, fragmentation, and security problem than in-browser Java.

> I also use ffmpeg and sox to do some server side processing (wav->ogg and trimming silence at start and end of word). This is part of another project I have been working on to help someone learn a new language.

We should try to trim client-side if possible.

> I could make this available to anyone who would want it or I could maybe take a shot at implementing Rahul's suggestions. I am totally new to mediawiki so any guidance would be helpful.

Cool, I am sure Rahul will touch base with you.

bzimport added a comment. Via Conduit, Apr 29 2013, 3:13 PM

rahul14m93 wrote:

(In reply to comment #15)

> I also use ffmpeg and sox to do some server side processing (wav->ogg and trimming silence at start and end of word). This is part of another project I have been working on to help someone learn a new language.

I am interested. Could you come on IRC so we can have a good discussion about this?

bzimport added a comment. Via Conduit, Apr 30 2013, 3:18 PM

rsurratt wrote:

(In reply to comment #19)

> (In reply to comment #15)
> > an ogg file on hover for any word (so far just English... abt 2500 words) that it knows about in any page.
>
> sounds like a fun, would work best as a gadget,

How does one go about getting a user script approved as a gadget?

> and query the real wikitionary.

The extension opens a new tab on the real Wiktionary with the definition of any word on mouse click, but on hover it just uses the definition (if one exists) from the small dictionary I have made (a JSON file, about 60 KB compressed).

The sound files it uses are mostly from what is used in the English wiktionary but I am in the process of recording new ones that "flow" better when spoken one after another in a sentence.

> > I also use ffmpeg and sox to do some server side processing (wav->ogg and trimming silence at start and end of word). This is part of another project I have been working on to help someone learn a new language.

> We should to try to trim client side if possible.

Yes, that would be a great solution to the problem of getting ffmpeg and SoX executables running on a variety of servers, but I have no idea how to do that. Perhaps Java? If anyone knows how, I am all ears.
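Client-side trimming does not need native executables; it can operate directly on the recorded samples. A minimal sketch of the simplest approach, a per-sample amplitude threshold, assuming mono float samples in [-1.0, 1.0] like those the Web Audio API exposes; it is written in Python for brevity (a browser version would be JavaScript), and the threshold and padding values are illustrative:

```python
def trim_silence(samples, threshold=0.02, pad=0):
    """Return samples with leading and trailing near-silence removed.

    A sample counts as silent when its absolute value is below threshold.
    pad keeps that many extra samples on each side so onsets are not clipped.
    Returns an empty list if every sample is silent.
    """
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    if start == len(samples):
        return []
    end = len(samples)
    # Safe: samples[start] is above threshold, so this stops at end > start.
    while abs(samples[end - 1]) < threshold:
        end -= 1
    start = max(0, start - pad)
    end = min(len(samples), end + pad)
    return list(samples[start:end])


print(trim_silence([0.0, 0.01, 0.5, -0.3, 0.0, 0.001]))  # [0.5, -0.3]
```

A more robust version would compare the RMS of short windows instead of single samples, so that one stray click does not count as speech.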

bzimport added a comment. Via Conduit, Apr 30 2013, 3:33 PM

rsurratt wrote:

> I am interested. Could you come on the irc where we can have good discussion regarding this.

Hi Rahul, sure, how do I go about doing that?

Mattflaschen added a comment. Via Conduit, Apr 30 2013, 7:03 PM

(In reply to comment #21)

> (In reply to comment #19)
> > (In reply to comment #15)
> > > an ogg file on hover for any word (so far just English... abt 2500 words) that it knows about in any page.
> >
> > sounds like a fun, would work best as a gadget,

> How does one go about getting a user script approved as a gadget?

Each wiki approves them separately. See https://en.wiktionary.org/wiki/Wiktionary:Gadgets, though I'm not sure where you ask for it to be approved. You can ask at https://en.wiktionary.org/wiki/Wiktionary:Grease_pit .

Please use a separate bug report for the mini-dictionary (ogg, short definition) on-hover idea. It's interesting, but separate from this.

bzimport added a comment. Via Conduit, May 1 2013, 12:36 AM

rsurratt wrote:

(In reply to comment #18)

> (In reply to comment #17)
>
> Flash is very controversial for political reasons. It may be acceptable as a fallback mechanism on old browsers that don't support the latest and greatest html features, however its a no-no to require flash (might be considered acceptable if the particular flash thing works with gnash). Java is much more considered ok, but still not exactly loved either.

I was wrong: WAMI knows nothing of HTML5 and only uses Flash, just on the client side, with nothing on the server. From the project page (https://code.google.com/p/wami-recorder/):

"The WAMI recorder uses a light-weight Flash app to ship audio from client to server via a standard HTTP POST. Apart from the security settings to allow microphone access, the entire interface can be constructed in HTML and Javascript."

So is this a deal breaker? It sounds like it. If so, sorry for wasting your time; perhaps it would be better to wait for a GSoC solution that uses the latest and greatest technology, unless someone has a suggestion.

bzimport added a comment. Via Conduit, May 1 2013, 5:45 AM

rahul14m93 wrote:

(In reply to comment #25)

> I was wrong, WAMI knows nothing of HTML5 and only uses Flash... just client side , nothing on server. from them (at https://code.google.com/p/wami-recorder/)

It's okay, Ron. You took an interest and wanted to help, and that itself is a positive sign!

bzimport added a comment. Via Conduit, May 1 2013, 2:31 PM

mdale wrote:

I don't think it's a deal breaker; Flash makes a great fallback. If we can use the WebRTC solution for browsers that support it, then using WAMI as a fallback is fine.

The restriction against Flash on Wikimedia projects is based on the idea that you don't deliver an experience exclusively for proprietary platforms. Using Flash or Java as a fallback is fine, as long as an open-standard, free-browser solution is equally well supported.
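This policy (open standards first, Flash only as a fallback) amounts to a simple capability check. A sketch of the decision only, assuming the three capability flags have already been detected in the browser; the function and return values are hypothetical:

```python
def choose_recorder(has_get_user_media: bool, has_web_audio: bool,
                    has_flash: bool) -> str:
    """Pick a recording backend so that Flash is never the only path
    offered when a free-software path is available."""
    if has_get_user_media and has_web_audio:
        return "web-audio"        # open standards (getUserMedia + Web Audio)
    if has_flash:
        return "flash-fallback"   # e.g. WAMI, only for browsers lacking the above
    return "unsupported"
```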

bzimport added a comment. Via Conduit, May 1 2013, 2:35 PM

wmf.amgine3691 wrote:

Adobe stopped producing Flash for Linux last year or the year before.

bzimport added a comment. Via Conduit, May 1 2013, 2:48 PM

rahul14m93 wrote:

Version 11.2 is the last version supported on Linux.

bzimport added a comment. Via Conduit, May 1 2013, 3:38 PM

mdale wrote:

Flash support for Linux is not relevant. The point is that you can get the same experience (with WebRTC) with free software. The idea is to give an equal experience on Flash and on free software platforms.

bzimport added a comment. Via Conduit, May 6 2013, 4:38 AM

rsurratt wrote:

(In reply to comment #31)

> Flash support for linux is not relevant. The point is you can get the same experience ( with webRTC ) with free software. The idea is to give an equal experience on flash vs free software platforms.

I have been able to get a sound recorder working with the HTML5 Web Audio API in Google Chrome (Canary version). It is much nicer than the Flash version using WAMI that I already had, in that it allows things such as user-controlled silence removal in the browser. Next, I will try to combine the two so that the Flash fallback works as closely as possible to the HTML5 version.

I also want to support editing pre-existing sounds as well as sounds recorded with a microphone.

bzimport added a comment. Via Conduit, May 6 2013, 7:31 AM

rahul14m93 wrote:

(In reply to comment #32)

> I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version).

Could you please specify the version? And did you enable the "Web Audio Input" flag via chrome://flags?

bzimport added a comment. Via Conduit, May 6 2013, 2:43 PM

rsurratt wrote:

(In reply to comment #33)

> (In reply to comment #32)
>
> > I have been able to get a sound recorder working with the HTML5 Web Audio API in Google's Chrome (Canary version).
>
> Please can you specify the version and did you enable the flag "Web Audio Input" via "chrome://flags

The Chrome build is Version 28.0.1499.0 canary (https://www.google.com/intl/en/chrome/browser/canary.html), and there is no "Web Audio Input" flag in chrome://flags for that version of Chrome.

Mattflaschen added a comment. Via Conduit, Jun 14 2013, 7:32 PM

I'm removing bug 20252 as a dependency and moving it to See Also. It's a nice-to-have, but in my opinion it's not a blocker. These are going to be short files (most likely under 5 seconds).
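For a sense of scale, assuming uncompressed 16-bit mono PCM at f_s = 44.1 kHz with the canonical 44-byte WAV header (assumptions, not figures from the thread), a 5-second recording with bit depth b, channel count c and duration t comes to:

```latex
\text{bytes} = f_s \cdot \frac{b}{8} \cdot c \cdot t + 44
             = 44100 \cdot \frac{16}{8} \cdot 1 \cdot 5 + 44
             = 441\,044 \approx 431\ \text{KiB}
```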

Bug 20252 could also be done later, and the files transcoded internally.

bzimport added a comment. Via Conduit, Jun 16 2013, 7:06 PM

rahul14m93 wrote:

I have undertaken this as my GSoC project; Michael Dale and Matthew Flaschen will be my mentors. The primary benefit is laying the groundwork for contributor-created audio on MediaWiki sites, in any current browser. I have done a little research on how to upload the pronunciations, and based on that, the Upload API is essential; other APIs such as the Edit API will also come in handy. The first step I plan to take is adding .wav support to the TMH (TimedMediaHandler) extension. Link to my proposal: http://www.mediawiki.org/wiki/User:Rahul21/Gsoc
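WAV is a natural intermediate format here because a browser recorder typically ends up with raw PCM samples. As a minimal sketch using Python's standard wave module (assuming mono 16-bit output; the function name is hypothetical, and a browser implementation would write the same RIFF header in JavaScript), float samples could be packaged like this:

```python
import io
import struct
import wave


def float_to_wav_bytes(samples, sample_rate=44100):
    """Encode mono float samples in [-1.0, 1.0] as a 16-bit PCM WAV file."""
    # Clamp to [-1, 1], then scale to the signed 16-bit range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)          # mono
        w.setsampwidth(2)          # 2 bytes = 16 bits per sample
        w.setframerate(sample_rate)
        # WAV stores PCM little-endian; "<%dh" packs len(ints) int16 values.
        w.writeframes(struct.pack("<%dh" % len(ints), *ints))
    return buf.getvalue()
```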

gerritbot added a comment. Via Conduit, Jul 24 2013, 8:24 PM

Change 75770 had a related patch set uploaded by Rahul21:
Pronunciation Recording Tool( Not working )

https://gerrit.wikimedia.org/r/75770

gerritbot added a comment. Via Conduit, Jul 24 2013, 8:53 PM

Change 75770 abandoned by Rahul21:
Pronunciation Recording Tool( Not working )

https://gerrit.wikimedia.org/r/75770

Mattflaschen added a comment. Via Conduit, Aug 20 2013, 9:53 PM

6f9c18509b858d89e50d145a685eb5308dcdff7e implemented a special page, Special:PronunciationRecording, which supports recording a pronunciation and playing it back. The next main step is allowing uploads to the same wiki that hosts the special page (bug 53127).

There is also now a Bugzilla component for this extension.

Qgil added a comment. Via Conduit, Sep 17 2013, 4:22 PM

GSoC "soft pencils down" date was yesterday and all coding must stop on 23 September. Has this project been completed?

Mattflaschen added a comment. Via Conduit, Sep 17 2013, 8:04 PM

The overall project has not been completed, so Rahul will have to keep working until the final pencil down (September 23, as you noted).

The following parts are complete (parts that still need final review are noted). Rahul can add anything I'm missing:

  • Uploading to the stash is complete. Fitting this into the overall upload flow (initially publishing from the stash to the main File page) is in progress and under review.
  • Extension and special page setup
  • WAV support for TimedMediaHandler
  • Some refactoring of UploadWizard (which PronunciationRecorder is using as a library). Mostly merged; a little more is in progress
  • Upload permissions check (not merged)
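
The stash-then-publish flow in the first item corresponds to two calls to the MediaWiki action=upload API. A sketch that only builds the request parameters, assuming a separately obtained CSRF (edit) token and the recording sent as a multipart field named file on the first request; the function names are hypothetical, and the extension itself does this in JavaScript through UploadWizard code:

```python
def stash_params(filename: str, token: str) -> dict:
    """Step 1: upload to the stash. The recording itself goes in the
    multipart 'file' field; the JSON response includes a filekey."""
    return {
        "action": "upload",
        "format": "json",
        "stash": "1",
        "filename": filename,
        "token": token,
    }


def publish_params(filekey: str, filename: str, page_text: str,
                   comment: str, token: str) -> dict:
    """Step 2: publish the stashed file (identified by filekey) to its File: page."""
    return {
        "action": "upload",
        "format": "json",
        "filekey": filekey,
        "filename": filename,
        "text": page_text,      # initial description page wikitext
        "comment": comment,     # upload summary
        "token": token,
    }
```

Both dictionaries would be POSTed to the wiki's api.php endpoint.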

Mattflaschen added a comment. Via Conduit, Sep 27 2013, 11:48 PM

This is not fully complete. However, it's complete enough that it could be useful. You can try it at http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording .

The main aspect that is not ready is integration into Wiktionary pages. It also cannot currently upload to Commons from another wiki (it uploads to the current wiki).

However, it does generate the Information template and categories needed for Commons.

Mattflaschen added a comment. Via Conduit, Sep 27 2013, 11:54 PM

Actually, use http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording?debug=true due to bug 54351.

Also, note that you need to use a modern browser with sufficient Web Audio support. Currently, that probably means Chrome, but Firefox is working on the same standards, so it will eventually work in Firefox and other browsers.

Qgil added a comment. Via Conduit, Oct 22 2013, 7:39 PM

If you have open tasks or bugs left, one option is to list them at https://www.mediawiki.org/wiki/Google_Code-In and volunteer yourself as a mentor.

We have heard from Google and from free software projects participating in Code-in that students in these programs have done great work finishing and polishing GSoC projects, often mentored by the former GSoC student. The key is to be able to split the pending work into small tasks.

More information in the wiki page. If you have questions you can ask there or you can contact me directly.

Aklapper added a comment. Via Conduit, Feb 27 2014, 5:25 PM

Rahul: Are you (still) working on this? If not, please reset the assignee to default and the status to NEW. Thanks!

Mattflaschen added a comment. Via Conduit, Feb 27 2014, 7:21 PM

I don't know if we should keep this open now that it's an in-progress extension with its own Bugzilla component.

However, if we want to, we can use it to mark when the initial Wiktionary functionality (see https://www.mediawiki.org/wiki/User:Rahul21/Gsoc2013/Proposal#Simple_workflow) is done. Basically, a Minimum Viable Product.

Rillke added a comment. Via Conduit, Mar 8 2014, 10:57 PM

I didn't see any progress here, so I relaunched https://meta.wikimedia.org/wiki/Grants:IEG/Finish_Pronunciation_Recording

You may see this as a competing product or a chance to get some useful feedback. Cheers!

Mattflaschen added a comment. Via Conduit, Mar 27 2014, 5:36 PM

(In reply to Matthew Flaschen from comment #42)

> This is not fully complete. However, it's complete enough that it could be useful. You can try it at http://pronunciationrecording.instance-proxy.wmflabs.org/wiki/Special:PronunciationRecording .

It's moved to http://pronunciationrecording.wmflabs.org/wiki/Special:PronunciationRecording?debug=true . The new server is open for normal account creations.

If anyone would like special access (e.g. an admin account to test gadgets), let me know.

Gilles added a project: Multimedia. Via Web, Nov 24 2014, 3:33 PM
Quiddity added a project: Tracking. Via Web, Jan 7 2015, 8:25 PM
Quiddity set Security to None.
Ricordisamoa awarded a token. Via Web, Jan 25 2015, 8:54 AM
Ricordisamoa added a subscriber: Ricordisamoa.
Darkdadaah added a project: Wiktionary. Via Web, Mar 9 2015, 3:21 PM
Liuxinyu970226 added a subscriber: Liuxinyu970226. Via Web, Mar 22 2015, 3:13 PM
Aklapper added a subscriber: Aklapper. Via Web, Jul 8 2015, 2:41 PM

> I don't know if we should keep this open now that's it's an in-progress extension with its own Bugzilla component.

What exactly is tracked in this task, as opposed to the PronunciationRecording project itself? All "Blocked by" tasks are also part of the project...

Restricted Application added subscribers: Steinsplitter, Matanya. Via Herald, Jul 8 2015, 2:41 PM
Aklapper added a comment. Via Web, Thu, Aug 20, 2:34 PM

What exactly is tracked in this task, as opposed to the PronunciationRecording project itself? All "Blocked by" tasks are also part of the project.

Mattflaschen closed this task as "Invalid". Via Web, Sat, Aug 22, 1:53 AM
Mattflaschen claimed this task.

I suggested last February that we could either close it now, or use it to mark the MVP. I'll go ahead and do the former.

Restricted Application removed a subscriber: Liuxinyu970226. Via Herald, Sat, Aug 22, 1:53 AM
