Microphone upload from browser for reading tutoring with pronunciation assessment
Open, LowPublic
Actions

Assigned To

None

Authored By

	Harjotsingh
	Jun 3 2017, 2:24 AM

Description

As suggested on to do page of quiz extension, sound clips can be used for assessment.
The upload from browser feature isn't completed yet.
More information about microphone upload over here

2018 version: https://www.youtube.com/watch?v=Bof5sJWZ100&t=103s
2017 paper: https://arxiv.org/pdf/1709.01713.pdf
2018 blueprint: https://www.ets.org/Media/Research/pdf/RM-18-02.pdf
2020 multi-phrasal version: https://www.amiralearning.com/

Related Objects

Mentioned In: T89761: Create new Python library to serialize Wikimedia Quiz format, GIFT quiz format, and allow round-trip conversions between the two formats.

Event Timeline

Harjotsingh created this task.Jun 3 2017, 2:24 AM

Hmm, is this really a quiz extension thing? Doesn't *sound* like it. (pun intended).

Mvolz changed the task status from Open to Stalled.Jun 3 2017, 1:46 PM

Maybe the task is for embedding sound files inside a quiz? Does that work? I.e. Like in wikitionary: https://en.wiktionary.org/wiki/Template:audio

https://en.wikiversity.org/wiki/Help:Quiz#Adding_music_and_sound_effects specifies that audio can be embedded, I think reported feature is related to creating a web interface that would allow a user's microphone to record a file, for which the proposal has been made (link).

It is wikiversity thing rather than quiz.

Not sure what to tag this then...

As suggested on to do page of quiz extension,

Links generally welcome so anyone can look up potential previous discussion.

More information about microphone upload over here

Just pointing out (for anybody visiting it) that the page was last updated in 2010 and hence some info there might be outdated.

The upload from browser feature isn't completed yet.

Does that mean there is some code somewhere that you could point to? Or does "not completed" rather mean "not existing at all yet"? :)

@Harjotsingh: I assume you plan to work on fixing this, as the task is assigned to you and has a higher ("normal") priority set?

Would this end up as a MediaWiki extension (MediaWiki-extension-requests) which automatically converts to accepted file formats? "Upload from browser" means reusing Special:Upload somehow?
Or is this all too early to ask? Curious about your plans. :)

If you allow me to make a comment, audio files can be embedded inside a quiz as you can see here.

The page linked in the description is about allowing MediaWiki to record directly from the browser, create a file and upload it to commons. Mostly to help with creating audio files for the Wiktionary. That would be a feature separated from the quiz extension (I guess).

If you want to implement what is mentioned in the title of this item, I think that would look like: allow audio capture from the browser as part of a question, compare it with a specified file to figure out if it is close enough, report a result and discard the recording. That would help with projects trying to use the pronunciation files in Commons to teach languages.

It would be nice but I think it is a complicated feature since it requires some sort of library to analyze and compare audio and I am not aware of anyone looking for it, at least in the en and es Wikiversities. Not sure about the other languages.

@Harjotsingh: Could you answer T166929#3313083 please? Thanks :)

Suggestion given at - Archived to do list

Currently the proposal has status doing, but no link to code is available.

As @Lsanabria suggested this would need to make a feature that will allow mediawiki to directly from the browser, create a file and upload it to commons. Mostly to help with creating audio files for the Wiktionary.

But it seems to be out of scope for my GSoC project, which is primarily upgrading Quiz extension and add Data storage.This would require to make another extension/feature and then implement it on Quiz.

...and note that according to https://en.wikiversity.org/wiki/Help:Media https://en.wikiversity.org/wiki/Wikiversity:Uploading_files (but that's English Wikiversity only) only OGG format seem to be allowed.

Harjotsingh removed a parent task: T160257: Proposal : Adding Custom features while upgrading and updating Quiz extension - Gsoc17.Jun 16 2017, 10:31 AM

Jayprakash12345 subscribed.Jul 18 2017, 1:40 PM

• Jsalsman subscribed.Sep 7 2017, 12:22 AM

Brijsri claimed this task.Sep 7 2017, 10:14 AM

Brijsri subscribed.

Please see https://en.wiktionary.org/wiki/Wiktionary:Grease_pit/2017/August#Pronunciation_evaluation_gadget_for_Wiktionary:_GSoC_2017

Removing the Possible-Tech-Projects tag as we are planning to kill it soon! This project does not seem to fit in the Outreach-Programs-Projects category in its current state, so I am not adding that tag right now!

@Brijsri and I are working on this. Our paper from last year just got cited by some speech language pathologist instructional designers at Texas A&M and Sydney: https://psi.engr.tamu.edu/wp-content/uploads/2018/04/hair2018idc.pdf

Also, the 2011 bug fix from Dr. Nakagawa we are including is tremendously important, with social impacts on thousands of migrants: https://sourceforge.net/p/cmusphinx/mailman/message/36357239/

Someone who wishes to remain anonymous offered to review and interface with the Wiktionary admin community last year. I've not forgotten that kindness and hope to accept it soon. We're building a freemium site doing adaptive learning for those who want to try words in context, and are independently fundraising for the pure javascript wiktionaries' solution.

@Jsalsman, I don't recall offering to review and interface w/ Wiktionary admin community. I do not edit on Wiktionary.

Bamyers99 unsubscribed.Jul 12 2018, 5:35 PM

BAMyers, please accept my apologies for confusing you with someone who wishes to remain anonymous. Thankfully my mistake prevented a larger one.

and I are working on this. [...] We're building a freemium site

@Jsalsman: Could you please provide an URL where to find your work-in-progress code? What does the word "freemium" mean?

@Aklapper sure, https://github.com/pobedyn/featex is the GSoC 4 feature/phoneme feature extraction code from last year, as published. Since then we've added five more features per phoneme as per slide 13 of http://j.mp/irslides and soon we will have 10 features, adding the nasal flap. We're converting that from Python to Google Firebase, or at least we were before I started having latency problems with it, so we might just stick to Python Flask.

Fremium means a sliding scale. We want the adaptive site to be self-supporting, so occasional interstitial qualification (multiple choice) questions are delivered to the learners, and if they have the financial means they are asked to contribute. The interstitial qualification system is also used for referrals. If they need to learn conversational English in a short time, we give up on them and try to get them to register with e.g. Berlitz or EF for an immersion class. That might be a source of revenue at greater volumes, too. All of these billing options are theoretical at present and not yet implemented.

Thanks, though I do not understand many words, like what "referrals" are here or what "phoneme" is or what an "adaptive site" is. Maybe too complicated for me.

Looks like your project uses https://github.com/jsalsman/featex/issues to track tasks and might have a slightly different scope, if I understand correctly.

Aklapper added a project: Technical-Tool-Request.Jul 16 2018, 2:18 PM

I forgot to include Brij's single-line widget for Wiktionary: https://brijmohan.github.io/iremedy/single_line.html

Referrals in this case would be people who want to learn a language faster than non-immersion or non-brick-and-mortar or non-instructor-led-online can teach. We have a pretty good chance at doing better than those last two (at class sizes above ~1.9 students per teacher, according to e.g. http://www.cs.cmu.edu/~listen/pdfs/AI_ED_2001_WRMT_WC_camera_ready.pdf and http://users.sussex.ac.uk/~bend/papers/meta-reviewsIEEEV5.pdf ) so, we "refer" those students in greater need of help we can provide, and some proportion of them theoretically sign up for immersion classes, and some proportion of their tuition keeps the adaptive site running for the less affluent.

If we stick with the Python architecture on a Google Cloud instance like http://sphinxcapt.org is now, we may be less than ten user stories from completion, working https://commons.wikimedia.org/wiki/File:Tasks_for_intelligibility_remediation_peer_learning_architecture.pdf into the new attached database schema.

schema-diagram-labels2.pdf347 KBDownload

@Aklapper suppose for the sake of argument that the Foundation hires Brij as a contractor for the month he has before starting his Ph.D. at INRIA. Are there any downsides? The upside is that we need to build the freemium adaptive system before we can collect enough data to get the accuracy to levels appropriate for Wiktionary. (We've been saying 95%, and we're at about 82%, whereas the state of the art is 58% as per this 2018 SpeechRater 5.0 stat from ETS: https://pbs.twimg.com/media/DcM4ETiUwAIW-Fk.jpg )

@Jsalsman: That seems very offtopic for this task and I do not know why you ask me such questions.

@klove who is the correct resource to ask about this?

@Jsalsman: Please see and follow https://meta.wikimedia.org/wiki/Grants:Start - again, this is off-topic for this very task.

@Brijsri: I am resetting the assignee of this task because there has not been progress lately (please correct me if I am wrong!).
Resetting the assignee avoids the impression that somebody is already working on this task. It also allows others to potentially work towards fixing this task.
Please claim this task again when you plan to work on it (via Add Action... → Assign / Claim in the dropdown menu) - it would be welcome! Thanks for your understanding!

Thanks @Aklapper, and for your and @LucasWerkmeister's help on wikitech-l with e.g. https://github.com/lingua-libre/RecordWizard and https://meta.wikimedia.org/wiki/User:Urvaxhi/speechToText.js and my old strategy proposal. I am reviewing the first two.

There has been considerable progress beyond what is documented at http://j.mp/slig Such as using the cubic (or perhaps harmonic) mean of the attached Microsoft SAPI 5(.1?) SDK C++ program for unconstrained responses (free-form spoken response questions instead of read-off-the-screen only) allowing for temporal keyword spotting as a front-end to the assessment interaction process:

DictationConfidence.cpp6 KBDownload

Here is my latest attempt to load a database from my flat file collection:

dataset-homophones.py10 KBDownload

I will keep you appraised of my progress there, which I expect to be rapid from this point on, thanks in large part to your help. Let me know if you would like me to hold a 0:30+Q&A workshop at the Foundation offices in San Francisco, please.

@Halfak thanks also for your help. (And, I'm sorry I didn't catch the toxicity sniffer bugs, those are some doozies!)

@Brijsri are those recorders better than the old WebRTC code we used two years ago?

@Lucas_Werkmeister_WMDE @LucasWerkmeister is https://github.com/lingua-libre/RecordWizard/tree/master/modules/ui as cross-platform as the https://voice.sanalabs.com UI which uses https://www.npmjs.com/package/react-mic ?

LucasWerkmeister removed a subscriber: Lucas_Werkmeister_WMDE.Jul 9 2019, 5:00 PM

According to [its package.json](https://github.com/lingua-libre/RecordWizard/blob/2eff53e8800b21d3b6747637f0f70fc90d354208/package.json#L14), it uses its own lingua-recorder library, which has a browser compatiblity section in the README.

@LucasWerkmeister thank you. https://dev.lingualibre.fr/demo/sandbox.html is apparently the demo pointed to https://dev.lingualibre.fr/demo/ there. I am a huge fan of https://dev.lingualibre.fr/demo/simple.html

There is a 0x0 pixel telephony version for the visually disabled planned, which is just a chatbot asking people to say things and talking to them about what they got wrong when they get something wrong, e,g., top-1 or -2 mistake(s). We can use https://chat.dbpedia.org as a starting point, originally using a non-neural, procedural shell inside which we may turn on the actual neural nets described in https://www.researchgate.net/profile/Diego_Moussallem/publication/326030040_Neural_Machine_Translation_for_Query_Construction_and_Composition and implemented at https://github.com/dbpedia/neural-qa -- for the time being, that will just be a glorified ELIZA. Technically it's a variety of LUNAR https://web.stanford.edu/class/linguist289/woods.pdf until we actually train a neural net of some kind with it, but it's probably better just to forward unknown responses to a more advanced chatbot. The non-visual version can use the Twilio API:

https://support.twilio.com/hc/en-us/articles/223132867-Recording-a-Phone-Call-with-Twilio

https://www.twilio.com/docs/voice/quickstart/python#install-python-and-the-twilio-helper-library

https://www.twilio.com/docs/voice/api#build-a-conversational-ivr

https://www.twilio.com/docs/autopilot/guides/how-to-build-a-chatbot#programmable-chat

https://www.twilio.com/docs/libraries/python

But of course the wikipedias, wiktionaries, and Wikiversity are probably not going to be interested in the non-visual version, so your help is spectacular.

@Halfak can you find someone who wants to make a chatbot summarizing and critiquing automatic parses of various speech recognition engines' transcription results using, for example, the LOGON parser? http://erg.delph-in.net/logon

@Halfak and http://englishprofile.org/english-grammar-profile/egp-online ?

@Brijsri can you use

words-fixes.txt3 KBDownload

to continue work on

dataset-homophones.py10 KBDownload

@Brijsri here is the full 987 word file:

words.txt25 KBDownload

@Aklapper is there a graphics standard saying for visual representations to use heat map color palettes which convey the same information in greyscale as they do in color, such as Viridis?

LucasWerkmeister unsubscribed.Jul 14 2019, 4:16 PM

Reedy unsubscribed.Jul 14 2019, 4:23 PM

@Jsalsman: Why do you ask me specifically? Why do you think that I could answer your question?

@Aklapper if you aren't not familiar with the answer, your idea of who would be most likely to know is someone with whom I want to talk about accessibility.

@Jsalsman: My question was why you asked me specifically and why you think that I could answer your question, which you did not answer.
It feels like you've added a bunch of people to this task and some of those added people do not know why you're doing that.

@Aklapper do you want to be able to train dozens of languages or hundreds? I want to know if you know about graphics standards, and if you do, then I would like to talk to you about accessibility. If you don't, I would like to talk to the person you think is most likely to know who might know, about accessibility.

@Brijsri how's this?

def load_homophones_and_phonemes(spelling_schema=True):
    db = dataset.connect(connect_string)
    if spelling_schema:
        t = db['words']
        t.create_column('homops', ARRAY(Text))
    wt = db['words']
    pt = db['prompts']
    ut = db['utterances']
    with open('words.txt', 'r') as f:
        while l in f:
            p = l.strip().split('#')[0].split('also')[0].split()
            h = l.strip().split('#')[0].split('also')[1].strip().split(', ')

@Jsalsman: Could please you simply answer my question instead? Again, based on which criteria do you ask random people about random things?

@Aklapper I can think of nobody other than you who would be more likely to know about graphics standards for colorblindness compatibility. I can think of nobody more likely than @Halfak who would know if there are people who want to work on an interactive speech-in and voice-out chatbot compatible with 0x0 pixel accessibility to the blind. I'm trying to get @Brijsri to transition our pronunciation assessment and intelligibility remediation system from Firebase to something more appropriate for Toolforge Labs, in a way it can work with both single wiktionary words as well as phrases in which they appear.

However, none of you are random unless you consider the stochastic Darwinian history of the universe, unlike my use of .split('also')[1] in the homophone parsing code for

words.txt25 KBDownload

above, because it assumes that each line has the word "also" which is incorrect. There is still a spurious semicolon as I was using to separate alternate pronunciations based on our collected exemplar database, in a duplicated entry for "live" -- which we have as almost entirely rhyming with "give" in our exemplars, even though rhyming with "dive" won a recent informal poll: https://twitter.com/jsalsman/status/1146610783310106624

@Jsalsman: I know absolutely nothing about "graphics standards for colorblindness compatibility" and I wonder why you think that I would and I do not understand most of the stuff that you've been writing here (which makes me also wondering if there is a big XY problem in this task). Cheers :)

@Aklapper Thanks anyway; if you find out let me know.

@Halfak if there is interest in using educational chatbots in any manner, there are various ethical considerations involved e.g. this https://iopscience.iop.org/article/10.1088/1742-6596/1087/3/032003/pdf is located in West China, where corpuses such as http://www.roseducation.org/sell-corpus/corpus.html are not:

Screen Shot 2019-07-05 at 2.56.08 PM.png (402×502 px, 155 KB)

@Brijsri I am at Google asking about now.sh from zeit.co

weni01 subscribed.Aug 15 2019, 5:07 AM

@Halfak I have been unable to recruit http://twiliojob.speakclearly.info but am trying again with a new approach.

Can you please suggest an appropriate consideration for receipt of http://gl.speakclearly.info/static/gooddb-sql.txt (600 MB) under CC-BY-SA? It is: 986 words, 1150 prompts, 34623 utterances, and 84878 transcriptions in Postgres from

dataset-homophones.py23 KBDownload

My preferred form of compensation is, "sure we can hire three or four speech/phonology, ML, telephony, QA, and DevOps people to try to get this to work, and here's a huge cash reward so you can pay off all your bills and take a long enough vacation that you can help mentor the project to completion ."

I am back to https://github.com/cleandersonlobo/react-mic because https://www.speechace.co/api_sample/
I wish I was able to use React components without having to learn React. I still haven't sent anything out to the http://bit.ly/slig list yet, so if you want me to ask them or my ~600 volunteer learners, Brij and I can help with those too. For reference we are trying to make an 80% telephony system now which will still have a web interface, like https://www.ets.org/Media/Research/pdf/RM-18-02.pdf I need to work on remixing the exemplars, by diphone this time instead of phoneme. That will solve the problem with quiet consonants.

@Brijsri at this juncture we need to decide about whether to include anything from https://www.docdroid.net/iWiA1ik/eybenetal2016ieeetransaffectcomput.pdf (which will get us pitch for Chinese, Vietnamese, etc.) and https://arxiv.org/pdf/1905.06533.pdf

tlcrp2 KBDownload

Prompts and user interactions can be added to an intelligibility assessment and remediation system using Tolchorp, the Topic-Lesson-Choice-Response-Prompt (TLCRP) format, mediatype text/tlcrp. Tolchirp (previous versions: "tolchorp") is not YAML but is similar.

// example

%tlcrp
topic: tolchirp example
 level: 1 // CEFR A1
 lesson: format
   level: 6 // CEFR C2
   choice: Are you enjoying the demonstration?
     media: beep.mp3
     mediatype: audio/mp3;text=filename
     response: {affirmative}
       result: good
     response: {negative}
       result: bad
     response: what
       result: Are you enjoying the demonstration?
       freeform: true
   choice: good
     media: Excellent!
   choice: bad
     media: I'm sorry to hear that.
prompt: {affirmative}
  words: yes|yeah|sure|ok|fine|ja|si
prompt: {negative}
  words: no|nah

Topic	Lesson	Choice	Response	Result	Prompt	...
Tolchirp example	Format	Are you enjoying the demonstration?	{affirmative}	Good	yes,	Yeah
	Level 6	Good	Excellent!
		Bad	I'm sorry to hear that.
		What	Are you enjoying the demonstration?
		Freeform	{negative}	Bad	no	nah

• Jsalsman renamed this task from Microphone upload from browser for reading tutoring with pronunciation assessment in Wiktionary to Microphone upload from browser for reading tutoring with pronunciation assessment.Mar 8 2020, 5:22 AM

• Jsalsman claimed this task.

• Jsalsman raised the priority of this task from Lowest to Medium.

• Jsalsman updated the task description. (Show Details)

• Jsalsman mentioned this in T89761: Create new Python library to serialize Wikimedia Quiz format, GIFT quiz format, and allow round-trip conversions between the two formats. .Mar 8 2020, 5:25 AM

Removing assignee as that account is disabled.

Uh I did not want to change task status, sorry

	F29787035: Screen Shot 2019-07-05 at 2.56.08 PM.png
	Jul 17 2019, 12:30 AM

	F30301425: tlcrp
	Sep 11 2019, 8:53 PM

	F30287481: tlcrp
	Sep 10 2019, 2:16 AM

	F30287402: tlcrp
	Sep 10 2019, 2:08 AM

	F30287280: tlcrp
	Sep 10 2019, 1:55 AM

Microphone upload from browser for reading tutoring with pronunciation assessmentOpen, LowPublicActions

Description

Related Objects

Event Timeline

Microphone upload from browser for reading tutoring with pronunciation assessment
Open, LowPublic
Actions