I read the paper "Blind and human: exploring more usable audio CAPTCHA designs" by Valerie Fanelle et al. They suggest a kind of captcha called "Categories", in which the user is asked to count the number of times a particular kind of sound (such as a bird chirp) occurs. Users were often able to answer this challenge successfully. However, the authors noted that it is easily defeated by guessing, that is, answers have low entropy.
It occurs to me that the entropy could be increased, and the mental load reduced, by asking the user to click a button at each occurrence of the target sound, rather than mentally counting the instances. We can record the timing of the clicks and classify the response on the server.
Development of a practical captcha along these lines would still be very daunting if the goal were to defeat an AI trained to distinguish sounds in the target category. But if we reduce the security goals to merely deterring bot authors who have limited motivation, then the task becomes more tractable. Instead of a large private library of sounds in various target categories, we can make do with just a single sound mixed into a single background. This would still be more secure than the existing visual CAPTCHA, which can trivially be defeated with OCR tools that are already integrated into the relevant bots.
So I wrote a demo of this concept which I am calling "AttentionPlease".
The demo audio was constructed using Audacity from freely licensed source sounds. It is delivered to the browser as an MP3 obfuscated by XORing with a 32-bit key. Part of the key is sent as a response header.
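As an illustration of the obfuscation step, here is a minimal sketch of XORing a byte stream with a repeating 32-bit key. The function name, the big-endian key layout, and the sample bytes are assumptions made for the example. The demo's actual byte order and the way the key is split between the response header and the page are not described here.

```python
import struct


def xor_obfuscate(data: bytes, key: int) -> bytes:
    """XOR data with a repeating 32-bit key.

    Assumption: the key is applied in big-endian byte order.
    XOR is its own inverse, so the same call also de-obfuscates.
    """
    key_bytes = struct.pack(">I", key & 0xFFFFFFFF)
    return bytes(b ^ key_bytes[i % 4] for i, b in enumerate(data))


# Hypothetical sample: the first bytes of an MP3 file with an ID3 tag.
sample = b"ID3\x04\x00\x00\x00\x00"
hidden = xor_obfuscate(sample, 0xDEADBEEF)
restored = xor_obfuscate(hidden, 0xDEADBEEF)
```

This only hides the file from tools that expect a plain MP3. It is not encryption, which is consistent with the reduced security goal of deterring bot authors with limited motivation.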
There are a lot of tunable parameters.
Challenge sequence (times in seconds):
- Ask the user to click the button to start. This avoids starting the sound while the screen reader is still reading, and it ensures that the user has moved their virtual cursor to the button they will need to press during the game phase.
- 0 - 0.5: fade in
- 0.5 - 3.0: background
- 3.0: Fixed prompt
- 3.0 - 8.0: background
- 8.0: Hint loop. If the user has not yet clicked the button, stop the audio, write a text reminder, and restart from the start of the sequence.
- 8.0 - 28.0: Five prompts are delivered at random intervals.
- 28.0 - 29.5: background
- 29.5 - 30.0: fade out
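The random placement of the five prompts could be generated ahead of time along the following lines. This is only a sketch: the function name and the `min_gap` parameter (a minimum spacing so that prompts do not overlap) are assumptions and are not part of the sequence described above.

```python
import random


def schedule_prompts(rng, count=5, start=8.0, end=28.0, min_gap=2.0):
    """Pick `count` prompt times between start and end, in ascending order.

    Assumption: consecutive prompts are kept at least `min_gap` seconds
    apart. The spare time left after reserving the gaps is distributed
    by drawing sorted uniform offsets.
    """
    slack = (end - start) - min_gap * (count - 1)
    if slack < 0:
        raise ValueError("window too short for the requested prompts")
    offsets = sorted(rng.uniform(0.0, slack) for _ in range(count))
    return [start + off + i * min_gap for i, off in enumerate(offsets)]


# A seeded generator makes a stored challenge reproducible.
times = schedule_prompts(random.Random(1))
```

The resulting list would be stored on the server with the generated audio, since it is the solution that clicks are later scored against.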
Scoring:
- Match each prompt with a subsequent click that arrives within 4 seconds. The score is 1 at a latency of 0, ramping down linearly to 0 at a latency of 4 seconds. Note that the Mozilla documentation on currentTime warns that times may be rounded to the nearest 100 ms.
- Additional unmatched clicks receive a score of -1.
- The scores are added, and then the total is normalized to a percentage.
- I am typically scoring 80-90. I suggest a passing score of 70.
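The scoring rules above can be sketched as follows. Several details are assumptions for the sake of a runnable example: each prompt is greedily matched to the earliest unused click in its window, the total is normalized by dividing by the number of prompts, and the score is not clamped, so it can go negative. The function name and sample times are hypothetical.

```python
def score_response(prompts, clicks, window=4.0):
    """Return a percentage score for click times against prompt times.

    Assumptions: greedy matching in time order, normalization by the
    number of prompts, and no clamping of negative totals.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    remaining = sorted(clicks)
    total = 0.0
    for prompt in sorted(prompts):
        # Earliest unused click at or after the prompt, within the window.
        match = next(
            (c for c in remaining if 0.0 <= c - prompt <= window), None
        )
        if match is not None:
            remaining.remove(match)
            # 1 at zero latency, ramping linearly to 0 at `window` seconds.
            total += 1.0 - (match - prompt) / window
    # Every click left unmatched costs one point.
    total -= len(remaining)
    return 100.0 * total / len(prompts)


PASSING_SCORE = 70.0  # the passing score suggested above

# Hypothetical example: one instant click, one click two seconds late.
example = score_response([10.0, 15.0], [10.0, 17.0])
```

Because unmatched clicks are penalized, clicking continuously scores worse than not clicking at all, which is what makes guessing harder than in the counting variant.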
Implementation notes:
- An easy-to-use reCAPTCHA-style API would be nice, but reuse by non-WMF websites could incentivise efforts to break it.
- ConfirmEdit has no support for user-selectable alternative or mixed captchas. Substantial refactoring of the extension would be needed.
- Like FancyCaptcha, challenges can be generated and stored ahead of time. However, unlike FancyCaptcha, it is probably not feasible to store the solution in the URL.