
AttentionPlease -- audio CAPTCHA
Open, Needs Triage, Public

Description

I read the paper "Blind and human: exploring more usable audio CAPTCHA designs" by Valerie Fanelle et al. They suggest a kind of CAPTCHA called "Categories", in which the user is asked to count the number of times a particular kind of sound (such as a bird chirp) occurs. Users were often able to answer this challenge successfully; however, the authors noted that it is easily defeated by guessing, that is, the answers have low entropy.

It occurs to me that the entropy could be increased, and the mental load reduced, by asking the user to click a button at each occurrence of the target sound, rather than mentally counting the instances. We can record the timing of the clicks and classify the response on the server.

Development of a practical captcha along these lines would still be very daunting if the goal was to defeat an AI trained to distinguish sounds in the target category. But if we reduce the security goals to merely deterring bot authors who have a limited motivation, then the task becomes more tractable. Instead of a large private library of sounds in various target categories, we can make do with just a single sound mixed into a single background. This would still be more secure than the existing visual CAPTCHA, which can trivially be defeated with OCR tools that are already integrated into the relevant bots.

So I wrote a demo of this concept which I am calling "AttentionPlease".

Try the demo

The demo audio was constructed using Audacity from freely licensed source sounds. It is delivered to the browser as an MP3 obfuscated by XORing with a 32-bit key. Part of the key is sent as a response header.
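As a rough illustration of XORing a file with a repeating 32-bit key, here is a minimal Python sketch. It is not the demo's actual code: the function name `xor_obfuscate` and the big-endian key layout are assumptions, and the way the key is split between the response header and the rest of the page is not modelled.

```python
import struct


def xor_obfuscate(data: bytes, key: int) -> bytes:
    """XOR `data` with a repeating 32-bit key.

    The key is laid out big-endian here; that byte order is an assumption.
    XOR is its own inverse, so the same call also de-obfuscates.
    """
    key_bytes = struct.pack(">I", key & 0xFFFFFFFF)
    return bytes(b ^ key_bytes[i % 4] for i, b in enumerate(data))


if __name__ == "__main__":
    mp3_bytes = b"ID3 example payload"  # stand-in for real MP3 data
    scrambled = xor_obfuscate(mp3_bytes, 0xDEADBEEF)
    restored = xor_obfuscate(scrambled, 0xDEADBEEF)
    print(restored == mp3_bytes)
```

This only obscures the file from tools that expect a plain MP3; anyone who reads the client code can recover the key.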

There are a lot of tunable parameters.

Challenge sequence:

  • Ask the user to click the button to start. This avoids starting the sound while the screen reader is still reading, and it ensures that the user has moved their virtual cursor to the button they will need to press during the game phase.
  • 0 - 0.5: fade in
  • 0.5 - 3.0: background
  • 3.0: Fixed prompt
  • 3.0 - 8.0: background
  • 8.0: Hint loop. If the user has not clicked the button, stop the audio. Write a text reminder and restart from the start of the sequence.
  • 8.0 - 28: Five prompts are delivered at random intervals.
  • 28 - 29.5: background
  • 29.5 - 30: fade out
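
The random placement of the five prompts in the 8.0 - 28 window could be sketched as follows. The description above does not say how the intervals are drawn, so the uniform distribution, the `min_gap` parameter and the name `make_prompt_times` are all assumptions made for illustration.

```python
import random


def make_prompt_times(n=5, start=8.0, end=28.0, min_gap=1.0, rng=None):
    """Return n sorted prompt times in [start, end], at least min_gap apart.

    Assumption: offsets are drawn uniformly, and the minimum gap is enforced
    by drawing from a shortened window and then spreading the results out.
    """
    rng = rng or random.Random()
    slack = (end - start) - (n - 1) * min_gap
    if slack < 0:
        raise ValueError("window too short for the requested gap")
    offsets = sorted(rng.uniform(0.0, slack) for _ in range(n))
    return [start + off + i * min_gap for i, off in enumerate(offsets)]


if __name__ == "__main__":
    print(make_prompt_times(rng=random.Random(42)))
```

A minimum gap keeps two prompts from overlapping in the mixed audio; the value of 1 second is a placeholder, not a tuned parameter.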

Scoring:

  • Match prompts with subsequent clicks with a limit of 4 seconds. The score is 1 at a latency of 0, ramping down to 0 at a latency of 4 seconds. Note that the Mozilla documentation on currentTime threatens to round times to the nearest 100ms.
  • Additional unmatched clicks receive a score of -1.
  • The scores are added, and then the total is normalized to a percentage.
  • I am typically scoring 80-90. I suggest a passing score of 70.
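
The scoring rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not the demo's implementation: each prompt is matched greedily with the first unused click that follows it, the total is divided by the number of prompts, and the result is clamped at 0. The name `score_response` is hypothetical.

```python
def score_response(prompts, clicks, timeout=4.0):
    """Score click times against prompt times, as a percentage.

    A matched click scores 1 at zero latency, ramping linearly down to 0 at
    `timeout` seconds. Every unmatched click scores -1.
    Assumptions: greedy first-click matching, normalization by the number of
    prompts, and clamping of negative totals to 0.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    clicks = sorted(clicks)
    used = [False] * len(clicks)
    total = 0.0
    for p in sorted(prompts):
        for i, c in enumerate(clicks):
            # Only clicks at or after the prompt count, so early clicks
            # never earn points.
            if not used[i] and 0.0 <= c - p < timeout:
                used[i] = True
                total += 1.0 - (c - p) / timeout
                break
    total -= used.count(False)  # penalty for each unmatched click
    return max(0.0, 100.0 * total / len(prompts))


if __name__ == "__main__":
    print(score_response([10.0, 20.0], [11.0, 21.0]))
```

With a 4-second ramp, a timing error of 0.1 seconds changes one prompt's score by 0.1 / 4 = 0.025, so coarse rounding of `currentTime` has only a small effect on the total.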

Implementation notes:

  • An easy-to-use reCAPTCHA-style API would be nice, but reuse by non-WMF websites could incentivise efforts to break it.
  • ConfirmEdit has no support for user-selectable alternative or mixed captchas. Substantial refactoring of the extension would be needed.
  • Like FancyCaptcha, challenges can be generated and stored ahead of time. However, unlike FancyCaptcha, it is probably not feasible to store the solution in the URL.

Event Timeline

As a screen reader user, I think this sounds really cool ... I'd never heard of this type of CAPTCHA before! Thanks for your work on this and putting up the demo. I'm scoring around 90 as well ...

There was a scoring bug which gave extra points to early responses, which I fixed.

I modelled the scores achieved by guessing. With parameters similar to the demo audio, submitting equally spaced times achieves an average score of 42, and 9% of guesses exceed a score of 70.

Delivering 5 prompts in 20 seconds is a bit tight, considering the score timeout parameter of 4 seconds.

Increasing the effective challenge time from 20 to 25 seconds reduced the average score by guessing from 42 to 25, and reduced the probability of exceeding a score of 70 from 9% to 1.6%.

Reducing the score timeout parameter also works to reduce guessing scores, although this reduces human scores proportionally. If you want typical human scores to be 80-90 then the timeout should be 4 seconds.
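
A guessing model of this kind can be sketched as a small Monte Carlo simulation. The code below is an illustration only and is not expected to reproduce the figures quoted above: the prompt distribution (independent uniform draws), the phase of the equally spaced guesses, the scoring details and all function names are assumptions.

```python
import random


def score_response(prompts, clicks, timeout=4.0):
    """Hypothetical scoring: 1 at zero latency, 0 at `timeout`, -1 per
    unmatched click, normalized by prompt count and clamped at 0."""
    clicks = sorted(clicks)
    used = [False] * len(clicks)
    total = 0.0
    for p in sorted(prompts):
        for i, c in enumerate(clicks):
            if not used[i] and 0.0 <= c - p < timeout:
                used[i] = True
                total += 1.0 - (c - p) / timeout
                break
    total -= used.count(False)
    return max(0.0, 100.0 * total / len(prompts))


def simulate_guessing(window=20.0, n_prompts=5, timeout=4.0,
                      passing=70.0, trials=5000, seed=0):
    """Estimate the mean score and pass rate of an equally spaced guess.

    Assumptions: prompts are independent uniform draws over the window, and
    the bot clicks once at the end of each of n_prompts equal slots.
    """
    rng = random.Random(seed)
    start = 8.0
    step = window / n_prompts
    guess = [start + step * (i + 1) for i in range(n_prompts)]
    scores = []
    for _ in range(trials):
        prompts = [rng.uniform(start, start + window)
                   for _ in range(n_prompts)]
        scores.append(score_response(prompts, guess, timeout))
    mean = sum(scores) / trials
    pass_rate = sum(s > passing for s in scores) / trials
    return mean, pass_rate


if __name__ == "__main__":
    for window in (20.0, 25.0):
        mean, pass_rate = simulate_guessing(window=window)
        print(f"window={window}: mean={mean:.1f}, pass rate={pass_rate:.3f}")
```

Varying `window` and `timeout` in such a model is one way to explore the trade-off between guessing scores and human scores described above.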

I uploaded the demo code to https://github.com/tstarling/attention-please

Can I suggest that this be released under the GPL? I think decreasing the number of websites which can use it would be A Good Thing, as it would disincentivize breaking it.

This solution is interesting, even if it will probably never work without JavaScript.

I have a friend (castix) who visits websites using only lynx (a text-based web browser). They think that this solution is really promising, but unfortunately it probably does not cover that specific case.

Moreover, some people may sometimes not have sufficient reflection skills to answer. I am not talking about "normal" visual impairment here, but about some forms of combined visual and intellectual disability (I'm not an expert in this field).

Unfortunately, I don't have any additional ideas about how to overcome this.