
Statistics on Captcha success/failure rate
Closed, ResolvedPublic

Assigned To
Authored By
Reedy
Dec 2 2016, 5:21 PM
Referenced Files
F41711785: grafik.png
Jan 23 2024, 11:53 PM
F8620908: captcha-failure-rate-cuml-2017-06.png
Jul 5 2017, 4:31 AM
F8445365: Screen Shot 2017-06-12 at 18.01.43.png
Jun 12 2017, 5:02 PM
F7149221: captcha-stats.csv
Apr 2 2017, 2:09 PM
F7149196: captcha-stats.txt
Apr 2 2017, 2:09 PM
F7149507: Screen Shot 2017-04-02 at 15.08.54.png
Apr 2 2017, 2:09 PM
F5660883: Screen Shot 2017-02-16 at 23.58.53.png
Feb 16 2017, 11:59 PM
F5576436: Screen Shot 2017-02-09 at 20.38.21.png
Feb 9 2017, 8:39 PM

Description

It would be useful to have some stats on captcha breakages etc.

Logging is currently done by ConfirmEdit.

Method
    log
Found usages  (4 usages found)
    Method call  (4 usages found)
        MediaWiki  (4 usages found)
            extensions/ConfirmEdit/SimpleCaptcha  (4 usages found)
                Captcha.php  (4 usages found)
                    SimpleCaptcha  (4 usages found)
                        passCaptcha  (3 usages found)
                            1165: $this->log( "passed" );
                            1171: $this->log( "bad form input" );
                            1176: $this->log( "new captcha session" );
                        passCaptchaLimited  (1 usage found)
                            1123: $this->log( 'User reached RateLimit, preventing action.' );

Event Timeline

FYI the analytics team doesn't have this data. We do not aggregate application data coming from MediaWiki for the most part, other than edits.

There is some of this data in editing: https://edit-analysis.wmflabs.org/compare/ and I imagine some would belong to account creation.

To be clear: analytics can hold this data once it is created/aggregated, and preserve it if needed, but given that captchas are in an ownership limbo right now I am not sure the precise data in this regard is being created.

There are plenty of log files on fluorine that can be trivially parsed.

And it's not dependent on account creation, no. Anon users adding (many) URLs can often trigger captchas.

#!/bin/bash

files=( /a/mw-log/archive/captcha.log-201*.gz )
for file in "${files[@]}"
do
        filename="${file##*/}"          # strip leading directories
        filenamenoext="${filename%.*}"  # strip the .gz extension
        filedate="${filenamenoext:12}"  # date portion after "captcha.log-"
        echo "$filedate"
        zgrep -Ei "ConfirmEdit\: [a-z ]+\;" "$file" | cut -d ';' -f 1 | cut -d ':' -f 5 | sort -n -r | uniq -c
done

Running atm...
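For reference, a rough Python equivalent of that counting pipeline. The exact log line shape assumed here (`date time host wiki: ConfirmEdit: message; extra`) is an inference from the `cut` fields, not verified against the actual files:

```python
import gzip
import re
from collections import Counter
from glob import glob

# Same message pattern the zgrep uses: "ConfirmEdit: <words>;"
MESSAGE_RE = re.compile(r"ConfirmEdit: ([a-z ]+);", re.IGNORECASE)

def count_messages(paths):
    """Tally ConfirmEdit log messages ("passed", "bad form input", ...)
    across a list of gzipped log files."""
    counts = Counter()
    for path in paths:
        with gzip.open(path, "rt", errors="replace") as fh:
            for line in fh:
                m = MESSAGE_RE.search(line)
                if m:
                    counts[m.group(1)] += 1
    return counts

if __name__ == "__main__":
    for path in sorted(glob("/a/mw-log/archive/captcha.log-201*.gz")):
        # The date is embedded in the filename after "captcha.log-"
        print(path.rsplit("captcha.log-", 1)[1].split(".")[0])
        for message, n in count_messages([path]).most_common():
            print(f"{n:8d}  {message}")
```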

<?php

$log = file( 'captcha-stats.txt' );

$lines = [];
$lines[] = "date, new captcha session, passed, bad form input";

for( $i = 0; $i < count( $log ); $i += 4 ) {
        $date = rtrim( $log[$i] );
        foreach( range( $i + 1, $i + 3 ) as $logEntry ) {
                $which = ltrim( $log[$logEntry] );
                $matches = null;
                preg_match( '/(\d+)  ([a-z]+)/', $which, $matches );
                // $$matches[2] = $matches[1];
                switch( $matches[2] ) {
                        case 'new':
                                $started = $matches[1];
                                break;
                        case 'passed':
                                $passed = $matches[1];
                                break;
                        case 'bad':
                                $bad = $matches[1];
                                break;
                }
        }
        $lines[] = "{$date}, {$started}, {$passed}, {$bad}";
}

file_put_contents( 'captcha-stats.csv', implode( "\n", $lines ) );

So I stuck it in a google docs to make a pretty graph... https://docs.google.com/spreadsheets/d/1cJIKbu-V6IRcY_a8_SVZcxQeKeCWD_wx7NNbmJGySHI/edit?usp=sharing

I'm not really sure this shows us much... But we shall see. Might be more useful when we throw some more captchas into the mix.. And also replace some of the old ones...

Screen Shot 2017-02-09 at 19.02.36.png (1×1 px, 186 KB)

So we're completely missing out on the user rate limiting log messages, due to the comma and full stop. Trailing . to go away in https://gerrit.wikimedia.org/r/336880

#!/bin/bash

files=( /a/mw-log/archive/captcha.log-201*.gz )
for file in "${files[@]}"
do
	filename="${file##*/}"
	filenamenoext="${filename%.*}"
	filedate="${filenamenoext:12}"
	echo $filedate
	zgrep -Ei "ConfirmEdit\: [a-z,. ]+\;" "$file" | cut -d ';' -f 1 | cut -d ':' -f 5 | sort -n -r | uniq -c
done

Seems the amount is gonna be a fraction of that of the others...

<?php

$log = file( 'captcha-stats.txt' );

$lines = [];
$lines[] = "date, new captcha session, passed, bad form input, user reached rate limit";

for( $i = 0; $i < count( $log ); ) {
        $started = 0;
        $passed = 0;
        $bad = 0;
        $user = 0;
        $date = rtrim( $log[$i] );
        $i++;
        while( $i < count( $log ) ) {
                $which = ltrim( $log[$i] );
                if ( preg_match( '/^\d{8}$/', $which ) ) {
                        // Looks like a date, next!
                        break;
                }
                $matches = null;
                preg_match( '/(\d+)  ([a-zA-Z]+)/', $which, $matches );
                switch( $matches[2] ) {
                        case 'new':
                                $started = $matches[1];
                                break;
                        case 'passed':
                                $passed = $matches[1];
                                break;
                        case 'bad':
                                $bad = $matches[1];
                                break;
                        case 'User':
                                $user = $matches[1];
                                break;
                }
                $i++;
        }
        $lines[] = "{$date}, {$started}, {$passed}, {$bad}, {$user}";
}

file_put_contents( 'captcha-stats.csv', implode( "\n", $lines ) );

Reedy added a parent task: Restricted Task.Feb 16 2017, 9:45 PM

"F5660883 size=full"

Questions from the ignorant...

If we are getting zero reaching the limit, that means either 1) they don't retry enough, or 2) they aren't failing. How do we differentiate; do we have data on the success rate after one or two failures?

Also, "bad form input" — if that is a failed captcha, can someone please explain it to me.

For clarity's sake, if bots are using the API, where are they captured?

FWIW, this isn't all the logging for captcha stuff. Login uses different stats, à la https://grafana.wikimedia.org/dashboard/db/authentication-metrics?panelId=7&fullscreen&from=now-7d&to=now and VE editing seems to use something else too.

For where these logging entries come from:
https://github.com/wikimedia/mediawiki-extensions-ConfirmEdit/blob/master/SimpleCaptcha/Captcha.php#L1123
https://github.com/wikimedia/mediawiki-extensions-ConfirmEdit/blob/master/SimpleCaptcha/Captcha.php#L1163-L1178

See also T157735

It wouldn't surprise me if this edit logging is in fact incomplete or even completely wrong. It's hard to tell. All the different captcha stats in different places don't help.

also... For wgRateLimits

		'badcaptcha' => [ // Bug T92376
			// Mainly for account creation by unregistered spambots.
			// A human probably gives up after a handful attempts to
			// register, but ip/newbie editing needs to be considered too.
			'ip' => [ 15, 60 ],
			'newbie' => [ 15, 60 ],
			// Mainly to catch linkspam bot edits. Account creations by users?
			// Some wikis request tons of captchas to users under 50 edits:
			// the limit needs to be higher than any human can conceivably do.
			'user' => [ 30, 60 ],
		],

IPs and newbies can do 15 captchas in 60 seconds; users 30 in 60. Maybe these are too lenient to be of any actual use?
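To illustrate the `[count, seconds]` semantics, here is a minimal sliding-window sketch in Python. This is not MediaWiki's actual ping limiter implementation, just a model of what `'ip' => [ 15, 60 ]` means in practice:

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Sketch of a [count, seconds] limit: allow at most `count`
    events per `seconds`-long sliding window, tracked per key."""

    def __init__(self, count, seconds):
        self.count = count
        self.seconds = seconds
        self.events = defaultdict(deque)

    def attempt(self, key, now=None):
        """Record one failed-captcha event; False means rate limited."""
        now = time.monotonic() if now is None else now
        q = self.events[key]
        while q and now - q[0] >= self.seconds:
            q.popleft()  # forget events that fell out of the window
        if len(q) >= self.count:
            return False  # "User reached RateLimit, preventing action."
        q.append(now)
        return True

# 'ip' => [ 15, 60 ]: the 16th failure within a minute is refused.
limiter = SlidingWindowLimiter(15, 60)
results = [limiter.attempt("198.51.100.7", now=t) for t in range(16)]
# results: fifteen True values followed by one False
```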

"Bad form input" is indeed people entering Captchas wrong. Passed is they got it right. New captcha session being as it sounds

I guess we probably should start recording on what attempt (1st through 15th) they managed to successfully defeat a captcha. In the current logging format that's a bit harder, but if we get the logging overhauled, we can pass it as a parameter etc.

We also have no way of knowing if the user didn't attempt the captcha at all, i.e. they gave up. Not sure how we'd do that unless with the job queue or similar after a certain timeout.

For the API, not sure. If anything, editing captchas might come into the logging here; login/signup captchas would be in the other stats on grafana. A quick glance at SimpleCaptcha/Captcha.php https://github.com/wikimedia/mediawiki-extensions-ConfirmEdit/blob/master/SimpleCaptcha/Captcha.php#L63-L69 looks like there's no logging, but not to say there isn't logging from elsewhere in the code...

Moved to mwlog1001

#!/bin/bash

files=( /srv/mw-log/archive/captcha.log-201*.gz )
for file in "${files[@]}"
do
	filename="${file##*/}"
	filenamenoext="${filename%.*}"
	filedate="${filenamenoext:12}"
	echo $filedate
	zgrep -Ei "ConfirmEdit\: [a-z,. ]+\;" "$file" | cut -d ';' -f 1 | cut -d ':' -f 5 | sort -n -r | uniq -c
done


Screen Shot 2017-04-02 at 15.08.54.png (1×1 px, 245 KB)

thanks @Reedy. Such a low hit rate level.

I am presuming that the spike is spambot activity, and it would be interesting if @MarcoAurelio and stewards could have access to alerts on that sort of data so we could take measures to whack it. As that dirty sinusoidal pattern sort of matches editing patterns, have we looked to see where we have significant differences in the patterns? I am presuming we have an idea of the ratio of IP edits to logged-in edits, and that presumably has a level of stability. Also presumably we know on a per-country level whether we get edits from IPs or logged-in users, and maybe can spot some sort of difference. What sort of statistical data is available in that space?

With all the wikis in together, it's hard to distinguish between bots and humans. A more difficult captcha will certainly lead to a rise in the failure rate, but with everything in together, you can't tell whether the failures are more specific to bots than humans.

One possible solution to that is to pull out one subset of the logs which is relatively spammy, and another which is relatively human-dominated. Here are the failure rates for June, broken down by DB suffix:

Suffix      | Pass   | Fail   | Failure rate
wiki        | 571946 | 170543 | 23%
wikimedia   | 403    | 452    | 53%
wikiversity | 2191   | 3090   | 59%
wiktionary  | 6347   | 9057   | 59%
wikivoyage  | 1877   | 2680   | 59%
wikinews    | 2566   | 3813   | 60%
wikisource  | 2520   | 3970   | 61%
wikiquote   | 2114   | 3711   | 64%
wikibooks   | 5405   | 11637  | 68%

Notable wikis with low and high failure rates:

DB               | Pass  | Fail | Failure rate
trwiki           | 1554  | 241  | 13.4%
dewiki           | 19837 | 3474 | 14.9%
cswiki           | 2806  | 527  | 15.8%
enwikinews       | 687   | 1567 | 69.5%
enwikibooks      | 3133  | 8171 | 72.3%
simplewiktionary | 126   | 449  | 78.1%
miwiktionary     | 99    | 651  | 86.8%

I used the aggregation script P5672

It's quite interesting to sort all wikis by their failure rate and then to plot the failure rate against cumulative count of total captcha attempts (pass plus fail):

captcha-failure-rate-cuml-2017-06.png (734×917 px, 49 KB)

You see that we have a broad plateau at 20% failure rate, presumed to be mostly humans, followed by a sharp rise, presumed to be mostly bots.

If we switched to a different CAPTCHA solution, we would want to see the height of the plateau remain the same, or be reduced. And we want to increase the slope in the bot-dominated part of the graph, around 85-100% cumulative count, so that the failure rate of spam-only wikis approaches 100%.
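The cumulative plot can be reproduced along these lines. This is a sketch using the per-suffix numbers from the June 2017 table above; for the actual graph you would feed the curve to matplotlib instead of printing it:

```python
# Per-suffix (pass, fail) counts from the June 2017 table above.
data = {
    "wiki": (571946, 170543),
    "wikimedia": (403, 452),
    "wikiversity": (2191, 3090),
    "wiktionary": (6347, 9057),
    "wikivoyage": (1877, 2680),
    "wikinews": (2566, 3813),
    "wikisource": (2520, 3970),
    "wikiquote": (2114, 3711),
    "wikibooks": (5405, 11637),
}

def cumulative_failure_curve(counts):
    """Sort groups by failure rate, then pair each group's failure rate
    with the cumulative share of total attempts (pass + fail)."""
    rows = sorted(
        ((p + f, f / (p + f), name) for name, (p, f) in counts.items()),
        key=lambda r: r[1],
    )
    total = sum(r[0] for r in rows)
    curve, seen = [], 0
    for attempts, rate, name in rows:
        seen += attempts
        curve.append((name, seen / total, rate))
    return curve

for name, cuml, rate in cumulative_failure_curve(data):
    print(f"{name:12s} cuml={cuml:6.1%} failure={rate:5.1%}")
```

The human-dominated plateau shows up as the long flat stretch contributed by the big, low-failure-rate groups at the left of the curve.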

We should clean these up a bit so they can serve as a validation of new captcha types:

  • differentiate between registrations which have JavaScript and ones which don't (it seems like ~90% of spambots do not have JavaScript while most users do)
  • also between ones that use the API vs. the web interface, and mobile vs. desktop (most spambots seem to be using desktop web)
  • store username + captcha success rate in EventLogging so that it can be merged with block logs and editcounts later to get more accurate numbers for spambot vs. productive contributor captcha error rate

then make the data easily available somewhere (grafana? ReportUpdater?)

Please also don't ignore the appearance of the spambots in Special:log/spamblacklist and Special:Abuselog, usually within 1 hour of account creation, and say within 24 hours of creation (if we need a maximum endpoint).

Yeah, that's what I was trying to get at with the third bullet point.

  • store username + captcha success rate in EventLogging so that it can be merged with block logs and editcounts later to get more accurate numbers for spambot vs. productive contributor captcha error rate

This helps for those who get past the captcha, but what data to collect on those who fail the captcha, to assess how many of them were legit users? At a minimum, I'd suggest making sure it's possible and easy to monitor anomalous variations per language, per country and per project. If, say, Japanese users start failing much less/more than UK users, then maybe something good/bad happened for non-Latin script or non-English speaking users.

chasemp triaged this task as Medium priority.Sep 4 2018, 4:08 PM

Can someone please update the statistics? From the account creation log and the spam volume since the beginning of 2019 it seems to me that our captchas are broken.

Grafana stats haven't changed significantly since 2018. These stats are probably not very good for measuring false acceptance rates (ie. how many spambots get through), though - if only 99% of spambots fail instead of 99.9% that's 10x the workload for spamfighters but no visible change in the amount of attempts caught by captchas, and probably no visible change in successful attempts either (as those are still dominated by real humans). Maybe Tim's approach of looking at small wikis where spambots are a large fraction of the userbase would be more sensitive.
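A quick back-of-the-envelope check of that tenfold-workload claim, with a hypothetical volume of spambot attempts:

```python
def spambots_through(attempts, failure_rate):
    """Number of spambot attempts that get past the captcha."""
    return attempts * (1 - failure_rate)

# With 100,000 spambot attempts, dropping from a 99.9% to a 99%
# failure rate multiplies the spamfighters' cleanup workload tenfold...
leaked_before = spambots_through(100_000, 0.999)  # ~100
leaked_after = spambots_through(100_000, 0.99)    # ~1000

# ...while the attempts blocked (what the graphs actually show)
# barely move: ~99,900 vs ~99,000, under a 1% visible change.
blocked_before = 100_000 - leaked_before
blocked_after = 100_000 - leaked_after
```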

Reedy removed Reedy as the assignee of this task.Jul 10 2019, 10:32 AM
chasemp claimed this task.
chasemp subscribed.

To the best of our knowledge the work here is done, and any further work should happen in a new task Security-Team

I redid Tim's work from seven years ago; the results are still basically the same.

Here are some of the wikis with more than 100 captcha attempts in the last week, ordered by failure ratio (top and bottom):

wiki           | failure ratio
zh_yuewiki     | 2.8%
fiwiki         | 13.0%
slwiki         | 13.4%
dawiki         | 13.6%
skwiki         | 15.0%
cawiki         | 15.2%
plwiki         | 17.4%
elwiki         | 17.5%
simplewiki     | 17.8%
dewiki         | 17.9%
itwiki         | 18.0%
arwiki         | 37.0%
urwiki         | 38.1%
ptwiktionary   | 38.9%
frwiktionary   | 39.3%
hiwiki         | 40.7%
metawiki       | 40.9%
jawiktionary   | 41.2%
enwikinews     | 44.5%
sourceswiki    | 47.9%
eswiktionary   | 49.5%
frwikisource   | 53.6%
jawikibooks    | 56.4%
foundationwiki | 68.0%

Here is the graph:

grafik.png (1×1 px, 64 KB)

To be able to redo the work, you can run this DSL query in dev tools in logstash:

GET _search
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        { "query_string": { "query": "*" } },
        { "match_phrase": { "type": { "query": "mediawiki" } } },
        { "match_phrase": { "channel.keyword": "captcha" } },
        { "match_phrase": { "normalized_message.keyword": "ConfirmEdit: bad form input; {trigger}" } },
        { "range": { "@timestamp": { "gte": "2024-01-16T01:01:01.605Z", "lte": "2024-01-23T01:01:01.605Z" } } }
      ]
    }
  },
  "aggs": {
    "2": {
      "terms": { "field": "wiki.keyword", "order": { "_count": "desc" }, "size": 1000 }
    }
  },
  "size": 0
}

And then run this Python script: P55431

I have the raw data saved for further analysis for now.
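As an illustration of the aggregation step (hypothetical code, not the actual P55431): this assumes the terms aggregation above was run twice, once filtered on `bad form input` and once on `passed`, and the two resulting bucket lists saved:

```python
def failure_ratios(fail_buckets, pass_buckets, min_attempts=100):
    """Combine two Elasticsearch terms-aggregation bucket lists
    ({"key": wiki, "doc_count": n}) into per-wiki failure ratios,
    keeping only wikis with at least min_attempts total attempts."""
    fails = {b["key"]: b["doc_count"] for b in fail_buckets}
    passes = {b["key"]: b["doc_count"] for b in pass_buckets}
    ratios = {}
    for wiki in fails.keys() | passes.keys():
        f, p = fails.get(wiki, 0), passes.get(wiki, 0)
        if f + p >= min_attempts:
            ratios[wiki] = f / (f + p)
    # Sorted ascending by failure ratio, like the tables in this task.
    return dict(sorted(ratios.items(), key=lambda kv: kv[1]))

# Example with made-up bucket counts:
fail = [{"key": "fiwiki", "doc_count": 13}, {"key": "hiwiki", "doc_count": 407}]
ok = [{"key": "fiwiki", "doc_count": 87}, {"key": "hiwiki", "doc_count": 593}]
print(failure_ratios(fail, ok))  # → {'fiwiki': 0.13, 'hiwiki': 0.407}
```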

For posterity, the list of failure percentages for wikis with more than 100 captcha attempts:

wiki          | failure ratio
fiwiki        | 13.0%
plwiki        | 17.4%
elwiki        | 17.5%
simplewiki    | 17.8%
dewiki        | 17.9%
itwiki        | 18.0%
cswiki        | 18.9%
kowiki        | 19.9%
jawiki        | 21.1%
zhwiki        | 21.4%
trwiki        | 22.2%
enwiki        | 22.2%
frwiki        | 22.5%
ruwiki        | 22.9%
idwiki        | 22.9%
ptwiki        | 23.8%
ukwiki        | 23.8%
viwiki        | 24.3%
commonswiki   | 24.4%
svwiki        | 25.0%
wikidatawiki  | 25.5%
nlwiki        | 26.1%
thwiki        | 27.5%
mediawikiwiki | 27.6%
uzwiki        | 27.9%
eswiki        | 28.7%
hewiki        | 31.1%
bnwiki        | 33.8%
fawiki        | 34.1%
enwiktionary  | 36.9%
arwiki        | 37.0%
hiwiki        | 40.7%
metawiki      | 40.9%

Noting that we deployed a new captcha basically just now; in a week we'll look at the numbers and see how it fares against bots and humans.