
Features for distinguishing human behaviour from bots
Closed, Declined, Public

Description

We aim to build a machine learning model that can tell humans apart from bots. To do this, we need to identify key features that clearly distinguish the two classes (humans and bots). We capture data from the MediaWiki account creation form, which can be used as features with or without modification. The classifier will only work well if these features are good. We have already run an experiment in which we captured the following data to detect outliers:

  1. The last hundred mouse coordinates the user traverses before submitting the form.
  2. The time between key presses in the Username and Email fields of the account creation form.

Using this data we created the following features:

  1. Mean, max and standard deviation of keypress timings for each field (3 × 2 = 6 features).
  2. Mean, max and standard deviation of the speeds and curvatures calculated from the raw mouse movement data.
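
As a rough illustration of the features listed above, the following Python sketch computes the three summary statistics for keypress intervals and for mouse speed and curvature. Several details are assumptions that the description does not specify: each mouse sample is taken to be an `(x, y, t)` tuple with a timestamp, curvature is approximated as the absolute turning angle per unit path length, and the population standard deviation is used. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def summarize(values):
    """Return (mean, max, population standard deviation) of a non-empty list."""
    return (mean(values), max(values), pstdev(values))


def keypress_features(intervals_by_field):
    """Map each field name to mean/max/std of its inter-key intervals.

    intervals_by_field: dict such as {"username": [...], "email": [...]}.
    """
    features = {}
    for field, intervals in intervals_by_field.items():
        m, mx, sd = summarize(intervals)
        features[field + "_mean"] = m
        features[field + "_max"] = mx
        features[field + "_std"] = sd
    return features


def mouse_speeds(points):
    """Speed of each segment between consecutive (x, y, t) samples.

    Assumption: samples carry timestamps; segments with no elapsed time are skipped.
    """
    speeds = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > 0:
            speeds.append(math.hypot(x1 - x0, y1 - y0) / dt)
    return speeds


def mouse_curvatures(points):
    """Approximate curvature at each interior sample.

    Assumption: curvature = |turning angle| / (mean length of the two
    adjacent segments). Points where either segment has zero length are skipped.
    """
    curvatures = []
    for (x0, y0, _), (x1, y1, _), (x2, y2, _) in zip(points, points[1:], points[2:]):
        ax, ay = x1 - x0, y1 - y0
        bx, by = x2 - x1, y2 - y1
        len_a, len_b = math.hypot(ax, ay), math.hypot(bx, by)
        if len_a == 0 or len_b == 0:
            continue
        # Signed angle between the two segments via cross and dot products.
        angle = math.atan2(ax * by - ay * bx, ax * bx + ay * by)
        curvatures.append(abs(angle) / ((len_a + len_b) / 2))
    return curvatures
```

Calling `summarize(mouse_speeds(points))` and `summarize(mouse_curvatures(points))` then yields the mean, max and standard deviation for each quantity.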

(Note: This data was captured in an experimental setup, not on the actual account creation page.)

Because the features are the most important part of this task, we also plan to use some additional ones:

  1. Dwell time (how long a key is held down) and flight time (the time between a key release and the next key press)
  2. Mouse acceleration
  3. Mouse clicks
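
The timing features in this list could be derived roughly as follows. This is a minimal sketch under stated assumptions: key events are taken to be `(press_time, release_time)` pairs ordered by press time, mouse samples are `(x, y, t)` tuples with non-decreasing timestamps, and acceleration is estimated as the change in segment speed divided by the time between segment midpoints. None of these representations or function names come from the task itself.

```python
import math


def dwell_times(key_events):
    """Dwell time per key: release time minus press time.

    key_events: list of (press_time, release_time), ordered by press time.
    """
    return [release - press for press, release in key_events]


def flight_times(key_events):
    """Flight time between consecutive keys: next press minus current release.

    Values can be negative when the next key is pressed before the
    previous one is released.
    """
    return [
        next_press - release
        for (_, release), (next_press, _) in zip(key_events, key_events[1:])
    ]


def mouse_accelerations(points):
    """Estimate acceleration from (x, y, t) samples.

    Assumption: speed is computed per segment and assigned to the segment's
    midpoint in time; acceleration is the speed difference between adjacent
    segments divided by the time between their midpoints.
    """
    segments = []
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt > 0:
            speed = math.hypot(x1 - x0, y1 - y0) / dt
            segments.append((speed, (t0 + t1) / 2))
    return [
        (v1 - v0) / (m1 - m0)
        for (v0, m0), (v1, m1) in zip(segments, segments[1:])
    ]
```

These per-event values can then be summarized with the same mean, max and standard deviation statistics used for the existing features.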

We plan to collect data for a month and, with the help of the spam bot detection team, use the bots they identify to generate labels for this data.