
Outreachy microtask: analyze sample mouse movement data and extract feature vectors
Closed, Resolved, Public

Description

This is a microtask for Outreachy applicants for T158909: Automatically detect spambot registration using machine learning (like invisible reCAPTCHA).

  • Set up a Jupyter notebook to work in.
  • Clone a sample data set, https://github.com/balabit/Mouse-Dynamics-Challenge
  • Come up with several features that can be calculated from the mouse movement data, for example time between movements, movement arc curvature, speed of movement. These features will be refined later, so don't worry about choosing the perfect features.
  • Extract feature vectors for at least one of the mouse movement histories in the training data, and store as a numpy array or in any other format that can be consumed by sklearn classifiers.
  • Display a sample of the feature vectors inside the notebook, either as a table or graphically.
  • Publish to GitHub or another publicly accessible Git server.
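The feature-extraction steps above can be sketched as follows. This is a minimal sketch, not any applicant's actual solution. It assumes each session file is a CSV with the header `record timestamp,client timestamp,button,state,x,y` (verify this against the cloned files), and it uses a few made-up rows in place of a real session. The turning angle between successive segments is a crude stand-in for arc curvature, and the helper names `load_session` and `event_features` are hypothetical.

```python
import csv
import io

import numpy as np

# Made-up rows in the column layout assumed for the Balabit session files.
SAMPLE = """record timestamp,client timestamp,button,state,x,y
0.0,0.0,NoButton,Move,100,200
0.1,0.05,NoButton,Move,110,205
0.2,0.10,NoButton,Move,125,215
0.3,0.15,NoButton,Move,130,230
"""


def load_session(text):
    """Parse session CSV text into arrays of client time, x and y."""
    rows = list(csv.DictReader(io.StringIO(text)))
    t = np.array([float(r["client timestamp"]) for r in rows])
    x = np.array([float(r["x"]) for r in rows])
    y = np.array([float(r["y"]) for r in rows])
    return t, x, y


def event_features(t, x, y):
    """One vector per pair of consecutive events: [dt, distance, speed, turn]."""
    dt = np.diff(t)                     # time between events
    dx, dy = np.diff(x), np.diff(y)
    dist = np.hypot(dx, dy)             # straight-line distance moved
    # Speed, leaving 0 where two events share a timestamp.
    speed = np.divide(dist, dt, out=np.zeros_like(dist), where=dt > 0)
    angle = np.arctan2(dy, dx)          # direction of each segment
    # Change of direction, wrapped into [-pi, pi); 0 for the first segment.
    turn = np.zeros_like(angle)
    turn[1:] = (np.diff(angle) + np.pi) % (2 * np.pi) - np.pi
    return np.column_stack([dt, dist, speed, turn])


t, x, y = load_session(SAMPLE)
features = event_features(t, x, y)
print(features.shape)  # (3, 4)
```

Each row of `features` is one feature vector in a plain numpy array. Wrapping it as `pandas.DataFrame(features, columns=["dt", "dist", "speed", "turn"])` is one way to display a sample as a table inside the notebook.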

Event Timeline

Hi @Tgr, I extracted a few features and created a table of features for one user from the dataset you mentioned.
The code is at https://github.com/sam0410/ML-on-mouse-movements/blob/master/featureExtraction.py.
The resulting table in Jupyter:

[Image: Jupyter Data Display.jpg]

Please let me know if any further changes are needed.

Looks cool, but it seems the notebook itself is not committed; GitHub can display notebooks.

@Tgr, please have a look at github_commit. I came up with a few features and look forward to further guidance.

@SAM0410, @Sofmonk thanks! Entries for this microtask will be reviewed by @awight.

Hi @Tgr & @awight!
I have made a GitHub repository that extracts a few features from the provided dataset.
Kindly have a look at it; I look forward to your suggestions.

Hi @awight, @Tgr!
Please take a look at my GitHub repository. I have extracted a few features from the above-mentioned dataset. I look forward to your comments and suggestions.

Hey @Tgr @awight, I've tried to extract some basic features from the Mouse-Dynamics-Challenge data.
Please have a look at: https://github.com/nehagup/wikimedia_microtask3
I also found an interesting Python package ("Time Series Feature extraction based on scalable hypothesis tests") that contains many feature extraction methods and a robust feature selection algorithm. However, I couldn't apply it because of limited computing resources; I'll try it on a cloud machine after making some progress on task 4 and the application proposal.
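Per-event feature vectors vary in number from session to session, while sklearn classifiers expect one fixed-length row per sample. Time-series feature extraction packages address this by computing many summary statistics per series. The sketch below is a small hand-written illustration of that idea. It does not use or reproduce any package's API, the helper name `summarize` is hypothetical, and the per-event speed values are made up.

```python
import numpy as np


def summarize(series):
    """Collapse a variable-length series into six fixed summary statistics."""
    s = np.asarray(series, dtype=float)
    return np.array([
        s.mean(),
        s.std(),
        s.min(),
        s.max(),
        np.median(s),
        np.percentile(s, 90),
    ])


# Made-up per-event speeds for two sessions of different lengths.
sessions = [
    [220.0, 360.0, 316.0],
    [100.0, 150.0, 120.0, 90.0, 400.0],
]

# One row per session: a 2-D array usable as the X input of an sklearn estimator.
X = np.vstack([summarize(s) for s in sessions])
print(X.shape)  # (2, 6)
```

Hand-picking a few statistics keeps memory use small; a package that generates hundreds of candidates and then applies feature selection trades that for broader coverage.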
Looking forward to your views!
Thanks!


@Tgr @awight, please find a link to my repo for this task here.

Tgr claimed this task.