Outreachy T158909 (AI CAPTCHA) questions

Tgr changed the edit policy from "Room Participants" to "Custom Policy".
Tgr changed the visibility from "Custom Policy" to "Public (No Login Required)".
Tgr changed the edit policy from "Custom Policy" to "All Users".
Tgr renamed this room from Outreachy T158909 to Outreachy T158909 (AI CAPTCHA) questions.

@awight: I added some microtasks to T158909. If you could add something on the machine learning side, that would be cool.

Dristibutola joined the room.

I wanted to share an idea for deciding at level 1 whether a user is safe or not: embed code in the site that calculates the trajectory of the mouse pointer up to the actual click on the reCAPTCHA, and build the algorithm from that data.

we probably don't want to collect enough data to reconstruct mouse movements - e.g. someone could be using a virtual keyboard browser plugin and the coordinates could capture what the password is

but sampling the mouse velocity or acceleration is probably a good start

then at some point we have to figure out what to do about people using keyboard for navigation, or screen readers

(just giving those people a normal captcha wouldn't be tragic but maybe there's a better approach)
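
A minimal sketch of the velocity-sampling idea above, assuming the client has timestamped pointer samples of the form { x, y, t } (t in milliseconds). The function name and the sample shape are hypothetical, and whether speeds alone are coarse enough to rule out reconstructing movements is still an open question:

```javascript
// Hypothetical helper: turns timestamped pointer samples into speeds
// (pixels per millisecond) so the raw coordinates need not be reported.
function pointerSpeeds(samples) {
    const speeds = [];
    for (let i = 1; i < samples.length; i++) {
        const dt = samples[i].t - samples[i - 1].t;
        if (dt <= 0) {
            continue; // skip duplicate or out-of-order timestamps
        }
        const dx = samples[i].x - samples[i - 1].x;
        const dy = samples[i].y - samples[i - 1].y;
        speeds.push(Math.sqrt(dx * dx + dy * dy) / dt);
    }
    return speeds;
}
```

Acceleration could be derived the same way from consecutive speeds; only the derived values would be reported, not the coordinates.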

If we use invisible reCAPTCHA, then we can use just a button like "Not a robot", and identification will involve:

  1. Recording the time delta: both very short and very long intervals are warning signs.
  2. Recording the mouse coordinates, which I believe will be highly helpful, because automation tools and spam bots generally rely on HTML and JS to trigger click events directly, without any physical pointer movement.
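
The time-delta check in item 1 could look roughly like this. It is only a sketch: the function name and both thresholds are made up for illustration, and real limits would have to come from collected data:

```javascript
// Hypothetical thresholds in milliseconds; not derived from any real data.
const MIN_HUMAN_MS = 300;
const MAX_HUMAN_MS = 120000;

// Flags a click that came suspiciously soon or suspiciously late
// after the page was loaded.
function isSuspiciousDelay(pageLoadedAt, clickedAt) {
    const delta = clickedAt - pageLoadedAt;
    return delta < MIN_HUMAN_MS || delta > MAX_HUMAN_MS;
}
```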

Yeah, simple spam bots are easy to detect because they do not simulate mouse/keyboard events at all. There are two ways an attacker could try to break such a captcha:

  • make the bot use a headless browser (Selenium, PhantomJS etc) and try to simulate human-like mouse movements (libraries exist)
  • do not simulate a browser at all, just send web requests directly. In that case the data collection script never gets executed and the attacker pretends to be the script and chooses what values to report. They could just capture data from a real human and resubmit that - so the eventual captcha logic will probably need some sort of protection against duplicated data.
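
As a rough illustration of the duplicate-data protection mentioned in the second point, here is a sketch that rejects a submission identical to one already seen. The function name and the in-memory store are hypothetical, and exact matching is the weakest form of this check, since an attacker could add small noise to captured data:

```javascript
// Hypothetical in-memory store; a real deployment would need shared,
// expiring storage on the server side.
const seenSubmissions = new Set();

// Returns true when exactly the same data has been submitted before.
function isReplay(samples) {
    const fingerprint = JSON.stringify(samples);
    if (seenSubmissions.has(fingerprint)) {
        return true;
    }
    seenSubmissions.add(fingerprint);
    return false;
}
```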
Sep 9th, 2017
Kamsuri5 joined the room.
Sep 17th, 2017
Sofmonk joined the room.
Sep 18th, 2017
Sep 20th, 2017
Smarita joined the room.
Sep 24th, 2017
Nehagup joined the room.
Sep 26th, 2017
SAM0410 joined the room.

Event logging is done on a schema that is a namespace! I agree!
But how do we create a namespace on mediawiki local environment? (That's confusing me)

Hey everyone! I've created the json file which has the properties and attributes but that needs to be put under a namespace: Schema.
I read the documentation thoroughly but couldn't find any guide for developers on creating a namespace.
Can anyone please guide me on that?

@Nehagup: (non-default) namespaces are managed by extensions. If the EventLogging extension is installed, the Schema namespace exists. You create a page in a namespace the same way you create any other page, just make sure the page name starts with Schema:.

I have installed the EventLogging extension as well. But when I hit ( localhost:8080/Schema or Schema:pageName ) it says "There is currently no text in this page. You can search for this page title in other pages, search the related logs, or create this page."

And when I created a JSON file there, it appeared like this on saving. (I was expecting it to be in tabular form.)

I think I'm missing something!


I found the solution; it was in the LocalSettings.php file.

@Nehagup that's not actually in the Schema namespace. Pages in that namespace are prefixed with Schema:. So Schema:Schema is in the Schema namespace, but Schema is in the default namespace.

In practice you'd want something descriptive like Schema:CaptchaMouseDynamics.

Sep 27th, 2017

Hi, my page is named Schema:EventLogging. I typed in my JSON file. Even then I am not able to get the page in the desired format. Can you please help me find out what is missing? Thanks!

Hi @Nehagup , Can you please tell me what changes you made in LocalSettings.php file?

Sure. I followed the EventLogging installation guide!
That's where I had made a mistake.

@SAM0410 that seems like EventLogging is not properly installed. The easy way to check is to visit the page Special:Version in your wiki, which will list all enabled extensions.

Thanks for your replies @Nehagup and @Tgr. I think EventLogging is installed. The EventLogging installation guide says that to use the Schema namespace on the local wiki, use

$wgEventLoggingSchemaApiUri = $wgServer . '/w/api.php';
$wgEventLoggingDBname = $wgDBname;

But I am not sure where to add these lines. I tried adding in LocalSettings.php but the output is still not right. Can you please guide me on this?

Can you check the content model of the page? There is a "Page information" link in the left sidebar. The content model should be JsonSchema, not wikitext.

If it's wrong then maybe you created the page before installing the extension

in which case the easiest way is to just delete and recreate it

Thank you so much @Tgr! It worked :)

Sep 29th, 2017

I started on the microtask but I get the following at ' localhost:8080 ' after ' vagrant up ':
No wiki found
Sorry, we were not able to work out what wiki you were trying to view. Please specify a valid Host header.
Available wikis:
and after ' vagrant provision ' now it is saying
Service Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

I have already tried re-installing, and solution on T116500.

Any help is appreciated. I have been struggling with this for 2 days.


Hi @Tgr , I am trying to log events via the console of my local wiki- so that I can see it on eventlogging devserver. But when I try writing
mw.eventLog.logEvent('EventLogging',{UserName:"sam0410",IpAddress:"123456" }); (here EventLogging is the name of my local schema page)
I get an error
mw.eventLog.logEvent is not a function . I read the event logging guide but couldn't figure out how to solve this. Please help.

@Sofmonk usually the easiest way is to just redo the installation from scratch (not the Vagrant software itself, that one is almost never the problem, but the virtual machine). To do that, run vagrant destroy then delete and recreate the vagrant directory. If that didn't work, or you already tried it, or prefer debugging instead, then please file a bug report that includes your host OS, the output of vagrant provision (ideally the output from the initial vagrant up as well, but that's hard to get after the fact) and the last few lines of the relevant logfiles in vagrant's log subdirectory (apache2.log, hhvm/error.log, mediawiki-exception.log and mediawiki-wiki-debug.log, probably - checking the last modified timestamp helps to see which of those contains something useful).
Debugging tends to involve many questions and answers which can be very time-consuming when communicating asynchronously, so you might want to find someone on IRC to help (#wikimedia-cloud is a good place for Vagrant issues, and #wikimedia-dev is a good place for everything). Feel free to ping me as well if you see me online.

Sep 30th, 2017

@Tgr Any idea as to why this https://gerrit.wikimedia.org/r/#/c/380466/ fails in CI? As per the logs, the bot was unable to fetch the remote repository, in which case a manual re-run of the build should do the job, right?

@SAM0410 that means the EventLogging Javascript module is not loaded. I'll fix the documentation to explain that part better.

@Smarita yeah, it happens occasionally, let me re-run.

MarcoAurelio joined the room.

Thanks @Tgr. Also, the Python setup instructions link for EventLogging isn't available (the link is present on this page). Can you please provide me with the link?

@SAM0410 probably meant to link https://github.com/wikimedia/eventlogging/blob/master/README.md , fixed. You can also use Vagrant which does all the setup automatically.

Oct 1st, 2017

@Tgr Addressed your comments. Kindly check this out https://gerrit.wikimedia.org/r/#/c/381627/ :)

Hi @Tgr, I've been having a hard time understanding the logging mechanism in MediaWiki. For client-side logging: I've been told that mw.track works with mw.trackSubscribe, but how exactly? Where do I see errors? (I've enabled debugging mode, but don't see errors.) I'd really appreciate some examples of client-side logging. For server-side logging: I understand I need to define hooks, and those hooks will execute some code which I need to define, and I have created a separate module, so where exactly should the PHP function go (which will log the data into the database using EventLogging)?

Oct 2nd, 2017

@Nehagup: mw.track/mw.trackSubscribe is just a generic code decoupling mechanism: you put in data into mw.track and it comes out in the mw.trackSubscribe callback. It's a way to communicate between two parts of the code which are not connected and do not need to know about each other.
In the background it's a pretty simple thing: when you call mw.trackSubscribe('eventName', callback), it will put callback into the array of callbacks associated with eventName, and when you call mw.track('eventName', data) it will call all those callbacks with the data. Internally it's implemented using jQuery.Callbacks, so the exact details of error handling depend on what jQuery version you use, but errors should reach the browser's error log / debug console one way or another.
A simple example is mediawiki.errorLogger.js pushing error data to the global.error channel and Sentry (when installed) processing it.
In the case of EventLogging events, the extension will automatically subscribe to certain channels as you can see in ext.eventLogging.subscriber.js.
In the case of backend logging, the UserSaveOptions hook in WikimediaEvents is a simple example. This hook gets called every time a user changes their preferences. For the captcha project we need to log mouse/keyboard dynamics data so backend logging is not really useful there.
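
To make the decoupling concrete, here is a simplified stand-in for the mechanism described above. It is not MediaWiki's actual implementation: the standalone track and trackSubscribe functions are written just for this illustration, topics are matched exactly, and passing the topic name to the callback along with the data is a choice made for the sketch:

```javascript
// Simplified stand-in for the publish/subscribe mechanism,
// not MediaWiki's own code.
const subscribers = {};

// Remember the callback under the given topic name.
function trackSubscribe(topic, callback) {
    if (!subscribers[topic]) {
        subscribers[topic] = [];
    }
    subscribers[topic].push(callback);
}

// Hand the data to every callback registered for the topic.
function track(topic, data) {
    (subscribers[topic] || []).forEach(function (callback) {
        callback(topic, data);
    });
}
```

The code that calls track does not need to know who is listening (if anyone), which is the point of the pattern.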

@Tgr Thanks a lot for the detailed answer! This is exactly what I needed. I understand the client logging well now, and there are some things I don't understand about the backend logging. I'll ask about them after doing some trials :)

Oct 3rd, 2017

I was following the steps on Extension:EventLogging#Configuring_the_schema_location but I don't get where I am supposed to place the following code

$wgEventLoggingSchemaApiUri = $wgServer . '/w/api.php';
$wgEventLoggingDBname = $wgDBname;

Any help is appreciated.

@Sofmonk see the previous section of that document. Also MediaWiki-Vagrant#MediaWiki_settings if you are using Vagrant.

Hi @Tgr , mw.loader.load( 'ext.eventLogging' ); in my console (to load core module) says undefined . I am not able to figure out what I am missing. Please help

@SAM0410 that just means the function call did not return anything (which is normal). If you didn't get any error message, loading the module probably succeeded.
That said, mw.loader.load is not really meant to be used directly. Is it mentioned in the documentation somewhere? mw.loader.using is the normal way of loading modules.
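
A sketch of what that could look like, using the module and schema names from the messages above. The wrapper function logEventWhenReady is hypothetical, and mw is passed in as a parameter only so the snippet can be tried outside a wiki page; in the browser console you would pass the global mw object:

```javascript
// Hypothetical wrapper: waits for the EventLogging module to load,
// then logs the event. `mw` is expected to be the MediaWiki global.
function logEventWhenReady(mw, schemaName, eventData) {
    return mw.loader.using('ext.eventLogging').then(function () {
        return mw.eventLog.logEvent(schemaName, eventData);
    });
}
```

For example, logEventWhenReady(mw, 'EventLogging', { UserName: 'sam0410' }) would only call mw.eventLog.logEvent once the module is available, which avoids the "is not a function" error.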

Oct 4th, 2017
Sagorika1996 joined the room.
Oct 7th, 2017

I submitted code for task 1 at https://gerrit.wikimedia.org/r/#/c/382842/. Please review. Thanks !

hm, that link seems incorrect

in general our EventLogging documentation is not in a good shape :(

fixed it now

Oct 8th, 2017

@Smarita re: your IRC question, not sure what's up with the red button, but it's just a convenience

if the schema exists and looks like a schema (ie. a table, not just plain text) it probably works

the version number is just the page revision id (you could check it in the page history, but you can see it under the title as well)

if it doesn't look like a schema, but the extension is installed, you somehow ended up with the wrong content model

Hi @Tgr, I have been trying to record user IP in my javascript file. I understand it is $ip = $session->getRequest()->getIP(); in php.
I read mw docs and couldn't find out how to add user IP. Please help.

@SAM0410 there is no easy way to get the IP in Javascript, but it's not usually needed either; you can just record it on server side when you send back the rest of the data

Oct 9th, 2017
Groovier joined the room.
Oct 11th, 2017

Hi @Tgr , I uploaded code for task 2, keeping in mind the changes you suggested in task 1 https://gerrit.wikimedia.org/r/#/c/383714/ .

Oct 13th, 2017

@SAM0410 responded there.

Oct 20th, 2017

Hi @Tgr !
I am trying to create an EventLogging schema, but it says I don't have permission to create that page. As per my research, a user needs to be autoconfirmed to create a schema. I have also tried adding $wgGroupPermissions['autoconfirmed']['createpage'] = true; but it still gives the same error. Can you guide me on this?

Oct 21st, 2017

@Kamsuri5 On which wiki are you trying to do that? Meta-Wiki? Thanks.

@Kamsuri5 autoconfirmed already has the createpage right (even anonymous users have it in the default Vagrant setup). Is the user you are using actually autoconfirmed?

Oct 22nd, 2017

@MarcoAurelio I am working on dev wiki.
No, the user I am using doesn't have autoconfirmed access.

@Tgr I don't have the local copy from which I submitted the earlier patch. I searched a lot but haven't found a way to update the earlier patch from another local copy. Can you tell me how I can do this, or should I submit a new patch?

Referring to your first question, @Kamsuri5: that's probably because you haven't logged in or created an account yet. Create a user, log in, and let us know if it works for you then.

For your 2nd issue, you may refer to this link. It contains one such case. The other option would be for you to use the Gerrit UI and clone it from there (haven't tried this second method though, although I guess it should work just as well). Hope this helps :)

@Kamsuri5 I faced the same issue. I got past it by logging into the admin account of the local dev wiki, because the other accounts created are not autoconfirmed by default.

And I'm also looking into the 2nd issue you raised, because due to some bug I had to set up the whole environment again and I'm not able to switch to the same branch (local env of my Gerrit account) again. Please do share if you get through it!

Thanks @Smarita and @Nehagup, i completely forgot about the admin account.

I will go through the link given by @Smarita and will let you know @Nehagup .

Does anyone know a solution for this "There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Go back to the previous page, reload that page and then try again. "
I don't know from where it popped up suddenly and i am not able to login now.

During what session are you facing this message?

@Kamsuri5 does that happen repeatedly? Usually it means your browser is not handling cookies properly (unlikely) or the cache backend the wiki is using loses data (unlikely if you are using vagrant).

Re: git, you can use git reset to change what commit git thinks you are working on without changing any of your files. If you go to the gerrit page for the changeset and click "Download" on the top right, you can find the command for checking out the patch; you can just replace checkout with reset in that command to prevent git from changing your files. After that you can use amend etc.

@Tgr Earlier it was working fine but now it's not letting me log in; every time this error pops up. I have tried both Chrome and Firefox, and the same error shows up. I am using Vagrant only. Any fix for this?
Thanks for the patch-set update solution.

@Kamsuri5 you can try clearing the session backend with redis-cli FLUSHALL. Or restart the VM (vagrant reload) and see if that helps. Or delete all browser cookies for the vagrant domain (although if it happens with multiple browsers that's unlikely to be the issue).

Oct 24th, 2017

I have tried both, clearing session backend and vagrant reload but still encountering same error.

@Kamsuri5 I've written up some notes about debugging session issues at https://www.mediawiki.org/wiki/Manual:How_to_debug/Login_problems
It tends to be a lot of effort so just reinstalling your vagrant box might be easier :(

Oct 28th, 2017

I will reinstall again