This is a GSOC Project Proposal For the Idea: Extension to identify and delete spam pages
Name: Arindam Padhy
irc nick: d3m3nt3r
Time Zone: UTC+5:30
Typical working hours: 9PM to 3AM before 23rd April, 3PM to 6PM after 23rd April (Indian Standard Time)
There are quite a few MediaWiki extensions to prevent spam, and some extensions that let you delete pages en masse.
What MediaWiki doesn't have yet is a capability to deal well with spam that's already in place on the wiki.
The Nuke extension lets you do a mass deletion on all pages created by a single user or IP address, but that's not too helpful because spammers tend to switch quickly from one user/IP address to another,
perhaps to get around such tools.
At present, spam detection is not very effective, since some spam gets through. The addition of rel="nofollow" has slightly reduced spam, but many types of spam remain to be dealt with. My project will try to deal with them.
Mentor: [[ https://phabricator.wikimedia.org/p/Yaron_Koren/ | name ]]
Co-Mentor: [[ https://phabricator.wikimedia.org/p/jan/ | name ]]
My solution is to build the extension in two phases:
1. Detection of different types of spam
2. Deletion of the spam
Before 27th April: Request a Gerrit repository for the extension from the MediaWiki maintainers; set up the basic design of the solution
27th April to 25th May: Interaction with the community members; implementing any changes/improvements to the proposed solution
25th May: Official Coding Begins for GSOC
25th May to 4th June: Designing an algorithm for detecting spam on pages
5th June to 6th June: Coding the algorithm
7th June to 25th June: Extending the algorithm to deal with the different types of spam and to delete them
26th June to 28th June: Cleaning up code and finalizing details before the test run
29th June to 1st July: Testing the extension on different browsers to verify the code runs properly in each
1st July to 11th July: MediaWiki members test the extension against different kinds of spam to check whether the code handles them all
12th July to 15th July: Making any changes to the extension required after that testing
16th July: Wrapping up all parts of the extension
17th July to 19th July: Testing and fixing any further bugs found
20th July: Improving documentation and finalizing code
As mentioned earlier, my project will be done in two parts.
**Part 1**: Identifying spam on the page
This part mainly deals with finding the spam present on a page. Spam can be hidden, can lead the user to an infected site, can open several browser windows at a time, and so on.
The problem is how to detect it, because not all spam can be removed with the same technique; different kinds of spam behave differently.
Using PHP and JavaScript I will detect all the possible kinds of spam.
Whenever a user requests a page and is redirected to another one, the user is facing a spam issue. To handle this, once the user clicks a link, I will keep a variable storing the URL of the site the user wanted. This variable will be compared with the URL of the site about to be shown to the user. If the two values do not match, the user was being redirected to a spam page; in that case the original URL will be reloaded. Since the same spam would keep causing the problem, as soon as a site is identified as spam, its redirection will be blocked.
In this way the user is protected from the spam technique called cloaking.
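The redirect check described above could be sketched roughly as follows. This is only an illustration of the logic, not an existing MediaWiki API; the function and variable names are my own.

```javascript
// Sketch of the redirect (cloaking) check: requestedUrl is stored when
// the user clicks a link; servedUrl is what is about to be displayed.
const blockedRedirects = new Set();

function checkRedirect(requestedUrl, servedUrl) {
  if (requestedUrl === servedUrl) {
    return { spam: false, showUrl: servedUrl };
  }
  // Mismatch: the user was being redirected to a spam page.
  // Remember the spam target so future redirects to it are blocked,
  // and reload the URL the user originally wanted.
  blockedRedirects.add(servedUrl);
  return { spam: true, showUrl: requestedUrl };
}
```

A real implementation would hook this into the server's request handling, but the comparison itself is this simple.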
Another possible form of spam is advertisements popping up every time a page is opened, or whenever the user clicks on the page; this can be avoided by placing a counter variable. The page source code would be checked before being passed to the user's system, and if any advertisement keyword or pop-up code is present, that code would be deleted and the page reloaded. As for sites opening in different tabs whenever the user clicks on the page, this can be handled by checking the source code for the keyword "onclick": if a URL is found in an onclick handler, that URL is deleted from the source code before the page is forwarded.
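The source-code scan just described could look roughly like the sketch below. The keyword patterns are illustrative assumptions; a real extension would maintain a configurable list.

```javascript
// Sketch: strip URLs found in onclick handlers and delete known
// ad/pop-up code before the page is sent to the user.
const AD_KEYWORDS = [/window\.open\s*\([^)]*\)/gi, /\bpopunder\b/gi];

function stripSpamMarkup(html) {
  const removed = [];
  // Blank out any onclick attribute that contains a URL.
  let clean = html.replace(/onclick\s*=\s*"[^"]*https?:\/\/[^"]*"/gi, (m) => {
    removed.push(m);
    return '';
  });
  // Delete ad/pop-up code matched by the keyword patterns.
  for (const pat of AD_KEYWORDS) {
    clean = clean.replace(pat, (m) => {
      removed.push(m);
      return '';
    });
  }
  // Return the cleaned source plus what was removed, so the removed
  // spam can be logged on the server side as described below.
  return { clean, removed };
}
```

Regex-based scanning is only a first approximation; an HTML parser would be more robust, but this shows the idea.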
All the deleted spam will be transferred into a PHP file on the server side, which will automatically be cleaned at an interval set by the MediaWiki administrators.
After deletion, the new page source will replace the page source on the server, so the page becomes spam-free.
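The server-side spam log with interval-based cleanup could be sketched like this. The structure and interval handling are my assumptions, not existing MediaWiki configuration.

```javascript
// Sketch of the server-side spam log: removed spam is appended with a
// timestamp, and a periodic job purges entries older than maxAgeMs.
function makeSpamLog(maxAgeMs) {
  const entries = [];
  return {
    add(snippet, now = Date.now()) {
      entries.push({ snippet, at: now });
    },
    // Called periodically (e.g. by a cron-style job); drops old
    // entries and returns how many remain.
    purge(now = Date.now()) {
      const keep = entries.filter((e) => now - e.at < maxAgeMs);
      entries.length = 0;
      entries.push(...keep);
      return entries.length;
    },
    size() { return entries.length; },
  };
}
```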
**Part 2**: This part deals with the algorithm that deletes the spam from pages, as outlined above.
First the user requests a page, and the server returns that page's source code. My extension will run on the server side; if requested by my mentor, I can develop it so that it runs on both the client and the server side. On the server side the spam is filtered first, by passing the page through the conditions described above, and then the spam-free page is delivered to the user. In this way all detectable spam is filtered out and the page becomes spam-free.
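The server-side flow in Part 2 amounts to a filter pipeline, which could be sketched as below. The filters here are simplified stand-ins for the Part 1 checks, with names of my own invention.

```javascript
// Sketch of the Part 2 flow: the requested page's source passes
// through each spam filter in turn before being sent to the user.
function filterSpam(pageSource, filters) {
  return filters.reduce((src, filter) => filter(src), pageSource);
}

// Example filters standing in for the Part 1 checks.
const dropOnclickUrls = (src) =>
  src.replace(/onclick\s*=\s*"[^"]*https?:\/\/[^"]*"/gi, '');
const dropPopups = (src) => src.replace(/window\.open\s*\([^)]*\)/gi, '');

// Usage: clean the page before delivering it.
// const clean = filterSpam(html, [dropOnclickUrls, dropPopups]);
```

Keeping each check as a separate filter makes it easy to add handlers for new kinds of spam later, which matters given the testing period in the timeline.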
The coding will be done entirely in PHP and JavaScript. The deletion procedure will be made efficient and will not slow down page loading. If there are further types of spam that this extension does not detect, I have kept a spam-testing period in my timeline during which I can fix errors and change the code so that it deals with as many kinds of spam as possible and the user does not face any problems.
During the entire development I would like to receive help from my mentor, Yaron Koren, and MediaWiki members in testing and finalizing my extension.
The source code will be pushed to a Gerrit repository as soon as I get one.
I am Arindam Padhy, a second-year undergraduate student in Computer Science at the International Institute of Information Technology (IIIT), Bhubaneswar, India.
I have done a few networking projects using PHP.
I have a strong interest in dealing with malicious things like malware, viruses, and spam, and my main interest has always been network security. Last summer I took training under Hewlett-Packard officials on Network Management and Security, after which I was certified by them.
Apart from this, I have been involved in making websites secure by dealing with all possible security issues.
I have designed websites for my school and college festivals.
How did you hear about the program?
I heard about GSOC from my friends.
Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?
My second-year final examinations begin on 23rd April and last until 10th May; during that time I have to focus on my exams. After that my three-month summer vacation begins, ending in August. As soon as my summer vacation begins I will be able to give full commitment to my project, and I assure you I will follow my timeline strictly without any deliberate delays.
We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what project?
No, I would only like to apply for GSOC 2015.
What does making this project happen mean to you?
It means a lot to me. Firstly, it is related to something I have always wanted to do. It will also let me gain experience and learn a lot about MediaWiki and how it works.
How would you like to contribute to mediawiki after GSOC 2015?
Even after GSOC 2015 ends, I would like to contribute to MediaWiki in every way I can be useful, especially on security issues, which I consider my strength and the reason I am applying for this project.
This would be my first experience with MediaWiki, but I have some previous experience with phpMyAdmin, where I worked toward a user-interface development project. I had already begun my work on it but unfortunately was not selected.
Still, I finished my work and applied the patch on my own machine; it was basically work on server variables.
As I have already mentioned, my main interests are security and malware testing. In the near future I will be trying to earn a certificate by doing a project with Cisco.
I started using MediaWiki last year and have been planning to work on this project since then.
Projects that I have worked on:
1. Security issues on Linux systems
2. The security of my college's open-source academic information system, known as Hibiscus: [[ https://hib.iiit-bh.ac.in/Hibiscus/Login/?client=iiit | Hibiscus ]]