
Wrap-up report for "Extension to identify and delete spam pages"
Closed, Resolved, Public

Description

Fields required in a wrap-up report:

Project description

SmiteSpam is a MediaWiki extension (a piece of software that adds features to the core MediaWiki software) that helps wiki administrators identify and delete spam pages.

Spam pages

Because wikis are openly editable, they make great targets for spammers. From product advertisements to absolute garbage, every kind of spam turns up on wikis.

While accurate detection of a spam wiki page is an open problem in the field of computer science, this extension tries to detect possible spam using some simple checks: How frequently are external links occurring in the text? Are any of the external links repeating? How much wikitext is present on the page?
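The three questions above can be turned into numeric signals and combined into a single score. The following is a minimal Python sketch under stated assumptions: the extension itself is written in PHP, and the regular expressions, the normalisation constants and the weights (0.4, 0.3, 0.3) are hypothetical choices for illustration, not SmiteSpam's actual values.

```python
import re
from collections import Counter

# Hypothetical pattern for external links (bare or inside [...] brackets).
EXTERNAL_LINK = re.compile(r'https?://[^\s\]|<>"]+')

# Hypothetical set of common wikitext markup tokens: links, templates,
# bold/italic quotes, headings and list bullets.
WIKITEXT_MARKUP = re.compile(r"\[\[|\]\]|\{\{|\}\}|'''|''|^=+|^\*|^#", re.MULTILINE)


def link_density(text: str) -> float:
    """External links per 100 words; higher is more suspicious."""
    words = len(text.split())
    if words == 0:
        return 0.0
    return 100.0 * len(EXTERNAL_LINK.findall(text)) / words


def repeated_link_ratio(text: str) -> float:
    """Fraction of external links that repeat a URL already seen on the page."""
    links = EXTERNAL_LINK.findall(text)
    if not links:
        return 0.0
    repeats = sum(count - 1 for count in Counter(links).values())
    return repeats / len(links)


def wikitext_ratio(text: str) -> float:
    """Markup tokens per word; genuine wiki pages tend to use more markup."""
    words = len(text.split())
    if words == 0:
        return 0.0
    return len(WIKITEXT_MARKUP.findall(text)) / words


def spam_score(text: str) -> float:
    """Combine the three checks into a 0..1 score (hypothetical weights)."""
    density = min(link_density(text) / 10.0, 1.0)
    repeats = repeated_link_ratio(text)
    little_markup = 1.0 - min(wikitext_ratio(text) * 10.0, 1.0)
    return 0.4 * density + 0.3 * repeats + 0.3 * little_markup
```

A page that is mostly repeated external links with no markup scores close to 1, while a short article with normal links and bold text scores close to 0. A weighted sum keeps each signal easy to inspect and tune, which suits a heuristic that is admittedly far from perfect.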

Results

Because this approach to identifying spam pages is oversimplified, the extension's results are far from perfect. However, it does a reasonably good job of finding particularly bad pages in a wiki and presenting them to administrators.

For example, running Special:SmiteSpam on Discourse DB produces the following page of results:

The results list each page, its creator, how confident SmiteSpam is that the page is spam, and the page's creation time, along with options to delete the page and/or block its creator.

Trusted users

Since SmiteSpam is far from perfect, several false positives turn up in the results. One easy way to deal with them is to mark a user as "trusted". SmiteSpam ignores pages created by trusted users, which reduces the number of false positives in the results.
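Skipping trusted users amounts to filtering pages by creator before they are scored. The following is a minimal Python sketch; the `Page` record and the `pages_to_check` helper are hypothetical names for illustration, not part of the extension's PHP code.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical minimal page record: title, creator and wikitext."""
    title: str
    creator: str
    text: str


def pages_to_check(pages: list[Page], trusted_users: set[str]) -> list[Page]:
    """Drop every page whose creator is on the trusted-user list."""
    return [page for page in pages if page.creator not in trusted_users]
```

Using a set for the trusted users keeps each membership test constant-time on average, so the filter stays cheap even on wikis with many pages.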

The trusted users can be viewed and edited at Special:SmiteSpamTrustedUsers:

Missing features/known bugs

The MediaWiki-extensions-SmiteSpam workboard contains all SmiteSpam-related tasks.

Some missing features include:

Some useful enhancements include:

Links