API spamblacklist error should provide *all* blocked URLs, not just one
Closed, Resolved (Public)

Description

At the moment, it's very hard to code a bot that can filter out blacklisted URLs whilst leaving other URLs.

A simple thing to help with this would be to have the API pass back all problematic URLs on the page, thus reducing the number of attempts to two (try -> filter -> try again) rather than a loop of changing one domain every time, as at present.

Thanks.
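
The try -> filter -> try again flow requested above can be sketched from the bot's side as follows. This is only an illustration under stated assumptions: `save` is a hypothetical callable standing in for the real edit request, and it is assumed to return a success flag plus the full list of blocked URLs, which is the behaviour this task asks for rather than anything the API is confirmed to provide.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stand-in for an API edit call: takes page text and returns
# (success, blocked_urls). Not a real MediaWiki or pywikipedia function.
SaveFn = Callable[[str], Tuple[bool, List[str]]]


def strip_blocked_urls(text: str, blocked: Iterable[str]) -> str:
    """Remove every occurrence of each blocked URL from the text."""
    for url in blocked:
        text = text.replace(url, "")
    return text


def save_with_one_retry(text: str, save: SaveFn) -> Tuple[bool, str]:
    """Try to save; on a blacklist rejection, filter once and try again."""
    ok, blocked = save(text)
    if ok:
        return True, text
    # Assumes the rejection reported *all* offending URLs, so a single
    # filtering pass is enough before the second attempt.
    cleaned = strip_blocked_urls(text, blocked)
    ok, _ = save(cleaned)
    return ok, cleaned
```

With an error that names only one URL, the same bot would need one round trip per blacklisted domain instead of two in total.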


Version: 1.20.x
Severity: enhancement

Details

Reference
bz30332

Event Timeline

bzimport raised the priority of this task to Needs Triage. (Nov 21 2014, 11:49 PM)
bzimport set Reference to bz30332.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

At the moment, it's very hard to code a bot that can filter out blacklisted
URLs whilst leaving other URLs.

A simple thing to help with this would be to have the API pass back all
problematic URLs on the page, thus reducing the number of attempts to two (try
-> filter -> try again) rather than a loop of changing one domain every time,
as at present.

Thanks.

I don't actually think this is an api bug

			case EditPage::AS_SPAM_ERROR:
				$this->dieUsageMsg( array( 'spamdetected', $result['spam'] ) );

It's what gets returned back by the editpage...

Might be something to look at fixing as part of bug 29246

Much as I hate gratuitous spam, I'd completely forgotten about this one. Reedy: any further thoughts? It's still a feature which would be genuinely useful, although there might be an existing pywikipedia, in which case probably less so.

Functional patch v1 (doesn't apply)

A patch (couldn't get it to apply) but at least illustrative of the solution to this problem (the changes have the desired effect, have tested). Essentially, the tradeoff is that "spammers" get the full report of what they have tried to add that they cannot, in return for a slightly longer wait. This should be of benefit to any automated tools or content adders in dubious areas, who no longer have to query and requery: they can get a single report, remove all of the offending links, and then be certain that their next attempt will succeed.
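
The tradeoff described above (a full report in return for a slightly longer wait) comes down to checking every link instead of stopping at the first hit. The sketch below contrasts the two approaches. It is not the SpamBlacklist extension's code or the attached patch: the function names and the regex-list representation of the blacklist are assumptions made for illustration.

```python
import re
from typing import List, Optional, Pattern


def first_blocked(urls: List[str], patterns: List[Pattern[str]]) -> Optional[str]:
    """Stop at the first URL matching any blacklist pattern (one URL reported)."""
    for url in urls:
        if any(p.search(url) for p in patterns):
            return url
    return None


def all_blocked(urls: List[str], patterns: List[Pattern[str]]) -> List[str]:
    """Check every URL and report all matches; slower, but a complete report."""
    return [url for url in urls if any(p.search(url) for p in patterns)]


# Illustrative data only; these domains are placeholders.
patterns = [re.compile(r"spam\.example"), re.compile(r"junk\.example")]
urls = [
    "http://good.example/",
    "http://spam.example/a",
    "http://junk.example/b",
]
print(first_blocked(urls, patterns))
print(all_blocked(urls, patterns))
```

The second function always scans the whole list, which is where the extra wait would come from; the first returns as soon as one match is found.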

attachment 30332.patch ignored as obsolete

(In reply to comment #4)

Created attachment 9624
Functional patch v1 (doesn't apply)

A patch (couldn't get it to apply) but at least illustrative of the solution to
this problem (the changes have the desired effect, have tested). Essentially,
the tradeoff is that "spammers" get the full report of what they have tried to
add that they cannot, in return for a slightly longer wait. This should be of
benefit to any automated tools or content adders in dubious areas, who no
longer have to query and requery: they can get a single report, remove all of
the offending links, and then be certain that their next attempt will succeed.

How did you create it?

Start with a working copy of MediaWiki from svn, make the changes, then create the patch (svn diff > bug30332.diff)

attachment 30332.patch ignored as obsolete

(In reply to comment #5)

How did you create it?

Start with a working copy of MediaWiki from svn, make the changes, then create
the patch (svn diff > bug30332.diff)

Well, I was trying to create a patch against the 'installed' version of the SpamBlacklist extension. I think that was the problem (since the 'installed' version is not versioned).

First part of functional patch (for core)

attachment 30332-ext.patch ignored as obsolete

Second part of functional patch (for /extensions/)

Sorry, I have core and extensions separate, so I've had to create two patches. These ones should apply and can be tested by copying from the extensions repo into the installed extensions folder (and making sure that SpamBlacklist is included!).

Tested on local installation and works perfectly.

attachment 30332-ext.patch ignored as obsolete

Looks like you attached the same patch twice.

First part of functional patch (for core) (corrected)

So I did. Think this should be the correct one.

Okay, so again, won't apply cleanly. On my to-do list to update (again).

Second part of functional patch (for /extensions/) (updated for bitrot)

Updated patch so it applies cleanly again

sumanah wrote:

Jarry, there's been a bit of a delay in the review of patches here -- as we prepare to get a new version out, we're in a "code slush" during which we concentrate on reviewing code that has already been committed to our source code repository (you might have already seen the details at http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/57950 ). But we'll try to respond to your contribution soon. My apologies for the wait.

Patches resubmitted under the new system:

https://gerrit.wikimedia.org/r/3740
https://gerrit.wikimedia.org/r/3747

I hope they can both be reviewed soon.

Reviewed and merged; closing as FIXED.