Spam blacklist ineffective on encoded URLs inside file inclusion syntax's link parameter
Closed, ResolvedPublic

Description

Spam blacklist entries:


\bwikipediaforum\.com\b

\bwikipediocracy\.com\b

File inclusion syntax that makes a working disallowed link:


[[File:Symbol wtf vote.svg|20px|link=http://wikipediocracy%2Ecom]]


Version: unspecified
Severity: normal

Details

Reference
bz46143
bzimport raised the priority of this task from to Normal.
bzimport set Reference to bz46143.
bzimport changed Security from none to Software security bug.
Restricted Application changed the visibility from "Public (No Login Required)" to "Security (Project)". · View Herald TranscriptNov 22 2014, 1:18 AM
Restricted Application changed the edit policy from "All Users" to "Security (Project)". · View Herald Transcript

Thanks MZMcBride, it looks like Chrome and Opera treat a %2E in the hostname of the url as a valid . character.

The quick fix is just to replace %2e characters before matching, like the fix for bug 12896. But it may be good to do more extensive normalization in the near future. Maybe something like:

function normalizeUrl( $url ) {
$bits = wfParseUrl( strtolower( Sanitizer::cleanUrl( $url ) ) );
$bits['host'] = str_replace( array( '%2e', '.'), '.', $bits['host'] );
$bits['path'] = str_replace( array( '%2e', '.'), '.', $bits['path'] );
return wfAssembleUrl( $bits );
}

Created attachment 11941
Decode %2E before matching blacklist

Attached:

(In reply to comment #1)

The quick fix is just to replace %2e characters before matching, like the fix
for bug 12896. But it may be good to do more extensive normalization in the
near future.

Hmmm, this is something to do with file inclusion syntax's link parameter specifically, I think.

If I attempt to insert "http://badsite%2Ecom" or "[http://badsite%2Ecom hello]" or "[[File:Symbol wtf vote.svg|10px|link=http://badsite.com]]" into a wiki page, it appropriately fails if the domain is blacklisted.

However, "[[File:Symbol wtf vote.svg|10px|link=http://badsite%2Ecom]]" works.

URL unquoting (i.e., changing "%2E" to ".", etc.) is already being done in some cases for comparison against the spam blacklist, so the question is why this particular syntax is able to be saved without being flagged as blacklisted.

Nice catch. Before most links are added to the list of external links, they're passed through Parser::replaceUnusualEscapes first. Image links are not at the moment.

Created attachment 11942
Filter urls with replaceUnusualEscapes before adding to external links list

This would be instead of patching SpamBlacklist, so that any other uses of the external links list will have consistent url normalization.

Attached:

Restricted Application changed the visibility from "Security (Project)" to "Custom Policy". · View Herald TranscriptNov 24 2014, 9:28 PM
Restricted Application changed the edit policy from "Security (Project)" to "Custom Policy". · View Herald Transcript
Restricted Application changed the visibility from "Custom Policy" to "Custom Policy". · View Herald TranscriptDec 30 2014, 9:45 PM
Restricted Application changed the edit policy from "Custom Policy" to "Custom Policy". · View Herald Transcript
Krenair added a subscriber: Krenair.
demon removed a subscriber: demon.Aug 19 2015, 4:41 PM
Bawolff added a subscriber: Bawolff.

I think it'd be better to move the normalization check directly into ParserOutput::addExternalLink. You should never not normalize the link, and doing it separately seems to be inviting people to forget.

To that end:

Also, a similar issue is present in <gallery> tags, which the above patch fixes.

This comment was removed by Reedy.
This comment was removed by Reedy.

Reedy assigned this task to Bawolff.Mar 31 2017, 12:17 AM
Reedy closed this task as Resolved.Apr 1 2017, 4:35 PM

Closing for ease of tracking progress. Patches attached to parent bug, due for next release

Reedy changed the visibility from "Custom Policy" to "Public (No Login Required)".Apr 6 2017, 8:57 PM
Reedy changed the edit policy from "Custom Policy" to "All Users".

Change 346846 merged by jenkins-bot:
[mediawiki/core@master] SECURITY: Always normalize link url before adding to ParserOutput

https://gerrit.wikimedia.org/r/346846

Change 346865 merged by jenkins-bot:
[mediawiki/core@REL1_28] SECURITY: Always normalize link url before adding to ParserOutput

https://gerrit.wikimedia.org/r/346865

Change 346855 merged by jenkins-bot:
[mediawiki/core@REL1_27] SECURITY: Always normalize link url before adding to ParserOutput

https://gerrit.wikimedia.org/r/346855