
Directory traversal allows single-page whitelisting to override entire spam-blacklist entry
Closed, Resolved | Public | Security

Description

On test.wikipedia.org, \bbadsite\.com\b is blacklisted. I added a whitelist entry for \bbadsite\.com/page\b, the standard format for a whitelist entry on Wikimedia sites. Whitelisting worked as intended; however, I was also able to add the URL https://badsite.com/page/../badpage. As can be seen from these real-world examples in my enwiki sandbox, I was able to use this exploit to link to real sales pages on Kickstarter and Ticketmaster, which are both blacklisted on enwiki.

URL parameters are also a vulnerability. For instance, on MediaWiki sites, ?diff= and ?oldid= override the title, so if some wiki were blacklisted but its Main Page were whitelisted, any page could be linked to with Main_Page?oldid=....

It's very rare that a trailing slash or an anchor will lead to a new page, so the general solution here is to replace a trailing \b with /?(?:\x23\S*)?$ (\x23 is an escaped #). Results of implementing that on testwiki:

I could just implement this myself on wikis where I have sysop, but I'm bringing it to a security ticket first because of the global implications. These could be changed en masse across Wikimedia wikis by a global interface editor, maybe with an entry in Tech News explaining why the change was made, so admins will know to stop using \b at the end of whitelist entries.
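For illustration, here is a minimal Python sketch of the proposed change, using hypothetical whitelist patterns (this is not the extension's code; SpamBlacklist applies PCRE, but the behavior of these particular constructs is the same in Python's `re`):

```python
import re

# Old-style whitelist entry ending in \b, and the proposed anchored rewrite.
old_pattern = re.compile(r"\bbadsite\.com/page\b")
new_pattern = re.compile(r"\bbadsite\.com/page/?(?:\x23\S*)?$")  # \x23 is '#'

traversal = "https://badsite.com/page/../badpage"
plain = "https://badsite.com/page"
fragment = "https://badsite.com/page#section"

# The \b version matches the traversal URL, so it would be whitelisted:
assert old_pattern.search(traversal) is not None
# The anchored version rejects it while still allowing the page itself,
# an optional trailing slash, and fragment links:
assert new_pattern.search(traversal) is None
assert new_pattern.search(plain) is not None
assert new_pattern.search(fragment) is not None
```

The key difference: `\b` only asserts a word boundary, so anything (including `/../`) may follow, whereas `$` pins the match to the end of the URL.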

P.S. I haven't revdelled/logdelled any of my tests on enwiki or testwiki, because it seems easier for the ticket if everyone can see everything, and I don't think there's a massive risk to people finding out about this as long as we do something about it relatively soon. But I can delete stuff if that's deemed preferable.

Details

Risk Rating
High
Author Affiliation
Wikimedia Communities
Related Changes in Gerrit:

Event Timeline

sbassett changed Risk Rating from N/A to High.

> These could be changed en masse across Wikimedia wikis by a global interface editor, maybe with an entry in Tech News explaining why the change was made, so admins will know to stop using \b at the end of whitelist entries.

Sounds like this would be the simplest approach for now? ext:SpamBlacklist is a rather dated extension with no formally-documented maintainers and an open code-stewardship review (T224921). So addressing potential bugs that might be causing this behavior probably isn't the most realistic path forward at this time. I'm wondering if we should just sub several global int-admins (many of whom are WMF employees) on this task and further coordinate this effort here...

I want to be clear, I hesitate to even call this a bug or a misconfiguration on the extension's end; it's just counterintuitive behavior for end users. With https://badsite.com/page/../../badpage, for instance, the extension is correctly saying "This does match \bbadsite\.com\b, so it should normally be blocked, but it also matches \bbadsite\.com/page\b, so it should be let through". That's all valid parsing of the regexen. The problem lies in how URLs actually work (or, tend to work on most websites), which is beyond the purview of a simple regex-based extension. (If this were, say, a profanity-matching extension, it would be working perfectly.)
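The decision logic just described can be sketched as follows (hypothetical names and data, not the extension's actual code): a URL is blocked only if it matches some blacklist regex and no whitelist regex.

```python
import re

# Illustrative rule sets mirroring the testwiki example above.
blacklist = [re.compile(r"\bbadsite\.com\b")]
whitelist = [re.compile(r"\bbadsite\.com/page\b")]

def is_blocked(url: str) -> bool:
    """Blocked if any blacklist pattern matches and no whitelist pattern does."""
    if not any(p.search(url) for p in blacklist):
        return False
    # A whitelist hit overrides the blacklist hit; this override is
    # exactly what the ../ trick exploits.
    return not any(p.search(url) for p in whitelist)

assert is_blocked("https://badsite.com/other")
assert not is_blocked("https://badsite.com/page")
# Valid regex matching, counterintuitive outcome: the traversal slips through.
assert not is_blocked("https://badsite.com/page/../../badpage")
```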

Given how it seems everyone was implicitly assuming that the whitelist was doing full-string matching, then with a time machine the solution might have been to make the extension do that from the beginning... which I guess could be changed now but would, as you say, probably be a much greater hassle, and could lead to problems down the line with the converse assumption. (That is to say, either approach requires site admins to change how we write the regexen, but a change to the extension adds dev hours too.)

So yeah, I'll subscribe a few GIEs whose names I recognize. Feel free to unsub if y'all're busy. :)

Oh, I will just float one third option, which I think is more of a Band-Aid but is worth raising, which is some kind of "blacklist even if whitelisted" for <whitelist match>.*/../../ and <whitelist match>.*?.+= (mixed regex + pseudocode).

Oh, postscript to my original post: Is the \S in my proposed regex necessary, or does the extension implicitly stop looking for a match when it hits whitespace? If the latter, a simple . would work.

> It's very rare that a trailing slash or an anchor will lead to a new page, so the general solution here is to replace a trailing \b with /?(?:\x23\S*)?$ (\x23 is an escaped #).

This looks correct to me, but note that it's only correct if the intention was to allow just that specific URL (with optional trailing slash and fragment), and not any of its subpages.

If you wanted to allow subpages, but disallow .. segments, that would also be possible using a regex like this: \bbadsite\.com/page((?!/\.\.).)*$ (demo: https://regex101.com/r/TJ8IS1/1)
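A quick Python demonstration of that subpage-allowing pattern (the negative lookahead rejects any character position that begins a `/..` segment):

```python
import re

# Allow anything under /page, but refuse URLs containing a "/.." segment
# after the matched prefix.
subpages = re.compile(r"\bbadsite\.com/page((?!/\.\.).)*$")

assert subpages.search("https://badsite.com/page/subpage") is not None
assert subpages.search("https://badsite.com/page/../badpage") is None
```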

But, I wouldn't bother doing either of these solutions, unless we actually find that someone is abusing this bug right now – both of these regex approaches are really unreadable. Maybe we can solve this better in the code, see below.

> Oh, postscript to my original post: Is the \S in my proposed regex necessary, or does the extension implicitly stop looking for a match when it hits whitespace? If the latter, a simple . would work.

A . is enough, as URLs can't contain spaces.


I think the real problem here is that we allow .. segments in the URLs passed to SpamBlacklist (and stored in the externallinks table, shown on Special:LinkSearch, etc.). They should probably be normalized out before that point.

The URL standard (used by web browsers, but not necessarily every tool) already requires that – any such segments are removed during parsing and not representable (https://url.spec.whatwg.org/#ref-for-double-dot-path-segment②), so a browser will never send a request to https://badsite.com/page/../BadPage, instead normalizing it to https://badsite.com/BadPage client-side.

I think it would be perfectly reasonable for MediaWiki to do the same, and then the anti-spam regex authors would not have to worry about this.
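The normalization the spec requires can be sketched in Python like this. This is a simplified, illustrative version of RFC 3986 §5.2.4 dot-segment removal (the helper names are mine, not MediaWiki's):

```python
from urllib.parse import urlsplit, urlunsplit

def remove_dot_segments(path: str) -> str:
    """Simplified sketch of RFC 3986 section 5.2.4, roughly what a
    browser does to a URL path before sending the request."""
    out = []
    for seg in path.split("/"):
        if seg == "..":
            if len(out) > 1:  # ".." above the root is silently dropped
                out.pop()
        elif seg != ".":
            out.append(seg)
    # The spec leaves a trailing slash when the path ends in "." or ".."
    if path.rsplit("/", 1)[-1] in (".", ".."):
        out.append("")
    return "/".join(out)

def normalize_url(url: str) -> str:
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=remove_dot_segments(parts.path)))

assert normalize_url("https://badsite.com/page/../BadPage") == \
    "https://badsite.com/BadPage"
```

Running URLs through something like this before the anti-spam regexes (and before storage in externallinks) makes the traversal trick a no-op.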

That makes a lot of sense, @matmarex, thanks. What about the URL parameters angle, though? To give a real example that seems ripe for exploitation, on enwiki the neo-Nazi site Metapedia is blacklisted, but \ben\.metapedia\.org/wiki/Main_Page\b is whitelisted. That allows a link like https://en.metapedia.org/wiki/Main_Page?oldid=479369, which in fact would lead readers to Holocaust denial.

You could use $ at the end, instead of \b, to allow only that specific URL (no parameters, subpages or links to fragments), or (?:\x23.*)?$ to allow only links to fragments.
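To illustrate those two anchoring styles on the Metapedia example (patterns as given above, tested here with Python's `re`):

```python
import re

# Exact URL only: no parameters, subpages, or fragments.
exact = re.compile(r"\ben\.metapedia\.org/wiki/Main_Page$")
# Exact URL, but fragment links are also allowed (\x23 is '#').
with_fragment = re.compile(r"\ben\.metapedia\.org/wiki/Main_Page(?:\x23.*)?$")

assert exact.search("https://en.metapedia.org/wiki/Main_Page") is not None
assert exact.search("https://en.metapedia.org/wiki/Main_Page?oldid=479369") is None
assert with_fragment.search("https://en.metapedia.org/wiki/Main_Page#History") is not None
assert with_fragment.search("https://en.metapedia.org/wiki/Main_Page?oldid=479369") is None
```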

> I think the real problem here is that we allow .. segments in the URLs passed to SpamBlacklist (and stored in the externallinks table, shown on Special:LinkSearch, etc.). They should probably be normalized out before that point.
>
> The URL standard (used by web browsers, but not necessarily every tool) already requires that – any such segments are removed during parsing and not representable (https://url.spec.whatwg.org/#ref-for-double-dot-path-segment②), so a browser will never send a request to https://badsite.com/page/../BadPage, instead normalizing it to https://badsite.com/BadPage client-side.
>
> I think it would be perfectly reasonable for MediaWiki to do the same, and then the anti-spam regex authors would not have to worry about this.

Patch implementing this:

I'm sending it here since the task was filed as a security bug, but I think this could be submitted publicly to Gerrit.

> Patch implementing this:
>
> I'm sending it here since the task was filed as a security bug, but I think this could be submitted publicly to Gerrit.

CR+1 for the patch. But I agree this is low-risk enough to just go through gerrit as a code hardening patch and would likely benefit from getting a few more eyes on it from folks on the Content-Transform-Team, etc.

sbassett changed the task status from Open to In Progress. Dec 13 2023, 10:35 PM
sbassett triaged this task as Medium priority.

I don't mind the patching, but first, I have not seen this being done on-wiki. It would only apply to cases where we whitelist a suitable path (unless it also works on specific documents? Does this work on badsite.com/path/goodtxt2read.pdf?), which does not seem to give malicious editors much chance to apply this (you need a suitable whitelisted link).

Secondly, doing this means that the user knows that the domain they want to use is blacklisted, that other documents on the site are whitelisted, and that this trick circumvents those admin decisions. If that is the case, that would be, for me as an admin, enough evidence of bad intent to issue a direct, indef block on the account that used this circumvention.

Just out of curiosity, can we query how many links have the \.. construct, and how many of those are on blacklisted sites?

@Beetstra The ../ vector works on all websites, if I'm understanding T352827#9391030 correctly, while URL-parameter-based tricks like ?realPage=BadPage will depend on the website in question. The way it seems to me, someone whitelisting a page or path is expecting to whitelist just that page or path. But the use of \b—which makes sense on the blacklist, where we want to match the whole site—has the opposite effect on the whitelist. So most of these URLs should have been using $ to begin with.

So the ../ problem is being fixed MW-side. An on-wiki fix to the URL parameter problem would be to change terminal \bs to $s; for cases where a whole path is allowed, I think it would depend on the site and what exactly needs to be whitelisted... and there, yeah, it might make more sense to just wait and see if anyone comes up with exploits. But I see no reason not to make the \b-to-$ change globally; this will fix most of the issues.

In other words, this is a problem we can fix by throwing $$$$ at it ;)

Change 987503 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Parser: Normalize dot segments in URL paths

https://gerrit.wikimedia.org/r/987503

People should be back to work after the holidays now, so I submitted the patch for review.

> Just out of curiosity, can we query how many links have the \.. construct, and how many of those are on blacklisted sites?

You could list them with a SQL query like this:

select * from externallinks where el_to_path like '%/../%' or el_to_path like '%/..';

(this will return some false positives when the dots appear in a query parameter or hash, and will probably take a long time to run on large wikis)

The patch looks good to me, and I assume these kinds of URLs are rare enough that there are no real implications for the externallinks table. That is, "$PREFIX/foo/../bar" and "$PREFIX/bar" should ideally be the same entry in that table, but there will now be a new entry for every such link on every page reparse; since we think this usage is likely very rare, I don't think we need to worry about the potential for duplicates. I am going to +2 that patch, but just noting this here as an FYI.

Change 987503 merged by jenkins-bot:

[mediawiki/core@master] Parser: Normalize dot segments in URL paths

https://gerrit.wikimedia.org/r/987503

Just in case anyone's curious, I ran the query to find all potentially affected links across all wikis. In addition to /.., I also included /. segments, since they could also be used to fool anti-spam rules. There are 9326 results. I didn't try to match them against filters, but all of those I checked out looked harmless.

matmarex claimed this task.

(This was resolved by that patch, I just forgot to close it. @sbassett Could you please make it public?)

> (This was resolved by that patch, I just forgot to close it. @sbassett Could you please make it public?)

Done.

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)". Feb 15 2024, 8:41 PM
sbassett changed the edit policy from "Custom Policy" to "All Users".
sbassett moved this task from Watching to Our Part Is Done on the Security-Team board.