Page MenuHomePhabricator

Strip query parameters from w.wiki domain
Closed, ResolvedPublic

Description

@Krinkle pointed out that https://w.wiki/?search=X actually loads the search results from Meta-Wiki because the search query parameter overrides MediaWiki's normal request handling, and the UrlShortener redirect. I don't think anything bad can happen from loading MediaWiki on a non-MediaWiki domain, but I'd rather just not.

@BBlack, can we just strip all the query parameters in varnish? Basically just limit valid requests to / and /shortcode, and ignore everything else?

Event Timeline

Legoktm created this task.Jul 23 2016, 5:30 AM
Restricted Application added a project: Operations. · View Herald TranscriptJul 23 2016, 5:31 AM
Restricted Application added a subscriber: Aklapper. · View Herald Transcript
BBlack added a comment.EditedAug 1 2016, 9:24 PM

Yeah. As a first pass, we could simply strip URLs with s/\?.*$//. If shortcode is a pretty restricted subset, it might make more sense just go ahead and fully regex-validate that part too.

The shortcode set is 23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz$_. It's basically static and won't be changing once we enable the shortener for everyone.

BBlack added a comment.Aug 1 2016, 9:41 PM

Looks like base36 in the specs, up to 9 total characters (3x wiki id, 6x article id). We could restrict legal urls to ^/[0-9A-Z]{0,9}$. Is it always uppercase? Does the extension accept/normalize lowercase as well?

BBlack added a comment.Aug 1 2016, 9:42 PM

Simul-edit! So it's not base36?

It is [0-9A-Za-z$\-] with 0, O, I, l, and 1 removed because they're visually confusing when written down. And yes, it is case-sensitive.

Oh, and there is no max length, we'll keep generating longer ids as people keep shortening new URLs...

BBlack added a comment.Aug 1 2016, 9:45 PM

Ok I basically get it. But in the two different explanations, one has - and one has _ ...?

Oops sorry, I meant _. My bad.

BBlack added a comment.Aug 1 2016, 9:48 PM

Ok. Either way, when I count up both representations, I get 59 total characters, does that sound right?

>>> len('23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz$_')
59

Yep!

Change 302354 had a related patch set uploaded (by BBlack):
text VCL: validate w.wiki short URLs

https://gerrit.wikimedia.org/r/302354

Change 302354 merged by BBlack:
text VCL: validate w.wiki short URLs

https://gerrit.wikimedia.org/r/302354

BBlack closed this task as Resolved.Aug 1 2016, 10:55 PM
BBlack claimed this task.
Legoktm reopened this task as Open.Aug 3 2016, 6:34 AM

@BBlack this doesn't seem to be working yet? https://w.wiki/?search=X still shows search results...and https://w.wiki/IiO0o which has the non-whitelisted characters is showing the MW-generated error,

BBlack added a comment.EditedAug 3 2016, 12:45 PM

Yeah I checked as well, and it doesn't work. Most likely the simple patch is subtly broken...

Change 302752 had a related patch set uploaded (by BBlack):
Text VCL: validate w.wiki URLs (for real this time)

https://gerrit.wikimedia.org/r/302752

Change 302752 merged by BBlack:
Text VCL: validate w.wiki URLs (for real this time)

https://gerrit.wikimedia.org/r/302752

Legoktm closed this task as Resolved.Aug 3 2016, 7:12 PM

Thanks! It works now :)