Page MenuHomePhabricator

Strip query parameters from w.wiki domain
Closed, ResolvedPublic

Description

@Krinkle pointed out that https://w.wiki/?search=X actually loads the search results from Meta-Wiki because the search query parameter overrides MediaWiki's normal request handling, and the UrlShortener redirect. I don't think anything bad can happen from loading MediaWiki on a non-MediaWiki domain, but I'd rather just not.

@BBlack, can we just strip all the query parameters in varnish? Basically just limit valid requests to / and /shortcode, and ignore everything else?

Event Timeline

Restricted Application added a subscriber: Aklapper. · View Herald Transcript

Yeah. As a first pass, we could simply strip URLs with s/\?.*$//. If shortcode is a pretty restricted subset, it might make more sense just go ahead and fully regex-validate that part too.

The shortcode set is 23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz$_. It's basically static and won't be changing once we enable the shortener for everyone.

Looks like base36 in the specs, up to 9 total characters (3x wiki id, 6x article id). We could restrict legal urls to ^/[0-9A-Z]{0,9}$. Is it always uppercase? Does the extension accept/normalize lowercase as well?

Simul-edit! So it's not base36?

It is [0-9A-Za-z$\-] with 0, O, I, l, and 1 removed because they're visually confusing when written down. And yes, it is case-sensitive.

Oh, and there is no max length, we'll keep generating longer ids as people keep shortening new URLs...

Ok I basically get it. But in the two different explanations, one has - and one has _ ...?

Oops sorry, I meant _. My bad.

Ok. Either way, when I count up both representations, I get 59 total characters, does that sound right?

>>> len('23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz$_')
59

Yep!

Change 302354 had a related patch set uploaded (by BBlack):
text VCL: validate w.wiki short URLs

https://gerrit.wikimedia.org/r/302354

Change 302354 merged by BBlack:
text VCL: validate w.wiki short URLs

https://gerrit.wikimedia.org/r/302354

BBlack claimed this task.

@BBlack this doesn't seem to be working yet? https://w.wiki/?search=X still shows search results...and https://w.wiki/IiO0o which has the non-whitelisted characters is showing the MW-generated error,

Yeah I checked as well, and it doesn't work. Most likely the simple patch is subtly broken...

Change 302752 had a related patch set uploaded (by BBlack):
Text VCL: validate w.wiki URLs (for real this time)

https://gerrit.wikimedia.org/r/302752

Change 302752 merged by BBlack:
Text VCL: validate w.wiki URLs (for real this time)

https://gerrit.wikimedia.org/r/302752

Thanks! It works now :)