Page MenuHomePhabricator

Reject MediaWiki requests which use an URL with an invalid title but use some alternative mechanism to specify a valid page
Open, Needs TriagePublic

Description

An example (from T144100) is https://meta.wikimedia.org/wiki/<script>?curid=79155; oldid and diff could probably be used to the same effect. Rejecting such requests with a HTTP 400 might be marginally safer as URLs end up in various logs and reports and some of those might incorrectly assume that the path part of an URL served with HTTO 200 is safe. Similarly, some user scripts might incorrectly assume that the URL of a page which has content is safe (although they really should get the title via mw.config). It would also improve the precision of WMF pageview stats which use the usually-but-not-always correct assumption that the name of a page served with HTTP 200 is the same as the name in the URL.

Event Timeline

Other cases to consider:

  • The empty string is an invalid title, but is sometimes used for shorter URLs when diff parameters are used. This should continue to be allowed.
  • What about something like https://meta.wikimedia.org/wiki/Valid_but_incorrect_title?curid=79155? That's also going to break things that expect the title in the URL matches the page displayed, but rejecting those would break permanent links (taken from the "permalink" toolbar item) whenever a page is moved.
  • The empty string is an invalid title, but is sometimes used for shorter URLs when diff parameters are used. This should continue to be allowed.

Indeed; core explicitly supports that pattern now via LocalFile::getDescriptionShortUrl().

Is there a use case for that? It seems pointless to use both title and curid.

Is there a use case for that? It seems pointless to use both title and curid.

Go click the link I pointed out, that's what you get right now. Even if you change it, how many such links already exist?

Or click any diff link in the interface, you'll almost always get something like https://meta.wikimedia.org/w/index.php?title=Main_Page&diff=15514880&oldid=13832341 that has an ignored title parameter.

Those are links with oldid, not curid, though. With curid, I don't see why anyone would use a title (they both specify the same thing). With oldid, we certainly should be much more cautious and only reject clearly invalid titles. (At which point I guess there no reason not to do the same thing for curid and keep the code simpler. So yeah, fair point.)

With invalid title

When the title parameter is present but invalid or doesn't match the revision IDs, how about we serve a redirect to the same URL with the title corrected? (also suggested by @Anomie and @Tgr in T144100)
For example,

https://meta.wikimedia.org/wiki/Valid_but_incorrect_title?curid=79155
->
https://meta.wikimedia.org/wiki/Main_Page?curid=79155

and

https://meta.wikimedia.org/w/index.php?title=Foo&diff=15514880&oldid=13832341&diffmode=source
->
https://meta.wikimedia.org/w/index.php?title=Main_Page&diff=15514880&oldid=13832341&diffmode=source

?

That avoids breaking stale URLs (including page moves!), and should make it easy to collect good pageview metrics. Counting the redirects would be its own report, of course.

No title

For the missing title parameter case, I don't think it's as clear. I feel uncomfortable redirecting because the user might e.g. be loading a short URL to test it before pasting elsewhere, and I wouldn't want to break their expectations for the location bar in that use case. Is that already handled somehow? If not, should we consider adding internal metrics which increment the page by title once we look that up?

Change 498702 had a related patch set uploaded (by Awight; owner: Adam Wight):
[analytics/refinery/source@master] Reject invalid Page titles

https://gerrit.wikimedia.org/r/498702

Change 498702 merged by jenkins-bot:
[analytics/refinery/source@master] Reject invalid Page titles

https://gerrit.wikimedia.org/r/498702

Change 498702 merged by Fdans:
[analytics/refinery/source@master] Reject invalid Page titles

https://gerrit.wikimedia.org/r/498702