
"/dmirror/http/en.wikipedia.org/w/" added to article text
Closed, Resolved, Public

Description

The article text

"| location"

has been changed to

"| location = /dmirror/http/en.wikipedia.org/w/"

in several anonymous edits.
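One rough way to recognise this particular corruption is to check whether an edit *adds* the injected path, rather than whether the page merely contains it. The sketch below is hypothetical: the regex, the function name and the sample wikitext are illustrative and not part of MediaWiki. It assumes the server has both the previous revision text and the submitted text.

```python
import re

# Hypothetical pattern: the path fragment the proxy injected in the reported diffs.
INJECTED = re.compile(r"/dmirror/http/[^\s|}]*")

def introduces_injection(old_text: str, new_text: str) -> bool:
    """Return True if the edit adds occurrences of the injected proxy path.

    Comparing counts (instead of only searching new_text) avoids flagging
    edits to pages that already mention the string for some legitimate reason.
    """
    return len(INJECTED.findall(new_text)) > len(INJECTED.findall(old_text))

old = "{{Infobox\n| location\n}}"
new = "{{Infobox\n| location = /dmirror/http/en.wikipedia.org/w/\n}}"
print(introduces_injection(old, new))  # True
print(introduces_injection(new, new))  # False
```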

Here are example diffs:

http://en.wikipedia.org/w/index.php?title=Laver_%28seaweed%29&diff=prev&oldid=168033887
http://en.wikipedia.org/w/index.php?title=Food_safety&diff=prev&oldid=175864761
http://en.wikipedia.org/w/index.php?title=Insect_repellent&diff=176082104&oldid=173443750 (multiple edits)


Version: unspecified
Severity: enhancement

Details

Reference
bz12217

Event Timeline

bzimport raised the priority of this task from to Medium (Nov 21 2014, 9:55 PM).
bzimport set Reference to bz12217.
bzimport added a subscriber: Unknown Object (MLST).

This appears to be a poorly coded web proxy, not a problem with MediaWiki. Please report it at [[Wikipedia:WikiProject_on_open_proxies]].

It _is_ a MediaWiki issue insofar as we _could_ be blocking such edits, but aren't. We used to have a problem with "backslashing proxies" that incorrectly passed any submitted data through the PHP addslashes() function, leading to "Joe's" first becoming "Joe\'s", then, if edited again, "Joe\\\'s", etc. However, I fixed that one with a simple hack: the edit token, which the browser must return correctly for the edit to be accepted, now contains a backslash.
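The backslash trick can be illustrated with a small simulation. This is a sketch, not MediaWiki's code: `addslashes` here is a Python approximation of the PHP function, and `make_token`, `token_ok` and the one-character `TOKEN_SUFFIX` are hypothetical stand-ins for the real token format.

```python
import hmac
import secrets

# Hypothetical token suffix: a single backslash, which a "backslashing"
# proxy will double on the way back to the server.
TOKEN_SUFFIX = "\\"

def addslashes(s: str) -> str:
    """Approximate PHP addslashes(): backslash-escape ', ", \\ and NUL."""
    out = []
    for ch in s:
        if ch in ("'", '"', "\\"):
            out.append("\\" + ch)
        elif ch == "\0":
            out.append("\\0")
        else:
            out.append(ch)
    return "".join(out)

def make_token() -> str:
    # Random part plus the canary suffix (hypothetical format).
    return secrets.token_hex(16) + TOKEN_SUFFIX

def token_ok(issued: str, returned: str) -> bool:
    # Constant-time comparison; both strings are ASCII, so str input is allowed.
    return hmac.compare_digest(issued, returned)

print(addslashes("Joe's"))              # Joe\'s
print(addslashes(addslashes("Joe's")))  # Joe\\\'s

token = make_token()
print(token_ok(token, token))              # True
print(token_ok(token, addslashes(token)))  # False: the backslash was doubled
```

Because the proxy applies the same escaping to every field, a mangled token is rejected before the mangled article text can be saved.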

In principle, we could eliminate (almost) all such errors by including the article text twice in the edit form, once normally and once in a hidden field, and checking that the version in the hidden field gets returned intact. Of course, this would double the amount of data that would need to be transmitted in each direction, which could be an issue for large pages. (It might also break some existing bots if the new hidden field was mandatory, but we could avoid that by leaving it optional.) Also, one case it wouldn't fix is when the mangled text is entirely new, such as when starting a new page or section (with "section=new"). I can think of ways to work around that latter issue using JavaScript (URL-encode or base64-encode the text and submit it for comparison), but that's yet another layer of complexity...
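The proposed server-side checks could look roughly like this. Both functions are hypothetical; `served_text` is assumed to be the text the server put into the form, and the base64 variant assumes some client-side script (not shown) encodes the textbox contents before submission. Missing fields are accepted so that bots that never send them keep working, as suggested above.

```python
import base64
from typing import Optional

def hidden_copy_intact(served_text: str, returned_hidden: Optional[str]) -> bool:
    """Check that the hidden copy of the page text came back unchanged.

    served_text is the text the server put into the form; returned_hidden is
    what the client sent back, or None if the optional field was omitted.
    """
    if returned_hidden is None:
        return True  # e.g. an older bot: no check possible, accept as before
    return returned_hidden == served_text

def encoded_copy_matches(textbox: str, b64_copy: Optional[str]) -> bool:
    """For new pages/sections: compare the textbox with a base64 copy
    produced client-side (assumed; the script itself is not shown)."""
    if b64_copy is None:
        return True
    try:
        decoded = base64.b64decode(b64_copy, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    return decoded == textbox

served = "{{Infobox\n| location\n}}"
mangled = served.replace("| location", "| location = /dmirror/http/en.wikipedia.org/w/")
print(hidden_copy_intact(served, served))   # True
print(hidden_copy_intact(served, mangled))  # False

typed = "== New section ==\n| location"
b64 = base64.b64encode(typed.encode("utf-8")).decode("ascii")
print(encoded_copy_matches(typed, b64))                                         # True
print(encoded_copy_matches(typed.replace("| location", "| location = x"), b64))  # False
```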

One benefit from such a general fix would be that we'd also catch broken user agents that mangle Unicode characters. Come to think of it, I might consider adding one or two Unicode characters to the edit token just for this. However, adding "location=test" to the edit token feels somehow excessive...
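The same canary idea extends to character encodings. In this hypothetical sketch the token ends in a non-ASCII character, and `mangle_as_latin1` simulates a user agent that re-reads UTF-8 bytes as Latin-1; the canary character and helper names are assumptions, not MediaWiki's actual token format.

```python
import hmac
import secrets

# Hypothetical canary: 'é' (U+00E9), which is two bytes in UTF-8.
UNICODE_CANARY = "\u00e9"

def make_token() -> str:
    return secrets.token_hex(16) + UNICODE_CANARY

def mangle_as_latin1(s: str) -> str:
    """Simulate a broken user agent that decodes UTF-8 bytes as Latin-1."""
    return s.encode("utf-8").decode("latin-1")

def token_ok(issued: str, returned: str) -> bool:
    # hmac.compare_digest rejects non-ASCII str, so compare UTF-8 bytes instead.
    return hmac.compare_digest(issued.encode("utf-8"), returned.encode("utf-8"))

token = make_token()
print(mangle_as_latin1(UNICODE_CANARY))          # Ã©
print(token_ok(token, token))                    # True
print(token_ok(token, mangle_as_latin1(token)))  # False
```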

There should be other methods to catch these by now.