
squid cache of [[foo_(bar%29]] not purged
Closed, Resolved, Public

Description

Author: dan-wikimedia

Description:
The page on Wikipedia at http://en.wikipedia.org/wiki/Donald_Brown_(anthropologist) is broken. The main content is being displayed without the head or sidebar. It's like this in all browsers. I've attached a screenshot.


Version: unspecified
Severity: normal
URL: http://en.wikipedia.org/wiki/Donald_Brown_(anthropologist)

Details

Reference
bz27935

Event Timeline

bzimport raised the priority of this task to Normal. Nov 21 2014, 11:25 PM
bzimport set Reference to bz27935.
bzimport created this task. Mar 8 2011, 3:08 PM

dan-wikimedia wrote:

Screenshot of Donald_Brown_(anthropologist) wikipedia page


demon added a comment. Mar 8 2011, 3:13 PM

Can you try clearing your browser cache and see if this is still an issue?

This WORKSFORME and one other person on IRC.

dan-wikimedia wrote:

Looks like it's fixed. In Firefox and Safari on Mac OS X just clearing the cache allowed it to display the page correctly. Oddly, just clearing the cache in Firefox on Linux didn't clear it up. I had to clear out "all private data" and it finally started displaying the page correctly.

Thanks
--Dan

It sounds like it was some kind of intermittent failure with CSS loading that got cached by your browser(s). Of course, if these intermittent failures keep happening more often, it would be nice to know what's causing them. Still, I think this bug can probably be closed as WORKSFORME, subject to reopening if it happens again.

dan-wikimedia wrote:

It happened again on a different Wikipedia page, and now I know how to recreate it at will in all browsers.

I clicked on a link on one Wikipedia page that took me to this URL: http://en.wikipedia.org/wiki/Steve_Fuller_(social_epistemologist)

Notice the one factor it has in common with the URL I submitted this bug with: the parentheses.

So check this out. This happens in Firefox, Safari, and IE:

The page displays CORRECTLY if you enter that URL with the parentheses URL-ENCODED, as in "http://en.wikipedia.org/wiki/Steve_Fuller_%28social_epistemologist%29".

BUT, put it in with the parentheses as literal characters, as in "http://en.wikipedia.org/wiki/Steve_Fuller_(social_epistemologist)", and the page does NOT display correctly.
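As a quick sanity check, both spellings name the same page once the path is percent-decoded; only the raw strings differ. A minimal Python sketch using the standard library's `urllib.parse`, assuming the simple `/wiki/<title>` URL layout used in the report (the helper name `title_from_url` is hypothetical):

```python
from urllib.parse import unquote, urlsplit

# The two URLs from the report: literal parentheses vs. percent-encoded.
literal = "http://en.wikipedia.org/wiki/Steve_Fuller_(social_epistemologist)"
encoded = "http://en.wikipedia.org/wiki/Steve_Fuller_%28social_epistemologist%29"


def title_from_url(url: str) -> str:
    """Hypothetical helper: return the decoded title from a /wiki/<title> URL."""
    path = urlsplit(url).path
    # Strip the "/wiki/" prefix, then percent-decode what is left.
    return unquote(path[len("/wiki/"):])


print(title_from_url(literal))  # Steve_Fuller_(social_epistemologist)
print(title_from_url(encoded))  # Steve_Fuller_(social_epistemologist)
```

Anything that compares the raw URL strings instead (for example a cache keyed on the request URL) will treat them as two different resources.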

What happens is a slew of JavaScript errors.

Here are the errors displayed by IE7:

Runtime error: Line 11: 'mw' is undefined
Runtime error: Line 10: 'jQuery' is undefined
Runtime error: Line 2: Object expected
Runtime error: Line 28: 'jQuery' is undefined
Runtime error: Line 159: 'mediaWiki' is undefined
Runtime error: Line 184: Object doesn't support this property or method
Runtime error: Line 1047: Object expected
Runtime error: Line 572: Object doesn't support this property or method

And here are the errors displayed in Firefox error console:

Error: mw is not defined, Line: 11
Source File: http://bits.wikimedia.org/skins-1.5/common/mwsuggest.js?283-19
Error: jQuery is not defined, Line: 22
Source File: http://bits.wikimedia.org/w/extensions/UsabilityInitiative/js/plugins.combined.min.js?283-19
Error: $j is not defined, Line: 2
Source File: http://bits.wikimedia.org/w/extensions/UsabilityInitiative/Vector/Vector.combined.min.js?283-19
Error: jQuery is not defined, Line: 112
Source File: http://en.wikipedia.org/w/index.php?title=Special:BannerController&cache=/cn.js&283-19
Error: mediaWiki is not defined, Line: 159
Source File: http://bits.wikimedia.org/skins-1.5/common/wikibits.js?283-19
Error: jQuery is not defined, Line: 1047
Source File: http://bits.wikimedia.org/skins-1.5/common/wikibits.js?283-19

And here's what we get in the Safari error console:

Failed to load resource: the server responded with a status of 404 (Not Found), bits.wikimedia.org/skins-1.5/vector/main-ltr.css?283-19
Failed to load resource: the server responded with a status of 404 (Not Found), bits.wikimedia.org/skins-1.5/common/jquery.min.js?283-19
ReferenceError: Can't find variable: mw, bits.wikimedia.org/skins-1.5/common/mwsuggest.js?283-19:11
ReferenceError: Can't find variable: jQuery, bits.wikimedia.org/w/extensions/UsabilityInitiative/js/plugins.combined.min.js?283-19:22
ReferenceError: Can't find variable: $j, bits.wikimedia.org/w/extensions/UsabilityInitiative/Vector/Vector.combined.min.js?283-19:2
ReferenceError: Can't find variable: jQuery, /w/index.php?title=Special:BannerController&cache=/cn.js&283-19:112
ReferenceError: Can't find variable: mediaWiki, bits.wikimedia.org/skins-1.5/common/wikibits.js?283-19:159
ReferenceError: Can't find variable: jQuery, bits.wikimedia.org/skins-1.5/common/wikibits.js?283-19:1047

(Thank you Firefox and Safari developers for letting us know what FILE the line errors are occurring in! Would that the IE devs had been so considerate.)

--Dan

This is weird... I still can't replicate the error using your instructions on either Firefox 3.6.15, Konqueror 4.4.5 or Opera 10.10.

That said, the 404 errors reported by Safari could well cause these symptoms -- the only question is what's causing the errors.

(BTW, could you try and see if you can reproduce the errors with "?debug=true" appended to the URLs? I don't know if it'll help in finding the cause, but at least it should be an extra data point -- and give potentially more helpful error messages, since it turns off the JS minifier.)

I can confirm that copy/pasting *http://en.wikipedia.org/wiki/Steve_Fuller_(social_epistemologist)* into IE8 causes this problem.

MaxSem added a comment. Mar 8 2011, 7:05 PM

Works for me on IE8. Do you, by coincidence, edit through a proxy?

I don't. I can confirm that it fails on IE8 / Windows 7 Ultimate, although it works fine in Firefox. My suspicion is a URL encoding issue: http://en.wikipedia.org/wiki/Steve_Fuller_%28social_epistemologist%29 works as expected, but when I don't URL-encode it, IE8 gives me messed-up page content.
<content without URL encoding>

'mw' is undefined  mwsuggest.js?283-19, line 11 character 1
'jQuery' is undefined  plugins.combined.min.js?283-19, line 10 character 183
'jQuery' is undefined  plugins.combined.min.js?283-19, line 10 character 183
Object expected  Vector.combined.min.js?283-19, line 2 character 1
'jQuery' is undefined  cn.js&283-19, line 28 character 1
'mediaWiki' is undefined  wikibits.js?283-19, line 159 character 3
Object doesn't support this property or method  Steve_Fuller_(social_epistemologist), line 185 character 81
Object expected  wikibits.js?283-19, line 1047 character 2
Object doesn't support this property or method  Steve_Fuller_(social_epistemologist), line 573 character 58

dan-wikimedia wrote:

No, I'm not going through a proxy.

Interesting:

THIS WORKS: http://en.wikipedia.org/wiki/Steve_Fuller_%28social_epistemologist%29

This DOES NOT work: http://en.wikipedia.org/wiki/Steve_Fuller_(social_epistemologist)

But THIS WORKS: http://en.wikipedia.org/wiki/Steve_Fuller_(social_epistemologist)?debug=true

I've tested all of those in Firefox, Safari, IE7 and IE8 and got the same results.

dan-wikimedia wrote:

I just noticed that if I use a URL with literal parentheses that doesn't display correctly, such as "http://en.wikipedia.org/wiki/Donald_Brown_(anthropologist)", and I keep reloading the page, every so often it will display correctly! Most of the reloads don't, but sometimes they do.

I tried Firefox 3.6, Firefox 4 beta and Safari 5 on Mac, and articles with parentheses in their titles work fine for me. I tried a dozen different ways of visiting the page, with different URLs both escaped and unescaped, and got no errors.

I mentioned this bug on #wikimedia-tech, and got the response that "load.php is occasionally failing to load site CSS. I've hit that a few times while browsing. I'm not sure if parentheses are related, though." I figured I'd record it here.

FWIW, that matches my own experience too: there are, or at least used to be, occasional glitches with JS/CSS loading since the 1.17 deployment. I haven't seen any very recently myself, though, and I never noticed any connection with parentheses in the URL (which of course doesn't mean there isn't one).

BTW, the fact that you (Dan) see the bug sometimes coming and going when the page is reloaded suggests that it might only happen on some of Wikimedia's proxies or webservers. This might also explain a geographical dependency, since requests from different parts of the world get routed to different server pools.

Bryan.TongMinh wrote:

If you look at the provided URLs, JS is loaded from 1.16 URLs.
I encountered this a few days ago as well; hashar poked around and got it working again. I have cc'ed him.

So here's what's going on:

Pages with parentheses and certain other special characters in their titles have more than one correct URL: one with literal parentheses, one with the parentheses URL-encoded, and I'm sure mixes of those two are accepted as well. However, when a page changes, Squid only purges the URL-encoded form and not the others. This means that anonymous users will see outdated versions of these pages when they visit them through one of the other, unpurged URLs. Besides possibly outdated content, these stale versions also contain pre-ResourceLoader direct CSS links to 1.16 URLs, which of course return 404. For logged-in users, Squid caching is bypassed, so they won't observe this behavior.
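The failure mode can be modelled with a toy cache keyed on the exact request path. This is a hypothetical Python sketch, not Squid or MediaWiki code: `ToyCache` and `encoded_path` are illustrative names, and the assumption that the purged form is the one with `%28`/`%29` follows the description above.

```python
from typing import Callable


class ToyCache:
    """Hypothetical front-end cache keyed on the exact request path."""

    def __init__(self) -> None:
        self.store: dict[str, str] = {}

    def get(self, path: str, render: Callable[[], str]) -> str:
        # On a miss, render the page and store it under the path as requested.
        if path not in self.store:
            self.store[path] = render()
        return self.store[path]

    def purge(self, path: str) -> None:
        # Drops only the exact key given; other spellings stay cached.
        self.store.pop(path, None)


def encoded_path(title: str) -> str:
    # Assumption: the purge targets the spelling with %28/%29 for parentheses.
    return "/wiki/" + title.replace("(", "%28").replace(")", "%29")


page = {"text": "old revision"}


def render() -> str:
    return page["text"]


title = "Donald_Brown_(anthropologist)"
encoded = encoded_path(title)  # /wiki/Donald_Brown_%28anthropologist%29
literal = "/wiki/" + title     # /wiki/Donald_Brown_(anthropologist)

cache = ToyCache()
# Anonymous readers reach the page through both spellings, filling two entries.
cache.get(encoded, render)
cache.get(literal, render)

# The page is edited, and only the encoded URL is purged.
page["text"] = "new revision"
cache.purge(encoded)

print(cache.get(encoded, render))  # new revision
print(cache.get(literal, render))  # old revision (stale)
```

The stale entry survives until it expires, which is why the old 1.16-era HTML (and its CSS links) kept being served for the literal-parenthesis URL.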

I'll talk to Mark about this issue.

We should redirect, like we do for initial lower-case letters.

dan-wikimedia wrote:

So is that something you should talk with the Squid developers about? See if they can fix it so that the URL is purged even if its special characters haven't been url-encoded?

(In reply to comment #18)

So is that something you should talk with the Squid developers about? See if
they can fix it so that the URL is purged even if its special characters
haven't been url-encoded?

No, MediaWiki should be fixed to redirect to one canonical URL, per comment 17. But even such a fix would take a while to take effect properly, because you have to either actively purge the bad cache entries or wait for them to expire (I believe the expiry is 30 days).

For now I've put the skins/vector/main-ltr.css and main-rtl.css files back on the server, so at least these pages won't be served without CSS. JS will still be broken, but at least the page will be usable.

Roan, is this still an issue? It's WFM in all examples given.

(In reply to comment #21)

Roan, is this still an issue? It's WFM in all examples given.

I'm fairly sure it's still an issue, yes. If it is, the steps below should reproduce it:

  • Log out and remove all of your wikipedia.org cookies
  • Edit a page with parentheses in the title
  • After saving your edit, visit the same page at a different URL (e.g. encode '(' as '%28', or decode it if it was encoded)
  • You should see an older version of the page that does not contain your edit

If you tried this while logged in, you would bypass the Squid cache and you wouldn't see the problem.
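For the third step, the alternate spelling can be produced mechanically. A small hypothetical Python helper that flips the encoding of the parentheses in a URL; fetching both variants while logged out and comparing the results is left to whatever HTTP client is at hand.

```python
def toggle_paren_encoding(url: str) -> str:
    """Hypothetical helper: encode literal parentheses, or decode encoded ones."""
    if "(" in url or ")" in url:
        # Literal (or mixed) form: encode every parenthesis.
        return url.replace("(", "%28").replace(")", "%29")
    # Fully encoded form: decode back to literal parentheses.
    return url.replace("%28", "(").replace("%29", ")")


print(toggle_paren_encoding("http://en.wikipedia.org/wiki/Donald_Brown_(anthropologist)"))
# http://en.wikipedia.org/wiki/Donald_Brown_%28anthropologist%29
```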

*** Bug 32150 has been marked as a duplicate of this bug. ***

EN.WP.ST47 wrote:

*** Bug 35293 has been marked as a duplicate of this bug. ***

Anomie added a comment. Nov 6 2013, 4:57 PM

(In reply to comment #17)

We should redirect, like we do for initial lower-case letters.

Can we safely do that, though? A path /wiki/foo_(bar) is not equivalent to /wiki/Foo_(bar), but /wiki/Foo_(bar) and /wiki/Foo_%28bar%29 are supposed to be considered equivalent according to RFC 2616 section 3.2.3. Some client or proxy somewhere might decide to always transform the former into the latter, and then MediaWiki serving a redirect back to the former would result in a loop. Or the client might decide that the redirect to an equivalent title is a loop and show an error to the user.[1]

[1]: e.g. http://webmasters.stackexchange.com/questions/2770
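The loop Anomie describes can be illustrated with a toy model: a hypothetical intermediary that always rewrites `%28`/`%29` to literal parentheses, sitting in front of a hypothetical server that redirects every non-canonical spelling to the encoded form. Neither function reflects real MediaWiki or proxy code; they only show why the two rules never converge.

```python
def proxy_normalise(path: str) -> str:
    # Hypothetical client/proxy: treats %28/%29 as equivalent to "(" / ")"
    # (as the RFC 2616 URI comparison rules allow) and rewrites them.
    return path.replace("%28", "(").replace("%29", ")")


def server_response(path: str) -> tuple[int, str]:
    # Hypothetical server: redirects to the encoded spelling unless the
    # request already uses it, in which case it serves the page.
    canonical = path.replace("(", "%28").replace(")", "%29")
    if path != canonical:
        return 301, canonical
    return 200, path


def follow(path: str, normalising_proxy: bool, max_hops: int = 10) -> int:
    """Return how many redirects were followed before a 200; raise on a loop."""
    for hops in range(max_hops):
        requested = proxy_normalise(path) if normalising_proxy else path
        status, location = server_response(requested)
        if status == 200:
            return hops
        path = location
    raise RuntimeError(f"redirect loop after {max_hops} hops")


print(follow("/wiki/Foo_(bar)", normalising_proxy=False))  # 1
try:
    follow("/wiki/Foo_(bar)", normalising_proxy=True)
except RuntimeError as err:
    print(err)  # redirect loop after 10 hops
```

Without the rewriting intermediary, one redirect suffices; with it, every redirect is undone before the next request reaches the server.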

Anomie added a comment. Nov 6 2013, 9:52 PM

I suppose another question to ask is whether Varnish is any better at this than Squid. If Varnish can easily handle knowing that /wiki/Foo_(bar) and /wiki/Foo_%28bar%29 are equivalent, might this soon be irrelevant?

Unless we still want to support third parties using Squid instead of Varnish, I suppose.

(In reply to comment #25)

(In reply to comment #17)

We should redirect, like we do for initial lower-case letters.

Can we safely do that, though? A path /wiki/foo_(bar) is not equivalent to
/wiki/Foo_(bar), but /wiki/Foo_(bar) and /wiki/Foo_%28bar%29 are supposed to
be considered equivalent according to RFC 2616 section 3.2.3. Some client
or proxy somewhere might decide to always transform the former into the
latter, and then MediaWiki serving a redirect back to the former would result
in a loop. Or the client might decide that the redirect to an equivalent
title is a loop and show an error to the user.[1]

Yes, you're probably right. There is a risk of redirect loops.

(In reply to comment #26)

I suppose another question to ask is whether Varnish is any better at this
than Squid. If Varnish can easily handle knowing that /wiki/Foo_(bar) and
/wiki/Foo_%28bar%29 are equivalent, might this soon be irrelevant?

Unless we still want to support third parties using Squid instead of
Varnish, I suppose.

Well, it would be nice to fix this bug in MediaWiki so that any frontend would be supported, but doing it in Varnish would be the next best solution if it is not possible to avoid redirect loops in MediaWiki.

Tim, is this something you can handle while you're looking at bug 31369?

Change 96941 had a related patch set uploaded by Tim Starling:
Normalise the path part of URLs in the text frontend

https://gerrit.wikimedia.org/r/96941

Change 96941 merged by Tim Starling:
Normalise the path part of URLs in the text frontend

https://gerrit.wikimedia.org/r/96941

Actually, this was probably fixed in May 2012 by change Ie38ae198b. Testing I did during the deployment of the above change supports this. But the current patch was worth doing anyway: after Ie38ae198b, all requests for non-canonical encodings were cache misses, because MW served them with Cache-Control: private (this was confirmed in my testing today). The current patch makes them cache hits instead.
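Why normalising the path turns repeated variants into hits can be sketched by keying a toy cache on a normalised path instead of the raw one. This is a hypothetical Python illustration, not the VCL from change 96941; treating the literal-parenthesis spelling as the canonical key is an assumption made only for this example.

```python
from typing import Callable


def normalise_path(path: str) -> str:
    # Hypothetical rule: decode percent-encoded parentheses so that every
    # spelling of the same title maps to a single cache key.
    return path.replace("%28", "(").replace("%29", ")")


class CountingCache:
    """Toy cache that counts hits and misses under a given key function."""

    def __init__(self, key_fn: Callable[[str], str]) -> None:
        self.key_fn = key_fn
        self.store: dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, path: str) -> str:
        key = self.key_fn(path)
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = f"rendered {key}"
        return self.store[key]


# Three spellings of one title, including a mixed one.
requests = ["/wiki/Foo_(bar)", "/wiki/Foo_%28bar%29", "/wiki/Foo_(bar%29"]

raw = CountingCache(lambda p: p)            # keyed on the raw path
normalised = CountingCache(normalise_path)  # keyed on the normalised path
for p in requests:
    raw.get(p)
    normalised.get(p)

print(raw.hits, raw.misses)                # 0 3
print(normalised.hits, normalised.misses)  # 2 1
```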

GWicke added a subscriber: GWicke. Edited Oct 21 2015, 8:40 PM

I wonder if we should look into making percent encoding normalization more conservative. IIRC we ran into parenthesis normalization with RESTBase, and ended up disabling it for RESTBase requests. The same issue seems to have bitten MW core again recently, which sounds like it contributed to T104755.

Edit: I found the patch for this, and the RESTBase issue was actually about slashes being decoded.
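One way to make percent-encoding normalisation more conservative is to decode only an explicit allowlist of characters and leave every other escape alone. A minimal Python sketch under that assumption; the allowlist contents and the `/page/` path are illustrative, not taken from the Varnish or RESTBase configuration. `%2F` is deliberately never decoded, since turning it into `/` changes how a path splits into segments.

```python
import re

# Illustrative allowlist of characters this hypothetical normaliser decodes.
# "/" is deliberately absent: decoding %2F would change the path structure.
DECODABLE = set("()!*,")


def conservative_normalise(path: str) -> str:
    """Decode allowlisted percent-escapes; upper-case the hex of all others."""

    def decode(match: re.Match) -> str:
        char = chr(int(match.group(1), 16))
        if char in DECODABLE:
            return char
        # Keep the escape, in the uppercase form RFC 3986 recommends.
        return "%" + match.group(1).upper()

    return re.sub(r"%([0-9A-Fa-f]{2})", decode, path)


print(conservative_normalise("/wiki/Foo_%28bar%29"))  # /wiki/Foo_(bar)
print(conservative_normalise("/page/AC%2fDC"))        # /page/AC%2FDC
```

Escapes outside the allowlist, including multi-byte UTF-8 sequences, are passed through unchanged apart from hex case, so the normaliser cannot merge two paths that a backend would treat differently.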