
some pages are delivering raw GZIP encoding
Closed, DuplicatePublic

Assigned To
None
Authored By
bzimport
Aug 13 2008, 8:38 AM
Referenced Files
F5127: wp-msie-documenttype_dump_jha2.zip
Nov 21 2014, 10:19 PM
F5126: x.gz
Nov 21 2014, 10:19 PM
F5125: Jena_Six
Nov 21 2014, 10:19 PM
F5124: wp.gzip1.gif
Nov 21 2014, 10:19 PM
F5122: The_Inheritors_(William_Golding)
Nov 21 2014, 10:19 PM
F5123: The_Inheritors_(William_Golding)-decoded.htm
Nov 21 2014, 10:19 PM

Description

Author: rividh

Description:
http://en.wikipedia.org/wiki/The_Inheritors_(William_Golding) is one of the affected pages, but I've seen 3 or 4 others today that do the same thing: in an older browser, which normally renders Wikipedia as very legible plain text, I am getting naked GZIP (compressed binary). The affected pages work fine in a newer browser. It is reproducible for the affected pages, though these may be random pages (tomorrow it may affect some other pages!)

When I saved the file (exactly as sent to me by the server) locally and looked at it with a hex viewer, I confirmed that it is a compressed binary, not text. QuickViewPlus identifies the file as UNIX GZip. However, QVP decompresses not to text (HTML) but rather to this interesting goulash (sample from the first line):

/*::lh1 clasl1-first="Mjnt="pagn lhrefead> faviaml:: WerSub">Frrplal:RecconChontes i_ lfeed=ds"mft/fh3yla hxml: clasl1yla hxml: jump&am-

Might this indicate that the GZip is corrupt, thus not being decoded by the unforgiving older browser? (If so, might this indicate a corrupt cache file or a hard disk going bad?)
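(For anyone trying to reproduce this: a quick, hypothetical Python sketch -- not part of the original report -- of how to tell the three cases apart when you save a response body to disk. The gzip magic bytes 1f 8b plus a trial decompression distinguish plain HTML, a healthy gzip stream, and a corrupt one:)

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def classify_body(data: bytes) -> str:
    """Classify a saved HTTP response body: plain text, valid gzip,
    or gzip that fails to decompress (i.e. corrupt)."""
    if not data.startswith(GZIP_MAGIC):
        return "not gzip"
    try:
        gzip.decompress(data)
        return "valid gzip"
    except (OSError, EOFError, zlib.error):
        # BadGzipFile/CRC errors are OSError subclasses; truncated
        # streams raise EOFError; bad deflate data raises zlib.error.
        return "corrupt gzip"
```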

I saved and looked at some working pages and confirmed that NORMALLY, my old browser sees perfectly normal HTML.

I've encountered this "what do you mean, some pages arrive in GZip?" issue on another site where I was able to research the problem, and that proved due to a server bug, though I don't recall the details as it was some years ago.

Affected browser: Netscape 3 (still wonderful for READING TEXT!)
Not affected (same system): Seamonkey 1.1.9

This is not a matter of NS3 not knowing what to do with GZip; most servers now use compression, and finding myself with naked GZip is VERY rare (this is maybe the 3rd or 4th time I've seen it in 12 years online).


Version: unspecified
Severity: major
URL: http://en.wikipedia.org/wiki/The_Inheritors_(William_Golding)

Details

Reference
bz15149

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:19 PM
bzimport set Reference to bz15149.
bzimport added a subscriber: Unknown Object (MLST).

Can you please attach the file you saved? (Click "add an attachment")

rividh wrote:

As sent to Netscape 3

Attached:

rividh wrote:

How it decodes (done by WinRAR)

This is the same file (the one NS3 spit up as raw GZip) as decoded by WinRAR (which also whines that "the file is corrupt"). You can see that the decompressed HTML is a poor match for the page's actual content! Only the first disk-sector worth or so is not mangled.

Attached:

tk1b wrote:

probably the same bug:
server returning "Content-encoding: gzip" and gzipped content on some pages with some clients

clients

  • IE 6.0
  • Oberon V4-2.3 Web 1.0 (Andreas Krumenacker) 1997
  • Sam Spade (Beta) 1.14 (Steve Atkins) 1997-1999

no issue with Firefox 3.0
no proxy - ISP t-dialin.net (german telekom)
happens also when logged in (checked only with Oberon)
happens every time loading the page
(answering https://bugzilla.wikimedia.org/show_bug.cgi?id=7098#c13 )

examples:
http://de.wikipedia.org/wiki/Bhopal
http://en.wikipedia.org/wiki/Bhopal_(disambiguation)
http://en.wikipedia.org/wiki/Jena_Six
not with:
http://en.wikipedia.org/wiki/Jena_six

tk1b wrote:

headers in sam spade

Attached:

wp.gzip1.gif (846×1 px, 60 KB)

This problem is still there, from OTRS: [[Yiff]], IE6 on XP.

rividh wrote:

All the above links work okay today, but yesterday I ran into another page that repeatedly came across as GZip for NS3. (Forgot to record which page, but I think it's a random server error, hence the individual pages affected vary from day to day)

Is this relevant? http://www.linuxplanet.com/linuxplanet/tutorials/5461/1/
http://www.schroepl.net/projekte/mod_gzip/browser.htm.htm

From the latter page:

Netscape Navigator 3

This browser uses HTTP/1.0. It doesn't send an Accept-Encoding header, thus doesn't request compressed content
from a server.

The browser does not yet support processing compressed page content. If it receives gzip-compressed content, it recognizes that the encoding gzip is unknown to it (and displays a corresponding message to the user), but then displays the compressed page content in the browser window anyway. This browser is not suited for content served compressed unconditionally (as with statically precompressed documents).

A web server correctly evaluating the Accept-Encoding header is able to serve usable, uncompressed data to the browser.

So it sounds like sometimes the server is not heeding the Accept-Encoding header from the browser (or its absence), and is defaulting to GZip.
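(In other words, the decision has to key off the request header. A minimal Python sketch of the negotiation a server or cache is supposed to perform -- a hypothetical helper, not MediaWiki or Squid code:)

```python
def choose_encoding(request_headers: dict) -> str:
    """Return 'gzip' only if the client explicitly advertised support
    for it in Accept-Encoding. Netscape 3 sends no Accept-Encoding
    header at all, so it must get the uncompressed ('identity') body."""
    accept = request_headers.get("Accept-Encoding", "")
    # Strip optional quality parameters like "gzip;q=0.8".
    offered = [token.split(";")[0].strip() for token in accept.split(",")]
    return "gzip" if "gzip" in offered else "identity"
```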

(Hey guys, thanks for taking this seriously, and helping keep Wikipedia accessible to everyone everywhere)

The MW core code looks fine, perhaps it is a header handling issue with the squids?

Most likely a Vary-related Squid bug.
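(The failure mode would be the cache serving a stored gzip variant to a client whose request didn't match it. A correct cache keys each stored variant on the request headers named in the response's Vary header -- a hypothetical sketch, not Squid's actual logic:)

```python
def cache_key(url: str, request_headers: dict, vary: str) -> str:
    """Build a cache key from the URL plus each request header listed
    in the response's Vary header, so the gzip and identity variants
    of a page are stored and served separately."""
    parts = [url]
    for name in vary.split(","):
        name = name.strip()
        parts.append(f"{name}={request_headers.get(name, '')}")
    return "|".join(parts)
```

If a cache ignores Vary: Accept-Encoding, both kinds of client hash to the same entry, and whichever variant was fetched first gets replayed to everyone -- including browsers like NS3 that never asked for gzip.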

I just got a wikipedia page with Firefox 3 claiming that it used a compression it couldn't understand. Just reloading went fine. Could be a network glitch or a symptom of something more serious.

More complaints via OTRS, now on IE5/Mac (but works on Netscape 7).

*** Bug 15830 has been marked as a duplicate of this bug. ***

Created attachment 5386
https://bugzilla.wikimedia.org/attachment.cgi?id=5180 with \r\n line endings converted to \n

It seems that the corruption in attachment #5180 is due to something trying to convert unix-style "\n" line endings to windows-style "\r\n" line endings. If this is undone, it decompresses without errors.

I can't say whether this was done by MediaWiki, Squid, a proxy ("transparent" or otherwise) on Rez's end of the connection, or just mis-saved out of NS3.
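(The undo step is mechanical. A small Python sketch of the repair described above, assuming the mangling naively replaced every \n byte with \r\n -- a tool that skipped existing CRLF pairs would need different handling:)

```python
import gzip


def repair_crlf_mangled_gzip(data: bytes) -> bytes:
    """Reverse a Unix->Windows line-ending conversion that was wrongly
    applied to binary gzip data, then decompress the repaired stream.
    Under the naive 0x0A -> 0x0D 0x0A model, this replace is an exact
    inverse: every 0x0A in the mangled data has an inserted 0x0D
    immediately before it, and left-to-right replacement removes
    exactly that byte."""
    return gzip.decompress(data.replace(b"\r\n", b"\n"))
```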

Attached:

rividh wrote:

In my experience (across thousands of compressed files including GZips) Netscape does NOT corrupt saved files; it just saves whatever the server sends it.

QuickViewPlus' decompressor might have mangled it, though I've never seen that happen before -- but remember, WinRAR thought the original file was corrupt, and it's usually right.

No proxies here that I'm aware of. I'm on a fixed-wireless connection via a Motorola radio-modem (it does have a built-in router), and my local provider goes direct to AT&T's backbone.

There could be a difference between Netscape downloading a compressed file and Netscape saving compressed file data it is displaying as if it were text.

rividh wrote:

Possible; I don't recall which way I saved the sample. The browser view has a line length limit for what it will display; don't know if that affects saving it. However, I do know NS does NOT corrupt stuff if you do "Save Target As..." without displaying it first (I use it to save misc. binaries all the time).

Next time I see one of these pages I'll try it both ways and find out :)

BTW I got another page in naked GZip a couple days ago, something entirely random (was in a hurry and forgot to note the URL :(

*** Bug 15993 has been marked as a duplicate of this bug. ***

addicks wrote:

TCPDUMP of IE7 behind Squid, anonymous user, browser cache cleared beforehand

The downloaded "files" are included in the .pcap (Wireshark dump file).
It happens very randomly; today mostly on project pages (the main article namespace is affected as well, but not in this dump).

Attached:

This time the Wikipedia:F and WPVS appear there. Use filter (ip.addr eq 10.254.130.191 and ip.addr eq 10.254.130.190) and (tcp.port eq 2796 and tcp.port eq 126) and scroll to the bottom.
Still, I don't see anything wrong in the communication. The browser states Accept-Encoding: gzip, and the response is gzipped and contains Content-Encoding: gzip with a text/html Content-Type.

Is that really IE7, or is it Firefox with an IE7 User-Agent?
I have never seen an 'X-moz: prefetch' header from IE. Some people report that Google Web Accelerator also sends it. Are you running it?
Maybe the Google Toolbar is adding 'accelerating' features?

addicks wrote:

It is really IE7.
On the machine, Google Toolbar (for IE) and Google Desktop are installed.

How do I find out if "Google Web Accelerator" is installed? How do I turn it off?

And even if so: since MediaWiki sites (Wikipedia, OpenStreetMap and the Ubuntu wiki) are the only websites where I encounter the problem, it may be a failure of Microsoft and/or Google, but MediaWiki (with or without aggressively tuned source squids) seems to trigger this effect.

While I'd like to blame Microsoft or Google, there's something wrong on the Wikimedia side, as there have been reports from several browsers and different variants (there may be several bugs that look the same).
However, maybe they make it more likely.

If you don't know about Google Web Accelerator, you probably don't have it. Google Web Accelerator works like a proxy, so it would have shown up in the captures.
You can try disabling (or uninstalling) Google Toolbar and checking whether the problem goes away, and also try a direct connection instead of going through the squid. It's happening to you at a high rate, which is good, because it allows the developers to obtain more data and check whether a proposed solution works...

rividh wrote:

I have absolutely NO toolbars or add-ons installed in my old Netscape, where the problem was first observed. Just buck-naked Netscape. No popup blockers or similar utils installed on the system, either.

I'm wondering if it's a variant of the "Document contains no data" bug, which I had cause to research a few years back. I learned that it's actually a server bug triggered by a deficiency in the browser -- essentially, the server's failure to notice that the browser can't accept compression. Sound familiar? :) (Novell issued a patch to address this problem in one of the early internet-enabled versions of NetWare.)

  • This bug has been marked as a duplicate of bug 7098 ***