
HTML is a mess
Closed, Invalid (Public)

Description

Author: brian

Description:
I don't think this requires any explanation. There should be two ways of viewing HTML:

  • Normal: remove all unnecessary characters to minimize download time
  • Special: format the HTML neatly (e.g. "</head>" is on its own line and directly below "<head>") for those who want to examine it

Alternatively we could just use the first option and those who want to examine the
HTML can use another program to clean it up, or do so manually.
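
As a rough illustration of the "Normal" mode, a naive minifier could collapse the whitespace between adjacent tags. The following Perl sketch is purely hypothetical (it is not MediaWiki code, and blind stripping like this would corrupt whitespace-sensitive elements such as <pre> and <textarea>):

  # Hypothetical sketch of the "Normal" (minified) mode; not MediaWiki code.
  use strict;
  use warnings;

  sub minify_html {
      my ($html) = @_;
      $html =~ s/>\s+</></g;    # collapse whitespace runs between adjacent tags
      $html =~ s/^\s+|\s+$//g;  # trim leading/trailing whitespace
      return $html;             # caution: breaks <pre> and <textarea> content
  }

  print minify_html("<head>\n  <title>Example</title>\n</head>\n"), "\n";
  # prints: <head><title>Example</title></head>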


Version: unspecified
Severity: minor

Details

Reference
bz4211

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 8:58 PM)
bzimport added a project: MediaWiki-Parser.
bzimport set Reference to bz4211.
bzimport added a subscriber: Unknown Object (MLST).

Try to have bug reports relate to MediaWiki in some way, rather than general comments
about a standard markup language.

brian wrote:

My comments apply generally, but here I am thinking about the specific HTML output
by MediaWiki, and this is the only place to fix it. Therefore, my report does
relate to MediaWiki.

Well, if you want to pretty up the HTML, you can copy and paste it into any text editor
with XML/HTML prettification functions. So... seems done?

brian wrote:

I did acknowledge this in my original comment. You did not address the other
statement I made: we should compact the HTML to make it smaller and load faster.

robchur wrote:

The XHTML standard doesn't require that markup be formatted to be human-readable.

avarab wrote:

$ perl -MLWP::Simple -le '$c = get "http://localhost/mw/HEAD/wiki/Albert_Einstein"; (@c) = $c =~ /\n/g; print for length $c, scalar @c, ((scalar @c)/(length $c))*100 . "%"'
124961
1717
1.37402869695345%

(The three output lines are: bytes of XHTML, number of newlines, and newlines as a percentage of the total.)

Stripping newlines would result in approximately a 1.5% bandwidth saving in the
XHTML for each page, according to my tests.

Did you test with or without gzip encoding?

avarab wrote:

(In reply to comment #7)
> Did you test with or without gzip encoding?

Assuming that it's ungzipped...

brian wrote:

What's gzip encoding got to do with this? What do we think about the 1.5% figure?

robchur wrote:

(In reply to comment #9)
> What's gzip encoding got to do with this? What do we think about the 1.5% figure?

It has a lot to do with determining whether or not the performance gain is worth
it, given our various caching and compression systems. Obviously, we'd like the
figure if the saving turns out to be worth it...

avarab wrote:

Test script (attached)

Seems to be around 0.5% space saving if gzip is accounted for

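The attached script itself was not preserved in this import. A minimal sketch of an equivalent measurement, assuming the core IO::Compress::Gzip module and the same local URL as in the earlier one-liner (both just placeholders here), might look like:

  # Hypothetical reconstruction of the test: compare gzipped sizes with
  # and without newlines; not the actual attached script.
  use strict;
  use warnings;
  use LWP::Simple qw(get);
  use IO::Compress::Gzip qw(gzip $GzipError);

  my $html = get('http://localhost/mw/HEAD/wiki/Albert_Einstein')
      or die 'fetch failed';
  (my $stripped = $html) =~ s/\n//g;   # same page with newlines removed

  for ([original => $html], [stripped => $stripped]) {
      my ($label, $body) = @$_;
      gzip \$body => \my $gz or die "gzip failed: $GzipError";
      printf "%-9s raw=%d gzipped=%d\n", $label, length $body, length $gz;
  }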

brian wrote:

As I understand it, gzip is not used over the Internet in this case but other
compression may be used. Is this the case? Do all the Wikimedia systems use gzip
only?
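
For what it's worth, whether gzip is actually applied to a given response can be checked by sending an Accept-Encoding header and inspecting the Content-Encoding header that comes back. A minimal sketch (the URL is only illustrative):

  # Hypothetical check of a response's Content-Encoding header.
  use strict;
  use warnings;
  use LWP::UserAgent;

  my $ua   = LWP::UserAgent->new;
  my $resp = $ua->get(
      'http://en.wikipedia.org/wiki/Albert_Einstein',
      'Accept-Encoding' => 'gzip',
  );
  print 'Content-Encoding: ',
        ($resp->header('Content-Encoding') || 'none (identity)'), "\n";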