
Start a project to reduce page weight
Closed, Declined. Public.


Start a project -- a real project with measurable goals and a schedule -- to reduce page weight.

From by @Guy_Macon and others.

That big section in that big page has several good ideas and arguments that need a better structure and context for work. Let's capture what matters in this description and let's find or create subtasks, blockers, related tasks.

Event Timeline

Qgil raised the priority of this task from to Needs Triage.
Qgil updated the task description.
Qgil added subscribers: Qgil, Guy_Macon, Doc_James and 3 others.

In the HTML that Wikipedia sends to the browser, every line has a DOS-style carriage return and line feed (0D 0A) as a line ending.

If we used a Unix-style line feed (0A) that would save *one byte on every single line of HTML on Wikipedia.*

Actually HTML works just fine with both the carriage return *and* the line feed removed, but let's just discuss (0D 0A) vs (0A) for now.

We could do this by making the line ending configurable in the preferences, with (0D 0A) as the default and (0A) as an option, then, after we are sure there are no bad effects, change the default.

Why am I unable to get any developer at the WMF to discuss the merits of doing this? --Guy Macon

@Guy_Macon, all our HTML is served with gzip compression, which makes sure that newlines are encoded very compactly.
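The gzip point can be checked locally with a rough sketch like the one below. It is only an illustration: the sample markup is synthetic, the function name `line_ending_sizes` is made up for this example, and compression level 6 is an assumption, not a known Wikipedia setting. It compares the raw and gzip-compressed sizes of the same text with LF and with CRLF line endings.

```python
import gzip


def line_ending_sizes(html_lf: str) -> dict:
    """Return raw and gzip-compressed byte sizes for LF and CRLF variants."""
    lf = html_lf.encode("utf-8")
    crlf = html_lf.replace("\n", "\r\n").encode("utf-8")
    return {
        "raw_lf": len(lf),
        "raw_crlf": len(crlf),
        # compresslevel=6 is an assumed setting, not Wikipedia's actual one
        "gzip_lf": len(gzip.compress(lf, compresslevel=6)),
        "gzip_crlf": len(gzip.compress(crlf, compresslevel=6)),
    }


# Synthetic stand-in for a page: 2000 short lines of list markup.
sample = "".join(
    f'<li><a href="/wiki/Item_{i}">Item {i}</a></li>\n' for i in range(2000)
)
sizes = line_ending_sizes(sample)
print(sizes)
```

On this input the raw sizes differ by exactly one byte per line, while the compressed sizes should differ by far less. A synthetic page is not the real Wikipedia environment, so this only shows the shape of the measurement, not its result for production pages.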

I am working on setting up a procedure (using Slackware or Windows 10) that will allow me to efficiently measure the result after gzip compression. See [ ]. More later when I have some actual numbers. --Guy Macon

Sorry for not responding. So far I have not been able to get anyone at the WMF who has the ability to make this change to discuss the merits of doing this. A developer working on page weight could do a quick test in less than five minutes that will answer the gzip question using the actual Wikipedia environment. Instead, I am being asked by people who have zero ability to actually make the change to (imperfectly) duplicate Wikipedia and do my own tests, and of course if I do that I will then be told that I have not duplicated the Wikipedia environment correctly (and they would be correct). Plus, due to the overhead and latency of compression and decompression, websites typically only gzip files above a certain size threshold, so I would also have to figure out the minimum size at which Wikipedia stops compressing, estimate how many pages are below that (redirects are tiny), and in the end I still will have utterly failed to open up a discussion with an actual developer who has at least the potential of running a real-world test of my proposal.

I give up. I see a lot of work with zero chance of any change being made. Remember, this started as a small, noncontroversial test request because I was unable to get anyone to actually evaluate my original suggestion (start a project -- a real project with measurable goals and a schedule -- to reduce page weight). I have my answer. I was also unable to get anyone to actually evaluate my (0D 0A) vs (0A) suggestion, run a couple of tests, and give me a yes or no answer.

For the technically inclined: if my understanding is correct, making a page smaller (through 0D 0A vs 0A or gzip) is pointless if the page can already fit inside a single TCP packet, so I would also have to figure out what the Maximum Transmission Unit (MTU) is for Wikipedia and estimate how many pages are smaller than that. This also makes reducing the size of a page that is slightly bigger than a packet, so that it now fits, a big win -- 50% reduction in page weight -- but only for those specific pages. This really does need a developer to do some research using the actual Wikipedia environment.
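The packet-boundary argument above can be sketched as simple arithmetic. The figures are assumptions for illustration only: `ASSUMED_MSS` of 1460 bytes corresponds to a common 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers, and the actual value on the path to Wikipedia is not known here. The sketch also ignores HTTP response headers, TLS framing, and TCP options.

```python
import math

# Assumption: 1500-byte Ethernet MTU minus 40 bytes of IPv4 + TCP headers.
ASSUMED_MSS = 1460


def tcp_segments(payload_bytes: int, mss: int = ASSUMED_MSS) -> int:
    """Minimum number of TCP segments needed to carry payload_bytes."""
    if payload_bytes <= 0:
        return 0
    return math.ceil(payload_bytes / mss)


# A few hypothetical response sizes, in bytes.
for size in (900, 1400, 1500, 2900, 20_000, 22_000):
    print(size, tcp_segments(size))
```

Under these assumptions, shrinking a 1500-byte response to 1400 bytes drops it from two segments to one, while shrinking a 900-byte response changes nothing at the segment level.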

@GWicke, when you wrote "all our HTML is served with gzip compression", did you mean that everything uses gzip compression, or like Guy writes above, is it only pages above a certain size?

Re: "when you wrote "all our HTML is served with gzip compression", did you mean that everything uses gzip compression, or like Guy writes above, is it only pages above a certain size?":

According to [ ], if Wikipedia is gzipping everything, then we are expending server resources to make very small pages larger.

Related: At [ ] you will find an experiment where (on the one page tested) reducing the size of uncompressed HTML in ways that do not change what the user experiences reduced the uncompressed file by 17%. After compression, the space saving was 10% (20KB vs. 22KB). This disproves the theory that reducing the size of uncompressed HTML makes no difference in the size of the compressed file.

Again, a developer should test (0D 0A) vs (0A) on several files of different sizes (including a short redirect like [ Whacamole ], a long page like [ List of named minor planets (numerical) ], the main page, and a high-traffic page picked from [ User:West.andrew.g/Popular pages ]). Then we would have some real data to make an informed decision with. I am not holding my breath...

Krinkle added a subscriber: Krinkle.

As Gabriel mentioned, the encoding used for new lines is unimportant as Gzip will compress these into one lookup regardless of which style is used.

For all our source code we use the 1-byte LF-style line breaks, but HTTP-based web services often standardise regardless on the larger legacy-style CRLF. This is outside our control and either way, of no notable influence on the output size.

Regarding compression on small articles, in theory a very small file would be larger, but remember that even small articles still have the Skin and User toolbox template around them, which makes the response virtually always larger than any such threshold.
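The "very small file would be larger" effect comes from gzip's fixed header and trailer bytes. A minimal sketch, using made-up sample payloads and a hypothetical helper named `gzip_overhead`, shows the sign of the difference for a tiny payload versus a larger, repetitive one:

```python
import gzip


def gzip_overhead(payload: bytes) -> int:
    """Compressed size minus raw size; positive means gzip made it larger."""
    return len(gzip.compress(payload)) - len(payload)


# Hypothetical payloads, not real Wikipedia responses.
tiny = b"<html></html>"
large = b'<div class="toolbox"><a href="/wiki/Example">Example</a></div>\n' * 500

print("tiny overhead:", gzip_overhead(tiny))
print("large overhead:", gzip_overhead(large))
```

The tiny payload grows and the large one shrinks. Where the crossover sits for real responses depends on the content and the server configuration, which this sketch does not model.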

We enable GZIP for our HTML, CSS and JS responses at the web server layer. That is obviously better than turning it off. Arguably, even if such tiny responses were possible to generate in MediaWiki, they would be so small that the milliseconds spent compressing and decompressing them would be negligible, and not worth the added complexity of trying to dynamically enable or disable compression, which would ironically slow down everyone else.

If this were possible, it would likely need to be done at the web server layer itself (Apache, Nginx, Varnish) - which are upstream software packages we use.

I'll decline this task for now because it has no resources and is on the "Radar" of Perf-Team but not on any other teams' workboard, which means it cannot be worked on.

You may want to pursue this upstream with the aforementioned web server vendors – at which point it could benefit many other websites, including us.