
Oversized HTML documents, and server crash
Closed, Declined · Public

Description

Trigger: https://de.wikipedia.org/wiki/Special:PermaLink/143840984

Server response:

503 Service Temporarily Unavailable

No cache server ID, no system version, no details.

Observed since 2016-05-29.

Statistics

  • Source text size: 1,175,164 bytes
  • Number of internal links: 45,445
  • Length of all link titles: 993,384 bytes
  • Estimated size of generated HTML document: 4.5 MB
    • An internal link [[2Long]] is rendered as
      • <a href="/wiki/2Long" title="2Long">2Long</a>
    • The 4 bytes for brackets expand to 30 bytes of markup, i.e. 26 extra bytes per link, adding 1,181,570 bytes of HTML markup in total.
    • Each link title occurs three times in the HTML (href, title attribute, link text) instead of once in the wikitext, adding a further 1,986,768 bytes (see the arithmetic sketch after this list).
  • There is a limit for template expansion, but not for plain wikitext expansion.
    • E.g. tables and images result in larger HTML code.
  • The diff page still works (7.8 MB).
  • &action=raw is still available.
  • Apparently, parsing exceeded some server limit without being caught.
  • This user page is entirely useless; the user is no longer active.
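
The estimate can be verified with a little arithmetic; a minimal sketch in Python, using only the figures from the list above:

  # Back-of-the-envelope check of the HTML size estimate above.
  SOURCE_BYTES = 1_175_164   # wikitext size
  LINK_COUNT = 45_445        # internal links
  TITLE_BYTES = 993_384      # combined length of all link titles

  # [[2Long]] (4 bracket bytes) becomes
  # <a href="/wiki/2Long" title="2Long">2Long</a> (30 markup bytes),
  # i.e. 26 extra bytes of markup per link.
  markup_overhead = LINK_COUNT * (30 - 4)   # 1,181,570 bytes

  # Each title appears three times in the HTML (href, title attribute,
  # link text) but only once in the wikitext: two extra copies.
  title_overhead = 2 * TITLE_BYTES          # 1,986,768 bytes

  estimated_html = SOURCE_BYTES + markup_overhead + title_overhead
  print(f"{estimated_html:,} bytes")        # 4,343,502 bytes, about 4.1 MB

The remaining distance to the ~4.5 MB estimate is presumably other markup (tables, images, headings) that likewise expands beyond its wikitext size.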

Avoid delivery of oversized HTML responses

Users might be on a slow or expensive network; their data volume remaining for the last week of the month may be less than 100 MB.

Users should not be surprised by a very large page. Wikitext after template expansion is limited to some 2 MB, and the generated HTML of the content area should be limited to a reasonable size derived from that wikitext limit, e.g. 3 MB. Some 200 kB of framework will be added anyway, for portal navigation and resource loading; thumbnail images are transferred on top of that.
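
Spelled out as a budget (the limits are the figures proposed here, not existing configuration):

  WIKITEXT_LIMIT = 2 * 1024 * 1024      # existing post-expansion wikitext limit
  CONTENT_HTML_LIMIT = 3 * 1024 * 1024  # proposed limit for the content area
  FRAMEWORK_OVERHEAD = 200 * 1024       # portal navigation, resource loading

  # Worst-case HTML transfer for a regular page view, before thumbnails:
  max_transfer = CONTENT_HTML_LIMIT + FRAMEWORK_OVERHEAD
  print(f"{max_transfer / 1024 ** 2:.2f} MB")   # 3.20 MB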

When the rendered content limit has been exceeded,

  • the entire content should be discarded.
  • A brief red system message should be displayed instead.
    • Keep It Shorter, Stupid.
  • The page should be put into a maintenance category, as is already done when the template expansion size limit is hit.

This applies to parsed content only. Other pages, such as diff pages, &action=raw, or pages with very many external links, multimedia, etc., are not subject to a parsing limitation.
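
A minimal sketch of the proposed guard in Python; the function, the flag, and the limit are illustrative assumptions, not existing MediaWiki code:

  HTML_LIMIT = 3 * 1024 * 1024   # proposed limit for rendered content

  ERROR_HTML = ('<div class="error">This page exceeds the maximum size '
                'of rendered content and cannot be displayed.</div>')

  def deliver_content(html: str, is_parsed_view: bool) -> str:
      """Apply the proposed limit to parsed page views only.

      Diff pages, action=raw and similar non-parsed responses
      stay exempt, as described above.
      """
      if not is_parsed_view:
          return html
      if len(html.encode('utf-8')) > HTML_LIMIT:
          # Discard the entire content, show a brief message instead,
          # and (in a real implementation) put the page into a
          # maintenance category, as the template expansion limit does.
          return ERROR_HTML
      return html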

A request for a regular Wikipedia article must not be answered with an oversized HTML document.

It might be possible to show 40,000 images on a page without exceeding the 2 MB wikitext limit. There should be a limit of some 1,000 distinct embedded <img> elements requested for download.
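
Counting distinct image sources rather than image uses keeps repeated icons cheap. A minimal sketch; the limit, the function, and the HTML scan are assumptions (a real implementation would consult the parser's image table instead):

  import re

  IMG_LIMIT = 1000   # proposed cap on distinct embedded images

  def exceeds_image_limit(html: str) -> bool:
      # Collect the distinct src attributes of all <img> tags.
      sources = set(re.findall(r'<img\b[^>]*\bsrc="([^"]+)"', html))
      return len(sources) > IMG_LIMIT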

I recollect that similar issues were discussed some five or ten years ago, and that a $wg configuration variable was either introduced or at least seriously proposed, but I cannot find the old threads.

Event Timeline

Cannot reproduce the 503 error anymore; the link nowadays shows:

The revision #143840984 of the page named "Wikipedia:Hauptseite" does not exist.
This is usually caused by following an outdated history link to a page that has been deleted. Details can be found in the deletion log.

I don't think the benefit of generally restricting page size to 3 MB for everybody outweighs the annoyance for other users, since it ignores connection bandwidth, mobile vs. non-mobile, and many other factors.