
Problem with $wgMaxArticleSize at cswiki
Closed, Declined · Public

Description

Hi, there are several articles on the Czech Wikipedia that run up against $wgMaxArticleSize. I know of one article that has already been split on our Wikipedia because of the limit: Seznam dílů seriálu Pokémon. In my opinion, such a split is very confusing and chaotic for readers, and every such split is a change for the worse for them. After consulting with @Urbanecm, I would like to ask for an optimization, or an increase of the limit, for the Czech Wikipedia.
I understand that the limit is set for performance reasons, but I think that certain justified cases should have the limit increased.
Specifically, I'm writing because of the well-visited article Seznam dílů seriálu Simpsonovi. I have been trying to reduce the number of templates it uses, but further reduction is no longer possible.
Would it be possible to offer a solution best suited to the needs of readers? Thank you in advance.

See also: T275319: Change $wgMaxArticleSize limit from byte-based to character-based, T308893: Increase $wgMaxArticleSize to 4MB for ruwikisource

Event Timeline

Krinkle closed this task as Declined. (Edited Jan 3 2023, 4:08 PM)
Krinkle subscribed.
Technical perspective

@Patriccck As a technical request, I'm declining this. Particularly for Wikipedia, where we as the community read and write the content ourselves in this form, increasing $wgMaxArticleSize would be bad for both performance and usability. I'll try to summarise the impact in case this isn't known (see also @cscott's T275319#6884320 and T275319#7947012):

  • Reading the article would become more expensive, as your connection has to download more data.
  • Reading the article would be slower, as on every page view your device has to download, process, and visually render significantly more information.
  • Reading the article would increase the chance that your browser or phone/computer crashes or freezes as it runs out of memory to hold the entire page. A proportion of people may thus no longer be able to access the article at all.
  • Editing the article would become an unpleasant experience. When editing a page, your device has to interact with significantly more detailed information and perform more computations than when reading. Thus an even larger proportion of people, including some who can still read the article, would not be able to edit it. For those who can, the editor could take several minutes to load, and the experience throughout would be unresponsive and below standard.
  • Saving edits would take significantly longer. Our objective is to ensure edits can be published such that people don't have to wait for more than 1.0 second. This isn't feasible when articles keep growing ever larger. Using the example of cs:Seznam dílů seriálu Simpsonovi, I see that null-edits to this article already take 3-5 seconds today. These are outliers we can tolerate only as long as they are rare and we have incentives in place to discourage more of them (including a nearby $wgMaxArticleSize limit; see the sketches after this list).
  • When the server handles an edit, we impose a hard limit of no more than 60 seconds to respond to a single web request. Besides being important for a good user experience that the site responds quickly, this also matters for security and scalability. If the server permitted individual requests to occupy it for longer, it would be prone to attacks that drain resources. Even without attacks, the relationship is mostly linear: if an article reserves a server for 0.1 seconds instead of 10 seconds, we can serve 100 times more people during peak or popular events (e.g. natural disasters, pandemics, world news, important cultural events) before the site becomes unavailable.
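
For reference, here is a minimal sketch (in Python, against the public MediaWiki API) of how anyone can check how close an article's wikitext is to the limit. It assumes the default value of $wgMaxArticleSize, 2048 KB, which individual wikis may override:

```
import requests

API = "https://cs.wikipedia.org/w/api.php"
# $wgMaxArticleSize is configured in kilobytes; 2048 KB is the default.
MAX_ARTICLE_SIZE = 2048 * 1024

def wikitext_size(title: str) -> int:
    """Return the byte size of the latest revision of a page."""
    resp = requests.get(API, params={
        "action": "query",
        "titles": title,
        "prop": "revisions",
        "rvprop": "size",
        "format": "json",
        "formatversion": "2",
    })
    resp.raise_for_status()
    page = resp.json()["query"]["pages"][0]
    return page["revisions"][0]["size"]

size = wikitext_size("Seznam dílů seriálu Simpsonovi")
print(f"{size} bytes ({100 * size / MAX_ARTICLE_SIZE:.1f}% of the limit)")
```

Note that the limit applies to the raw wikitext, not to the (much larger) rendered HTML.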
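
And a rough way to get a feel for the 3-5 second figure from the outside is to time a full parse request. This is a sketch only: the response may come from the parser cache and includes network latency, so it is not a precise measure of the server-side save path:

```
import time
import requests

API = "https://cs.wikipedia.org/w/api.php"

# Time an action=parse request for the article's current wikitext.
start = time.perf_counter()
resp = requests.get(API, params={
    "action": "parse",
    "page": "Seznam dílů seriálu Simpsonovi",
    "prop": "text",
    "format": "json",
})
resp.raise_for_status()
print(f"parse request took {time.perf_counter() - start:.2f}s")
```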
Editorial perspective

Besides technical reasons, let's also look at what would happen if we could somehow make the Internet infinitely fast and cheap.

I believe it becomes increasingly difficult to find information when it is buried in ever-larger articles. Consider that we have a rich ecosystem of cross-linked subjects, search engines, and organic links to and from pages on other websites. Imagine if Wikipedia were a single article for the entire website. You would probably not navigate or find as much, because these pointers help you arrive closer to a particular area of interest or answer. Historically, links to particular sentences or paragraphs have not been stable over time as content evolves. And features like "Related articles", "See also", and "What links here" become less useful the broader a subject page becomes.

Then there is the objective that information not only be stored and discovered, but also understood. It becomes increasingly difficult to orient yourself and remember where you are in the subject's context when everything is on a single page. It is ultimately the responsibility of writers to work together as "curators" and decide what is relevant and important for the summary, and what to reserve for a separate page. Humans are not machines. We tend to learn best in "layers", rather than recursively with infinite detail at every step. Summarising a subject so that you can finish the page, and then choose where to dig in further, is important.

I can't help you with the subject matter in Czech, but I hope there are discussion pages or a village pump available where these kinds of choices can be talked about and collaborated on. It can sometimes help to look at how other language communities have organised a similar topic, or to ask for help from subject experts or more experienced writers in your community or on sister projects (e.g. the Village Pump on English Wikipedia, or Meta-Wiki).

In the specific case you mention about Pokémon (cs:Pokémon episodes), I agree that a stub page with only links to seasons 1-14 and 14-current is not ideal. It gives readers very little information about where they need or want to go unless they happen to know the season number, and even then it would likely be nicer to go there directly. I don't know whether Czech Wikipedia conventions require this kind of split, but please know that there are other ways of splitting this kind of article. The other example you mention (cs:Simpsons episodes) has not been split yet. If you look at the English Wikipedia's list of Simpsons episodes, you'll see a different approach. They kept the season overview table on the original article page, so the page is not simply cut in two halves. They then updated the overview table to link directly to the individual season pages, or in some cases to a partial overview covering several seasons. Either way, the split is mostly invisible to the reader.

Organic subjects are sometimes difficult to split (e.g. where does "Italy § History" stop and "History of Italy" start?). However, they do naturally allow for curating and summarising. When dealing with complete lists, especially episode lists that can effectively grow forever, there has to be a page limit somewhere. Even if we doubled the size limit, it wouldn't take long before we discover a daily TV show with more than double the episodes. There has to be a strategy for splitting lists, and there are many ways to do it; I've mentioned one above. I encourage discussion within Wikipedia and on Meta-Wiki if you're interested in learning about or improving ways to organise and search for information. If these need technical solutions in the software, we can also explore them on Phabricator. However, it will be through something other than larger individual articles.

There are two related tasks (T275319 and T308893) about Wikisource. The interesting case there is that those pages generally reproduce republished content (not original content), which makes splitting more challenging. It looks like the proposed method of splitting doesn't actually work, so the conversation will continue there; but again, for the reasons stated above, most likely not by increasing the article size limit.
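
As a side note on T275319 (byte-based versus character-based limits): because the current limit counts UTF-8 bytes, languages with many non-ASCII characters reach it sooner than plain-ASCII English does. A minimal illustration in Python:

```
# Czech diacritics (í, ů, á, ...) encode as two bytes each in UTF-8,
# so a byte-based limit is effectively stricter for Czech text.
text = "Seznam dílů seriálu Simpsonovi"
print(len(text))                   # 30 characters
print(len(text.encode("utf-8")))   # 33 bytes
```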

I am sorry for the late reply. Thank you so much for your extensive message. I hope it will be possible in the future (as technology gets faster and faster these days). :)