
Increase $wgMaxArticleSize to 4MB for ruwikisource
Open, Medium, Public

Description

Please increase $wgMaxArticleSize to 4MB for ruwikisource, because books are often larger than the 2MB limit. (Keep in mind that Cyrillic letters take two bytes each in UTF-8, so a byte-based limit holds roughly half as much Cyrillic text as Latin text; the pages also use various included templates.)
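The byte-versus-character point can be checked directly. The sketch below is illustrative only: it assumes the 2MB limit means 2 * 1024 * 1024 bytes, and the helper name `utf8_size` is made up for this example.

```python
# Illustrative only: compare character count with UTF-8 byte count.
# Assumption: the 2MB limit is 2 * 1024 * 1024 bytes.
MAX_ARTICLE_BYTES = 2 * 1024 * 1024


def utf8_size(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


latin = "book"      # 4 ASCII letters, 1 byte each
cyrillic = "книга"  # 5 Cyrillic letters, 2 bytes each

print(len(latin), utf8_size(latin))        # 4 4
print(len(cyrillic), utf8_size(cyrillic))  # 5 10

# Rough number of Cyrillic letters that fit under the byte limit,
# ignoring spaces, punctuation and markup (mostly 1 byte each).
print(MAX_ARTICLE_BYTES // utf8_size("я"))  # 1048576
```

Real pages mix Cyrillic letters with one-byte spaces, punctuation and wikitext markup, so the actual ratio is somewhat below 2.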

Reason:
Splitting a literary work across many site pages is inconvenient and labor-intensive for editors. Laying out a single work as separate pages takes up almost all the free time a user can devote to Wikisource in a day. That time could be better spent.

In practice, reading split pages is also inconvenient for readers in many cases. The problem is especially acute when reading on e-book readers. Exporting the parts to files is too laborious and buggy (it is easier to treat it as not working; this deserves a separate discussion). It is easier for the reader to go to another site and download the entire work as a single page or file.

It is also impossible for a bot to import texts into Wikisource from sources larger than 2 MB, because the wiki API blocks the import.

Filed by request on the ruwikisource admin forum.

Difficulties:
The only difficulty I see is with the syntax highlighting gadget in the browser editor, which is automatically disabled on large pages.

Event Timeline

Reedy renamed this task from Increase $wgMaxArticleSize to 4 Mb in LocalSettings.php for ru.Wikisource to Increase $wgMaxArticleSize to 4MB for ruwikisource. May 21 2022, 8:58 AM
Reedy updated the task description.

User Vladis13 did a great job importing some public domain texts into Russian Wikisource. He also prepared a folder with the texts that were too big: https://disk.yandex.ru/d/QjFXyEiY0t7Qvw

I analyzed that folder and created this chart with file size buckets:

Screenshot 2022-05-21 at 21.47.22.png (680×1 px, 71 KB)

(you can see the chart and data in this spreadsheet: https://docs.google.com/spreadsheets/d/18sh9wyqzUg9MYbJpzrcq77UOR8B7MfTN5GpE1yyUFbE/edit?usp=sharing)

Conclusion: If the maximum article size is increased to 4 MB, he can automatically upload 314 books into Russian Wikisource. There are still 14 books that are larger than 4 MB, but those we can deal with manually by splitting them into parts. Doing that by hand for some 300 books is not feasible.
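For anyone who wants to repeat the bucket count on a local copy of the folder, here is a minimal sketch. It assumes 1 MiB buckets (the bucket width used in the chart may differ), and the folder name `texts_too_big` and all function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

MIB = 1024 * 1024


def bucket_index(size_bytes: int, step: int = MIB) -> int:
    """Return the index of the bucket [index * step, (index + 1) * step)."""
    return size_bytes // step


def bucket_counts(sizes, step: int = MIB) -> Counter:
    """Count how many sizes fall into each bucket."""
    return Counter(bucket_index(size, step) for size in sizes)


def folder_sizes(folder: Path):
    """Yield the size in bytes of every regular file under `folder`."""
    for path in folder.rglob("*"):
        if path.is_file():
            yield path.stat().st_size


if __name__ == "__main__":
    # Hypothetical local copy of the shared folder.
    root = Path("texts_too_big")
    if root.is_dir():
        counts = bucket_counts(folder_sizes(root))
        for index in sorted(counts):
            print(f"{index}-{index + 1} MiB: {counts[index]} files")
```

Summing the buckets at or above index 4 gives the number of files that would still exceed a 4 MiB limit.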

Urbanecm subscribed.

Tagging with the same tags as T275319: Change $wgMaxArticleSize limit from byte-based to character-based. This will require approval from Performance-Team at least.

I'd like to explicitly link @Krinkle's comment from T275319#7947012 here. It applies to this case as well, and it describes exactly where the problem with raising the limit lies.

This is one of the longest pages in Russian Wikisource: https://ru.wikisource.org/wiki/%D0%A4%D0%B8%D0%BD%D0%B8%D0%B0%D1%81_%D0%A4%D0%B8%D0%BD%D0%BD,_%D0%B8%D1%80%D0%BB%D0%B0%D0%BD%D0%B4%D1%81%D0%BA%D0%B8%D0%B9_%D1%87%D0%BB%D0%B5%D0%BD_%D0%BF%D0%B0%D1%80%D0%BB%D0%B0%D0%BC%D0%B5%D0%BD%D1%82%D0%B0_(%D0%A2%D1%80%D0%BE%D0%BB%D0%BB%D0%BE%D0%BF)/%D0%94%D0%9E

WebPageTest results for this page: https://www.webpagetest.org/result/220524_AiDcHR_FX9/

NewPP report for this page:

<!--
NewPP limit report
Parsed by mw1342
Cached time: 20220518014801
Cache expiry: 1814400
Reduced expiry: false
Complications: []
CPU time usage: 0.672 seconds
Real time usage: 0.766 seconds
Preprocessor visited node count: 750/1000000
Post‐expand include size: 14361/2097152 bytes
Template argument size: 3255/2097152 bytes
Highest expansion depth: 7/100
Expensive parser function count: 2/500
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 0/5000000 bytes
Lua time usage: 0.071/10.000 seconds
Lua memory usage: 1869248/52428800 bytes
Number of Wikibase entities loaded: 0/400
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00%  175.201      1 -total
 90.55%  158.652      1 Шаблон:Отексте
  3.27%    5.736     36 Шаблон:Right
  1.68%    2.952      1 Шаблон:Imported/lib.ru
  1.61%    2.817      1 Шаблон:License
-->

<!-- Saved in parser cache with key ruwikisource:pcache:idhash:1019312-0!canonical and timestamp 20220518014800 and revision id 4479294. Serialized with JSON.
-->

In comparison, this is some random short page from Russian Wikisource: https://ru.wikisource.org/wiki/%D0%95%D0%AD%D0%91%D0%95/%D0%A1%D0%BF%D0%B8%D1%86

WebPageTest results for this page: https://www.webpagetest.org/result/220524_AiDc1H_G8Y/

NewPP report for this page:
<!--
NewPP limit report
Parsed by mw1312
Cached time: 20220518081858
Cache expiry: 1814400
Reduced expiry: false
Complications: []
CPU time usage: 0.145 seconds
Real time usage: 0.218 seconds
Preprocessor visited node count: 118/1000000
Post‐expand include size: 6058/2097152 bytes
Template argument size: 115/2097152 bytes
Highest expansion depth: 10/100
Expensive parser function count: 7/500
Unstrip recursion depth: 0/20
Unstrip post‐expand size: 0/5000000 bytes
Lua time usage: 0.090/10.000 seconds
Lua memory usage: 2500307/52428800 bytes
Number of Wikibase entities loaded: 0/400
-->
<!--
Transclusion expansion time report (%,ms,calls,template)
100.00%  194.731      1 -total
 88.83%  172.988      1 Шаблон:ЕЭБЕ
 11.05%   21.516      1 Шаблон:ЕЭБЕ/Перенаправление
  6.58%   12.822      1 Шаблон:ЕДО
  2.07%    4.034      1 Шаблон:ЕЭБЕ/Ссылка
-->

<!-- Saved in parser cache with key ruwikisource:pcache:idhash:193002-0!canonical and timestamp 20220518081858 and revision id 2611247. Serialized with JSON.
-->
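
To compare the two reports, the `used/limit` lines can be pulled out and turned into fractions of each limit. This is a rough sketch, not an official tool: the regular expression and function names are assumptions made for this example, and lines without a `used/limit` pair (such as CPU time) are skipped.

```python
import re

# Matches lines such as "Highest expansion depth: 7/100".
LIMIT_LINE = re.compile(r"^(?P<name>[^:]+):\s*(?P<used>[\d.]+)/(?P<limit>[\d.]+)")


def parse_limit_report(report: str) -> dict:
    """Return {metric name: (used, limit)} for every 'used/limit' line."""
    metrics = {}
    for line in report.splitlines():
        match = LIMIT_LINE.match(line.strip())
        if match:
            metrics[match["name"]] = (float(match["used"]), float(match["limit"]))
    return metrics


def usage_fraction(metrics: dict) -> dict:
    """Return {metric name: used / limit}."""
    return {name: used / limit for name, (used, limit) in metrics.items()}


# A few lines copied from the first report above.
sample = """\
CPU time usage: 0.672 seconds
Preprocessor visited node count: 750/1000000
Post‐expand include size: 14361/2097152 bytes
Highest expansion depth: 7/100
"""

for name, fraction in usage_fraction(parse_limit_report(sample)).items():
    print(f"{name}: {fraction:.4%}")
```

Running it on both full reports makes it easy to see, metric by metric, how far each page is from the parser limits.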