archivebot: avoid exceeding max page size (contenttoobig)
Open, Medium · Public · Bug Report

Description

MediaWiki enforces an upper limit on the size of a page, and archive pages can grow past it. The limit is available in the siteinfo API as 'maxarticlesize'.

https://www.mediawiki.org/wiki/Manual:$wgMaxArticleSize
https://www.mediawiki.org/wiki/Special:ApiSandbox#action=query&format=json&meta=siteinfo&siprop=general%7Cnamespaces%7Cnamespacealiases%7Cstatistics
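As a rough illustration of reading that value, the sketch below builds a siteinfo query URL and extracts 'maxarticlesize' from a JSON response. The helper names, the example endpoint, and the sample payload are hypothetical, and the value is assumed to be reported in bytes; this is not how pywikibot itself fetches it.

```python
import json
from urllib.parse import urlencode


def siteinfo_url(api_endpoint: str) -> str:
    """Build a siteinfo query URL for a wiki's api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general",
        "format": "json",
    }
    return f"{api_endpoint}?{urlencode(params)}"


def max_article_size(response_text: str) -> int:
    """Extract 'maxarticlesize' from a siteinfo JSON response body."""
    data = json.loads(response_text)
    return int(data["query"]["general"]["maxarticlesize"])


# Illustrative payload only; a real response carries many more fields.
sample = '{"query": {"general": {"maxarticlesize": 2097152}}}'
limit = max_article_size(sample)
```

Keeping the parsing separate from the HTTP request makes the extraction easy to check without network access.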

When an archive page reaches maxarticlesize, the bot should stop trying to add content to it and find somewhere else to archive to.

For size-based archiving, this is straightforward: cap the maximum archive size parameter at maxarticlesize.
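A minimal sketch of that cap, assuming both values are expressed in bytes and that page size is measured on the UTF-8 encoded text. The function names are hypothetical and do not reflect the actual archivebot code.

```python
def effective_max_size(configured_bytes: int, site_limit_bytes: int) -> int:
    """Never allow an archive to grow beyond the site's page size limit."""
    return min(configured_bytes, site_limit_bytes)


def has_room(current_text: str, addition: str, limit_bytes: int) -> bool:
    """Check whether appending `addition` keeps the page within the limit.

    Sizes are counted in UTF-8 bytes rather than characters (assumption).
    """
    total = len(current_text.encode("utf-8")) + len(addition.encode("utf-8"))
    return total <= limit_bytes
```

Counting encoded bytes rather than characters matters for non-ASCII text, where one character can take several bytes.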

For time-based archiving, it is less simple. One possibility is to create a new archive page by adding a suffix (e.g. 2020 → 2020_(2)), but that complicates the implementation and disrupts the index page of archives. There might be a better way to handle this.
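To make the suffix idea concrete, here is a hedged sketch that walks candidate titles (2020, 2020_(2), 2020_(3), ...) and picks the first one with enough room. The function names, the `sizes` mapping of existing page sizes, and the `max_tries` bound are all hypothetical; it does not address the index page problem.

```python
from itertools import islice
from typing import Iterator, Mapping


def suffixed_titles(base: str) -> Iterator[str]:
    """Yield base, base_(2), base_(3), ... indefinitely."""
    yield base
    n = 2
    while True:
        yield f"{base}_({n})"
        n += 1


def pick_archive_title(
    base: str,
    sizes: Mapping[str, int],
    needed_bytes: int,
    limit_bytes: int,
    max_tries: int = 100,
) -> str:
    """Return the first candidate title that can hold `needed_bytes` more.

    `sizes` maps existing page titles to their current size in bytes;
    titles missing from it are treated as not yet created (size 0).
    """
    for title in islice(suffixed_titles(base), max_tries):
        if sizes.get(title, 0) + needed_bytes <= limit_bytes:
            return title
    raise RuntimeError(f"no archive page with room found for {base!r}")
```

For example, with "Archive/2020" nearly full, a 50,000-byte thread would be routed to "Archive/2020_(2)" if that page still has room.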

Any thoughts?

Original discussion: https://commons.wikimedia.org/w/index.php?title=User_talk:ArchiverBot&oldid=436612913#Not_archiving_a_page

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.
Xqt triaged this task as Medium priority. Mar 9 2021, 4:13 PM
Xqt changed the subtype of this task from "Task" to "Bug Report".

Change 670260 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] limit 'maxarchivesize' parameter to 'maxarticlesize'

https://gerrit.wikimedia.org/r/670260

Probably solved already with rPWBCf87ad95 for time-based archiving.

Change 670260 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Always take 'maxarticlesize' into account when saving

https://gerrit.wikimedia.org/r/670260