
archivebot: avoid exceeding max page size (contenttoobig)
Closed, Resolved · Public · Bug Report


There is an upper limit to the size of a page ($wgMaxArticleSize), and archive pages can grow beyond it. The limit is exposed in the siteinfo API as 'maxarticlesize'.

When an archive page reaches maxarticlesize, the bot should stop trying to add content to it and find somewhere else to archive to.
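A minimal sketch of the check described above. Note that maxarticlesize is a byte limit on the stored wikitext, so the text should be measured in UTF-8 bytes rather than characters. The helper name and signature are illustrative, not pywikibot API:

```python
def fits_in_page(page_text: str, addition: str, maxarticlesize: int) -> bool:
    """Return True if appending `addition` keeps the page within the
    wiki's size limit.

    `maxarticlesize` (from the siteinfo API) counts bytes of the
    UTF-8-encoded wikitext, not characters, so encode before measuring.
    """
    new_size = len(page_text.encode('utf-8')) + len(addition.encode('utf-8'))
    return new_size <= maxarticlesize
```

If this returns False, the bot should pick (or create) another archive target instead of saving and hitting the contenttoobig error.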

For size-based archiving this is straightforward: cap the bot's max-size parameter at maxarticlesize.
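The cap itself is a one-liner. Here is a hedged sketch; `configured_max` stands for the user's maxarchivesize setting, and fetching maxarticlesize from siteinfo is assumed to happen elsewhere:

```python
def effective_max_size(configured_max: int, maxarticlesize: int) -> int:
    """Clamp the user-configured archive size threshold so it can never
    exceed the wiki's hard page-size limit (both in bytes)."""
    return min(configured_max, maxarticlesize)
```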

For time-based archiving, it does not seem so simple. One possibility is to create a new archive page by adding a suffix (e.g. 2020 → 2020_(2)), but that complicates the implementation somewhat, and the index page of the archives will no longer line up. There might be a better way to handle this.
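The suffix scheme above could be generated like this. This is only a sketch of the naming convention suggested in this task (a hypothetical helper, not existing archivebot code):

```python
import re


def next_archive_title(title: str) -> str:
    """Return the next overflow title in the '2020' -> '2020_(2)' scheme.

    If the title already carries a '_(N)' suffix, increment N;
    otherwise start the sequence at 2.
    """
    m = re.fullmatch(r'(.*)_\((\d+)\)', title)
    if m:
        return f'{m.group(1)}_({int(m.group(2)) + 1})'
    return f'{title}_(2)'
```

The bot would fall through to the next title whenever the current archive page no longer has room for the threads being moved.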

Any thoughts?

Original discussion:

Event Timeline

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.
Xqt triaged this task as Medium priority. Mar 9 2021, 4:13 PM
Xqt changed the subtype of this task from "Task" to "Bug Report".

Change 670260 had a related patch set uploaded (by Xqt; owner: Xqt):
[pywikibot/core@master] [IMPR] limit 'maxarchivesize' parameter to 'maxarticlesize'

Probably already solved for time-based archiving with rPWBCf87ad95.

Change 670260 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Always take 'maxarticlesize' into account when saving

Xqt claimed this task.