
archivebot resets the numbering to 1 even for non-year archives
Closed, Resolved (Public)

Description

Event Timeline

Xqt triaged this task as Medium priority. Feb 26 2019, 5:31 AM

The bot resets the counter when the archive page for counter = 1 does not exist. If /Archive1 had existed (even as a redirect to /Archive), the page would probably have been handled correctly. Perhaps we shouldn't attempt to reset the counter when the archive pattern depends on nothing but the counter.
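A minimal sketch of that last idea, assuming the archive pattern uses named %-style fields such as %(counter)d. The helper names pattern_fields and may_reset_counter are hypothetical and are not part of archivebot.py:

```python
import re

# Hypothetical helpers for illustration only; not archivebot.py code.
_FIELD_RE = re.compile(r'%\((\w+)\)')


def pattern_fields(pattern):
    """Return the set of named %-style fields used in an archive pattern."""
    # Drop escaped percent signs first so '%%(counter)d' is not counted.
    return set(_FIELD_RE.findall(pattern.replace('%%', '')))


def may_reset_counter(pattern):
    """Allow a reset only if the pattern varies with more than the counter."""
    fields = pattern_fields(pattern)
    return 'counter' in fields and fields != {'counter'}


print(may_reset_counter('Wikipedie:Potřebuji pomoc/Archiv%(counter)d'))  # False
print(may_reset_counter('Talk:Foo/Archive %(year)d/%(counter)d'))        # True
```

With a counter-only pattern the check returns False, so a missing /Archiv1 page alone would never trigger a reset.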

The ghost archive problem is definitely a regression. I can solve it either by not saving archives with 0 archived sections or by not caching archives that we don't know will exist.
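The first option could look roughly like the sketch below. ArchivePage and pages_to_save are hypothetical stand-ins, assuming the bot keeps a title-to-archive cache and each cached archive knows which threads were added to it in this run:

```python
class ArchivePage:
    """Hypothetical stand-in for an archive page cached during a run."""

    def __init__(self, title):
        self.title = title
        self.threads = []  # threads archived to this page in this run


def pages_to_save(archives):
    """Return only the cached archives that received at least one thread."""
    return [page for page in archives.values() if page.threads]


archives = {
    'Archiv1': ArchivePage('Archiv1'),    # loaded but never fed: a "ghost"
    'Archiv18': ArchivePage('Archiv18'),
}
archives['Archiv18'].threads.append('Some thread')

print([page.title for page in pages_to_save(archives)])  # ['Archiv18']
```

The ghost archive stays in the cache but is never written, which avoids creating empty pages without changing how archives are looked up.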

I see, this behavior is not ideal. On cswiki in particular, the archive title has changed in the past for some frequently used talk pages (and they were not always archived by Pywikibot), so archivebot should not rely on the archive title staying unchanged.

Traceback I got today:

$ python pwb.py archivebot Archivace
Fetching template transclusions...
Processing [[cs:Wikipedie:Potřebuji pomoc]]
101 thread(s) found on [[cs:Wikipedie:Potřebuji pomoc]]
Looking for: {{Šablona:Archivace}} in [[cs:Wikipedie:Potřebuji pomoc]]
Processing 101 threads
ERROR: Error occurred while processing page [[cs:Wikipedie:Potřebuji pomoc]]
ERROR: IsRedirectPage: Page [[cs:Wikipedie:Potřebuji pomoc/Archiv1]] is a redirect page.
Traceback (most recent call last):
  File "./scripts/archivebot.py", line 803, in main
    archiver.run()
  File "./scripts/archivebot.py", line 658, in run
    whys = self.analyze_page()
  File "./scripts/archivebot.py", line 645, in analyze_page
    if self.feed_archive(archive, t, max_arch_size, params):
  File "./scripts/archivebot.py", line 602, in feed_archive
    self.archives[title] = DiscussionPage(archive, self, params)
  File "./scripts/archivebot.py", line 427, in __init__
    self.load_page()
  File "./scripts/archivebot.py", line 445, in load_page
    text = self.get()
  File "/home/pavel/pywikibot/pywikibot/tools/__init__.py", line 1738, in wrapper
    return obj(*__args, **__kw)
  File "/home/pavel/pywikibot/pywikibot/page.py", line 486, in get
    self._getInternals(sysop)
  File "/home/pavel/pywikibot/pywikibot/page.py", line 524, in _getInternals
    raise self._getexception
pywikibot.exceptions.IsRedirectPage: Page [[cs:Wikipedie:Potřebuji pomoc/Archiv1]] is a redirect page.

It tries to edit the Archiv1 page even though the current archive (set in the Archivace template) should be Archiv18.

It tries to edit Archiv1 page...

It does not. It just tries to load it.

I see. Still, it should not fail to archive the page.
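One way to keep a redirecting archive from aborting the whole run is to catch the exception where the archive is loaded. The sketch below is self-contained: IsRedirectPage is a local stand-in named after the exception in the traceback, and FakePage and load_text_or_none are hypothetical. What the caller should do with a redirect (skip it, follow it, or pick another archive) is left open here:

```python
class IsRedirectPage(Exception):
    """Local stand-in for the exception shown in the traceback."""


class FakePage:
    """Hypothetical page object whose get() raises for redirects."""

    def __init__(self, title, text='', redirect=False):
        self.title = title
        self.text = text
        self.redirect = redirect

    def get(self):
        if self.redirect:
            raise IsRedirectPage(f'Page [[{self.title}]] is a redirect page.')
        return self.text


def load_text_or_none(page):
    """Return the page text, or None if the page is a redirect."""
    try:
        return page.get()
    except IsRedirectPage:
        # Signal the caller instead of letting the error end the run.
        return None


print(load_text_or_none(FakePage('Archiv1', redirect=True)))  # None
print(load_text_or_none(FakePage('Archiv18', text='== Thread ==')))
```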

Since the bot analyzes the discussion page from top to bottom and doesn't strictly follow chronological order (by the latest contribution to each thread), there is another (hypothetical) flaw that can surprise users:

  • Suppose a page has two threads, the first timestamped 1 January and the second 31 December. If we archive the former first and then change the counter, the latter can no longer be archived to the correct archive.
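A possible mitigation for the ordering flaw above is to sort threads by their latest timestamp before feeding them to archives. This is only a sketch: the (title, timestamp) tuples and the chronological helper are hypothetical and do not reflect how archivebot.py represents threads:

```python
from datetime import datetime

# Page order: the 1 January thread sits above the 31 December thread.
threads = [
    ('Thread A', datetime(2019, 1, 1)),
    ('Thread B', datetime(2018, 12, 31)),
]


def chronological(threads):
    """Return threads ordered by timestamp, oldest first."""
    return sorted(threads, key=lambda thread: thread[1])


for title, stamp in chronological(threads):
    print(title, stamp.date())
# Thread B 2018-12-31
# Thread A 2019-01-01
```

Processing the older thread first means the archive for the earlier period is filled before any date-dependent counter change happens.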

Fixing this task, the problem above, or e.g. T182685 will probably require rewriting the main routine of archivebot.py (work in progress).

Let's concentrate on the parent task.