
Pywikibot: IndexPage.num_pages fails if there are no pages yet
Closed, Resolved | Public | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

Try to get the number of pages of an Index page without any Page pages yet (i.e. a brand new index):

import pywikibot
import pywikibot.proofreadpage

INDEX = 'Index:The_Atlantic_Monthly_Volume_46.djvu'

site = pywikibot.Site('en', 'wikisource')
index = pywikibot.proofreadpage.IndexPage(site, INDEX)

page = index.num_pages

What happens?:

Traceback (most recent call last):
  File "/tmp/./test.py", line 9, in <module>
    page = index.num_pages
  File "/usr/lib/python3.9/site-packages/pywikibot/proofreadpage.py", line 81, in wrapper
    self._get_page_mappings()
  File "/usr/lib/python3.9/site-packages/pywikibot/proofreadpage.py", line 896, in _get_page_mappings
    raise ValueError(
ValueError: Missing class="qualityN prp-pagequality-N" or class="new" in: [[en:Index:The Atlantic Monthly Volume 46.djvu]].

What should have happened instead?:

Should return the number of pages, in this case, 882.

Looks like this is the culprit:

# Try to purge or raise ValueError.
if not self._soup.find_all('a', attrs=attrs):
    self.purge()
    del self._parsed_text
    self._parsed_text = self._get_parsed_page()
    self._soup = _bs4_soup(self._parsed_text)
    if not self._soup.find_all('a', attrs=attrs):
        raise ValueError(
            'Missing class="qualityN prp-pagequality-N" or '
            'class="new" in: {}.'.format(self))

I'm not sure exactly what the intention of this check is.
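To illustrate why a brand-new index could trip this check, here is a minimal standalone sketch. It rests on several assumptions: that the check looks for links whose class contains "prp-pagequality", that links to not-yet-created Page pages carry only class="new", and that widening the pattern to also accept "new" is roughly what the patch title describes. The helper names, patterns, and HTML snippets are made up for illustration and are not Pywikibot's code; the sketch also uses the standard library's html.parser in place of BeautifulSoup so it runs without extra dependencies.

```python
import re
from html.parser import HTMLParser


class AnchorClassCollector(HTMLParser):
    """Collect the class attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.classes = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # An <a> without a class attribute is recorded as ''.
            self.classes.append(dict(attrs).get('class') or '')


def has_matching_link(html, pattern):
    """Return True if any <a> has a class attribute matching pattern."""
    collector = AnchorClassCollector()
    collector.feed(html)
    return any(re.search(pattern, cls) for cls in collector.classes)


# Made-up snippets, not real MediaWiki output.
PROOFREAD_INDEX = '<a class="quality3 prp-pagequality-3" href="#">1</a>'
BRAND_NEW_INDEX = '<a class="new" href="#">1</a><a class="new" href="#">2</a>'

# Assumed patterns: the narrow one only accepts proofread pages,
# the wide one also accepts links to pages that do not exist yet.
QUALITY_ONLY = r'prp-pagequality'
QUALITY_OR_NEW = r'prp-pagequality|\bnew\b'

print(has_matching_link(PROOFREAD_INDEX, QUALITY_ONLY))    # True
print(has_matching_link(BRAND_NEW_INDEX, QUALITY_ONLY))    # False
print(has_matching_link(BRAND_NEW_INDEX, QUALITY_OR_NEW))  # True
```

With the narrow pattern, an index whose links are all class="new" yields no matches, which in the quoted code would lead to the purge and then the ValueError. With the wider pattern the same index passes.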

Version: Git master @ ca74fe2f8

Event Timeline

Xqt triaged this task as Medium priority. Apr 16 2021, 4:05 PM
Xqt added a subscriber: Mpaa.

Change 680390 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [bugfix] proofreadpage: search for "new" class after purge

https://gerrit.wikimedia.org/r/680390

Change 680390 merged by jenkins-bot:

[pywikibot/core@master] [bugfix] proofreadpage: search for "new" class after purge

https://gerrit.wikimedia.org/r/680390