Errors related to RecentChanges and NewPages in pagegenerator_tests
Closed, ResolvedPublic

Description

Following are the pagegenerator test failures:

For ru doomwikia:
Error in pagegenerators_tests.PageGeneratorIntersectTestCase.test_intersect_newpages_and_recentchanges

Traceback (most recent call last):
  File "tests\pagegenerators_tests.py", line 466, in test_intersect_newpages_and_recentchanges
    pagegenerators.RecentChangesPageGenerator(site=site, total=200)])
  File "tests\thread_tests.py", line 59, in assertEqualItertools
    result = list(intersect_generators(datasets))
  File "pywikibot\tools.py", line 326, in intersect_generators
    threaded_gen = ThreadedGenerator(name=repr(source), target=source)
  File "pywikibot\tools.py", line 165, in __init__
    raise RuntimeError("No generator for ThreadedGenerator to run.")
RuntimeError: No generator for ThreadedGenerator to run.

and pagegenerators_tests.PageGeneratorIntersectTestCase.test_intersect_newpages_twice

Traceback (most recent call last):
  File "tests\pagegenerators_tests.py", line 460, in test_intersect_newpages_twice
    pagegenerators.NewpagesPageGenerator(site=site, total=10)])
  File "tests\thread_tests.py", line 59, in assertEqualItertools
    result = list(intersect_generators(datasets))
  File "pywikibot\tools.py", line 326, in intersect_generators
    threaded_gen = ThreadedGenerator(name=repr(source), target=source)
  File "pywikibot\tools.py", line 165, in __init__
    raise RuntimeError("No generator for ThreadedGenerator to run.")
RuntimeError: No generator for ThreadedGenerator to run.

However, the same tests pass for en:doomwikia and it:doomwikia.
As suggested by jayvdb, this is probably caused by zero (or very little) data in Special:NewPages and Special:RecentChanges.

Upon testing the following links:
http://ru.doomrus.wikia.com/wiki/Special:RecentChanges
http://ru.doomrus.wikia.com/wiki/Special:NewPages
and
http://doom.wikia.com/wiki/Special:RecentChanges
http://doom.wikia.com/wiki/Special:NewPages

This confirms that the Russian doomwikia doesn't contain any entries, while the English doomwikia does.

It is also worth mentioning that for the Italian doomwikia:
http://it.doom.wikia.com/wiki/Special:RecentChanges
http://it.doom.wikia.com/wiki/Special:NewPages
RecentChanges doesn't contain any data while NewPages does. Both tests, however, pass for the Italian doomwikia.

Event Timeline

Omegat claimed this task.
Omegat raised the priority of this task from to Medium.
Omegat updated the task description.
Omegat added subscribers: Omegat, XZise, jayvdb.

I ran the tests on doomwikia again and they now pass for ru doomwikia. The URLs above confirm that there is now data in Special:NewPages, although Special:RecentChanges still doesn't seem to contain any data (meaning that test_intersect_newpages_and_recentchanges should have failed).

Another example of a site failing this test is heroeswiki. From:
http://heroeswiki.com/Special:NewPages and
http://heroeswiki.com/Special:RecentChanges

NewPages does not have any entries while RecentChanges does, and hence both tests fail (which is what we expect).

We can notice (from the Italian doomwikia as well) that if NewPages contains data and RecentChanges does not, test_intersect_newpages_and_recentchanges also passes.
Running list(pagegenerators.RecentChangesPageGenerator(site=s, total=200)) for 'ru' and 'it' doomwikia returns entries from Recent Changes even though none are displayed on the website. This is the reason the tests pass. So what are these entries?

Hmm. There are no entries on

http://it.doom.wikia.com/wiki/Special:RecentChanges

but we can see one entry if we ask for 30 days of RC instead of 7 days.

http://it.doom.wikia.com/wiki/Speciale:UltimeModifiche?days=30&limit=500

There is a bug in intersect_generators. It shouldn't cause a RuntimeError in ThreadedGenerator.

gerritbot subscribed.

Change 186596 had a related patch set uploaded (by XZise):
[FIX] Pagegen: Exit intersect if a gen is empty

https://gerrit.wikimedia.org/r/186596

Patch-For-Review

Change 186610 had a related patch set uploaded (by XZise):
[FIX] ThreadedGenerator: Allow empty target

https://gerrit.wikimedia.org/r/186610

Patch-For-Review

In test_intersect_newpages_twice(), the following is called:

self.assertEqualItertools(
    [pagegenerators.NewpagesPageGenerator(site=site, total=10),
     pagegenerators.NewpagesPageGenerator(site=site, total=10)])

assertEqualItertools expands the generators to lists:

datasets = [list(gen) for gen in gens]

If a list is empty, the following assignment in ThreadedGenerator is skipped, because an empty list is falsy:

if target:
    self.generator = target

and then the RuntimeError is raised:

if not hasattr(self, "generator"):
    raise RuntimeError("No generator for ThreadedGenerator to run.")
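
The two checks quoted above can be reproduced in isolation. The following is only a minimal sketch: SketchThreadedGenerator and accepts are hypothetical names, the class copies nothing from pywikibot except the two quoted checks, and the strict_none_check switch is an assumed possible fix, not necessarily what the uploaded patches do.

```python
class SketchThreadedGenerator:
    """Hypothetical stand-in that mirrors only the two checks quoted above."""

    def __init__(self, target=None, strict_none_check=False):
        if strict_none_check:
            # Assumed alternative: reject only a target that was never given.
            if target is not None:
                self.generator = target
        else:
            # Quoted behaviour: an empty list is falsy, so this is skipped.
            if target:
                self.generator = target
        if not hasattr(self, "generator"):
            raise RuntimeError("No generator for ThreadedGenerator to run.")


def accepts(target, strict_none_check=False):
    """Return True if the sketch class can be built with this target."""
    try:
        SketchThreadedGenerator(target, strict_none_check)
    except RuntimeError:
        return False
    return True


print(accepts([1, 2]))                      # non-empty list
print(accepts([]))                          # empty list, truthiness check
print(accepts([], strict_none_check=True))  # empty list, None check
```

With the truthiness check, only the empty list is rejected; with the `is not None` check, an empty list is accepted and only a missing target raises.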

Ah, I forgot that it's actually testing on lists. So using intersect_generators when one of the inputs is empty usually won't cause the RuntimeError, because it's usually not applied to lists.

What do you mean by "not applied to lists"?

from pywikibot.tools import intersect_generators
datasets = [range(3), range(10)]
result = list(intersect_generators(datasets))
print(result)
# Output: [0, 1, 2]

from pywikibot.tools import intersect_generators
datasets = [[], range(10)]
result = list(intersect_generators(datasets))
print(result)

Traceback (most recent call last):
  File "int.py", line 5, in <module>
    result = list(intersect_generators(datasets))
  File "/home/user/python/core/pywikibot/tools.py", line 376, in intersect_generators
    threaded_gen = ThreadedGenerator(name=repr(source), target=source)
  File "/home/user/python/core/pywikibot/tools.py", line 215, in __init__
    raise RuntimeError("No generator for ThreadedGenerator to run.")
RuntimeError: No generator for ThreadedGenerator to run.
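
For comparison, here is a rough, non-threaded sketch of what an intersection over the same two inputs would be expected to return. simple_intersect is a hypothetical helper written for this illustration; it is not pywikibot's intersect_generators, and it assumes the items are hashable.

```python
def simple_intersect(iterables):
    """Yield each item found in every iterable, in order of the first one.

    Hypothetical, single-threaded illustration; items must be hashable.
    """
    iterables = list(iterables)
    if not iterables:
        return
    # Materialise all but the first source for membership tests.
    others = [set(source) for source in iterables[1:]]
    seen = set()
    for item in iterables[0]:
        if item not in seen and all(item in other for other in others):
            seen.add(item)
            yield item


print(list(simple_intersect([range(3), range(10)])))
print(list(simple_intersect([[], range(10)])))
```

An empty input simply produces an empty intersection here, rather than an exception.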

It does work with lists (or any iterable for that matter) but when used as a pagegenerator (which is afaik the only use currently in pywikibot) it's using generators and not lists.

Tests are implemented using lists, and the function supports lists. An empty list causes the RuntimeError.
How else it is used in pywikibot is irrelevant with respect to this bug.

Yeah. The tests, but only the tests. Outside of that test it's not using lists, and that is why the bug didn't appear in production usage (which doesn't include tests): there it's always generators. I don't see what the problem is here. I didn't understand why that error was appearing, because bool(generator) is always True and I thought the inputs were always generators, so the check couldn't be false. You then said that the tests actually use lists, and that solved the puzzle, because bool(list) is False if the list is empty.
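
The truthiness difference described in this comment can be checked directly in plain Python, without pywikibot. empty_gen and results are names made up for this small illustration.

```python
def empty_gen():
    """A generator function that yields nothing."""
    return
    yield  # unreachable, but makes this function a generator


results = {
    "empty list": bool([]),
    "empty generator": bool(empty_gen()),
    "generator expression over empty list": bool(x for x in []),
}

for label, value in results.items():
    print(label, value)
```

A generator object does not define a length or a boolean value, so it is always truthy, even when it will yield no items; an empty list is falsy.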

Problems causing the tests to fail have been fixed.

Change 186610 abandoned by Multichill:
[FEAT] More flexible threaded generators

Reason:
No response. This can always be re-opened if you plan to work on it again.

https://gerrit.wikimedia.org/r/186610