
Generators can be iterated over multiple times
Closed, Resolved (Public)

Description

Page generators can be iterated over multiple times, although they are expected to be exhausted (empty) after the first iteration. The later iterations can return erratic results.

For example: It is possible to iterate over a generator with a for loop and then iterate over that same generator with a later for loop.

If step is not specified, the generator repeats itself when iterated over again. If step is specified, the results shift erratically with each further iteration.

Below is REPL output from the test:test site using PrefixingPageGenerator().

>>> ppg = PrefixingPageGenerator('a', step=2, total=3)
>>> for _ in range(3):
...     print list(ppg)
...     
[Page(A), Page(AAA), Page(AF Test)]
[Page(AF Test), Page(AKlapper2), Page(API output)]
[Page(AKlapper2), Page(API output), Page(API page move test)]

Expected output would be [] after the first one.
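
For comparison, a plain Python generator behaves the way this report expects: once it has been consumed, every later pass is empty. A minimal sketch, where `pages` is a hypothetical stand-in that yields titles and is not pywikibot code:

```python
def pages():
    # Hypothetical stand-in for a page generator; yields titles, not Page objects.
    for title in ['A', 'AAA', 'AF Test']:
        yield title

gen = pages()
first = list(gen)   # consumes the generator
second = list(gen)  # the generator is already exhausted
print(first)
print(second)
```

Here the second list is empty because `gen` is a single generator object that remembers it has finished.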

Event Timeline

Daviskr raised the priority of this task to Medium.
Daviskr updated the task description.
Daviskr added a project: Pywikibot.
Daviskr subscribed.

It has to do with the hairy business of iterators versus generators in Python:

## current code

class IteratorExample(object):
	def __iter__(self):
		yield 1
		
i = IteratorExample()
print list(i)
print list(i) # this is confusing

## 'typical' use of __iter__

class IteratorExample(object):
	def __init__(self):
		self.returnvalues = [1]
	def __iter__(self):
		return self
	def next(self):
		try:
			return self.returnvalues.pop()
		except IndexError:  # pop() on an empty list
			raise StopIteration()

	
i = IteratorExample()
print list(i)
print list(i) # this now makes sense

## how to go from (1) to (2) with minimal changes?

class IteratorExample(object):
	def __init__(self):
		self.gen = self._gen()
	def _gen(self):
		yield 1
	def __iter__(self):
		return self
	def next(self):
		return self.gen.next()


i = IteratorExample()
print list(i)
print list(i) # this now makes sense, but the boilerplate code is... meh.
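
One possible way to trim that boilerplate, offered as a sketch rather than a proposed patch: create the generator once in `__init__` and return that same object from `__iter__`. A generator is already its own iterator, so no `next` method is needed on the class.

```python
class IteratorExample(object):
    def __init__(self):
        # Create the generator once, so its state lives on the instance.
        self.gen = self._gen()

    def _gen(self):
        yield 1

    def __iter__(self):
        # Hand back the same generator object on every call.
        return self.gen

i = IteratorExample()
first = list(i)
second = list(i)  # the stored generator is already exhausted
print(first)
print(second)
```

The trade-off is that the object itself is not an iterator (calling `next()` on it directly fails), which may or may not matter for callers.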

Basically, list(ppg) actually runs list(ppg.__iter__()), and __iter__() returns a new generator every time it's called.
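
This can be checked directly on the first toy class from above. The sketch below only uses the built-in `iter()`; nothing in it is pywikibot code.

```python
class IteratorExample(object):
    def __iter__(self):
        # A generator function: each call builds a brand-new generator.
        yield 1

i = IteratorExample()
a = iter(i)
b = iter(i)
distinct = a is not b   # two separate generator objects
from_a = list(a)
from_b = list(b)        # b starts from the beginning, independent of a
print(distinct)
print(from_a)
print(from_b)
```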

Reading a bit more about it, I think the issue is that we save state in data.api.QueryGenerator.__iter__, instead of in QueryGenerator itself.

https://github.com/wikimedia/pywikibot-core/blob/fb5af89cbdb465e207e9f21a60c17aaf2c500530/pywikibot/data/api.py#L1449

This doesn't quite solve the bug, but it almost does, by making __iter__ stateless. That should mean the iterator's second run is identical to the first, unless the data has changed.

https://gerrit.wikimedia.org/r/#/c/176013/

Xqt claimed this task.
Xqt subscribed.

Does not occur anymore:

>>> import pwb, pywikibot as py
>>> from pywikibot.pagegenerators import PrefixingPageGenerator
>>> ppg = PrefixingPageGenerator('a', step=2, total=3)

WARNING: __main__:2: DeprecationWarning: step argument of pywikibot.pagegenerators.PrefixingPageGenerator is deprecated.

>>> for _ in range(3):
	print list(ppg)

	
[Page(A), Page(A!B!C Titans Berg. Land), Page(A$)]
[Page(A), Page(A!B!C Titans Berg. Land), Page(A$)]
[Page(A), Page(A!B!C Titans Berg. Land), Page(A$)]

Or with step set through config:

>>> import pwb, pywikibot as py
>>> from pywikibot.pagegenerators import PrefixingPageGenerator
>>> ppg = PrefixingPageGenerator('a', total=3)
>>> from pywikibot import config
>>> config.step = 2
>>> for _ in range(3):
	print list(ppg)

	
[Page(A), Page(A!B!C Titans Berg. Land), Page(A$)]
[Page(A), Page(A!B!C Titans Berg. Land), Page(A$)]
[Page(A), Page(A!B!C Titans Berg. Land), Page(A$)]
>>>
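
In the output above every pass restarts from the first page. If a caller wants the one-shot behaviour described in the task, calling `iter()` once and reusing the result should give it. A minimal sketch, assuming a hypothetical stand-in class `Restartable` with a stateless `__iter__` (it is not the pywikibot implementation):

```python
class Restartable(object):
    """Hypothetical stand-in for an object whose __iter__ keeps no state."""

    def __init__(self, titles):
        self.titles = titles

    def __iter__(self):
        # Nothing is remembered between calls, so every pass starts over.
        for title in self.titles:
            yield title

ppg = Restartable(['A', 'A!B!C Titans Berg. Land', 'A$'])
repeated = [list(ppg) for _ in range(3)]  # three identical passes

it = iter(ppg)                            # pin down a single iterator
one_shot = [list(it) for _ in range(3)]   # only the first pass has items
print(repeated)
print(one_shot)
```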