unconnected_pages generator doesn't seem to return all pages
One of my bots didn't run for a while, so I had to catch up on the unconnected pages (see ). I use the -unconnectedpages command-line option ($985 ). I noticed that when I run "python -lang:nl -family:wikipedia -namespaces:0 -unconnectedpages", I don't get all pages.

The generator uses site.unconnected_pages ($6803 ), which, using $1915, gets an api.PageGenerator ($2971 ), which is a subclass of QueryGenerator ($2568 ).

I think the continue handling is going wrong here. Have a look at . The qpoffset is used for the paging and the continue parameter. I don't think we're using the qpoffset.
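For reference, here is a minimal sketch of how I understand continuation is supposed to work: whatever the API returns in its 'continue' block gets merged into the next request, until no 'continue' block comes back. This is not Pywikibot's actual code. iterate_query and fake_fetch are hypothetical names, and fake_fetch is a stand-in for the real HTTP call that serves 5 pages in batches of 2.

```python
def iterate_query(fetch, params):
    """Yield pages from every batch, following the API's 'continue' block."""
    request = dict(params)
    while True:
        response = fetch(request)
        pages = response.get('query', {}).get('pages', {})
        yield from pages.values()
        cont = response.get('continue')
        if not cont:
            # No 'continue' block: the API says there is nothing more to fetch.
            break
        # Merge the continuation values (e.g. gqpoffset) into the next request.
        request = {**params, **cont}


def fake_fetch(request):
    """Hypothetical stand-in for an HTTP call: 5 pages, 2 per batch."""
    offset = int(request.get('gqpoffset', 0))
    limit, total = 2, 5
    ids = range(offset, min(offset + limit, total))
    response = {'query': {'pages': {str(i): {'pageid': i} for i in ids}}}
    if offset + limit < total:
        response['continue'] = {'gqpoffset': offset + limit,
                                'continue': 'gqpoffset||'}
    else:
        response['batchcomplete'] = ''
    return response


base_params = {'action': 'query', 'generator': 'querypage',
               'gqppage': 'UnconnectedPages'}
page_ids = [page['pageid'] for page in iterate_query(fake_fetch, base_params)]
print(page_ids)
```

If gqpoffset were not carried over between requests, the loop would keep fetching the first batch, so comparing the offsets in the logged requests should show quickly whether that is the problem.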

More info at

I think the continue handling is going wrong here

That's very likely. I recently tried to fix a continue handling issue in PropertyGenerator, see T196876, but similar issues might also exist in other generators. If so, we should look for a more general solution.

If all the generators listed on follow the same logic in Pywikibot as the unconnected pages one, then probably all of them have the same issue. Maybe make a new subclass QueryPageGenerator that wraps around ?

qpoffset is used; the issue is the same as T173293.

The last request that fetches data is:
<wikipedia:nl->'/w/api.php?gqppage=UnconnectedPages&prop=info|imageinfo|categoryinfo&inprop=protection&iiprop=timestamp|user|comment|url|size|sha1|metadata&iilimit=max&generator=querypage&action=query&indexpageids=&continue=gqpoffset||userinfo&gqplimit=500&meta=userinfo&uiprop=blockinfo|hasmsg&maxlag=5&format=json&gqpoffset=10000'>



{'batchcomplete': '', 'query': {'querypage': {'name': 'UnconnectedPages'}}, 'limits': {'imageinfo': 500}}

and no more data can be fetched.
I do not know what sets the limit of 10000 in the API.
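To make it explicit why the client stops here, this is a small sketch that inspects the response quoted above. The summarize helper is a hypothetical name for illustration, not something from Pywikibot. The response contains 'batchcomplete' but neither a 'continue' block nor any 'pages', so from the client's side the query simply looks finished.

```python
# The response quoted above, copied verbatim.
response = {
    'batchcomplete': '',
    'query': {'querypage': {'name': 'UnconnectedPages'}},
    'limits': {'imageinfo': 500},
}


def summarize(resp):
    """Report the fields a client looks at to decide whether to go on."""
    return {
        'has_pages': 'pages' in resp.get('query', {}),
        'has_continue': 'continue' in resp,
        'batch_complete': 'batchcomplete' in resp,
    }


summary = summarize(response)
print(summary)
```

So the generator is not dropping a continuation value here: the API itself stops handing one out.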

So you only get the namespace=0 pages that happen to be among the 10500 pages that were yielded.
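A toy illustration of that effect, assuming the namespace filter is applied to whatever the capped list returns. The data, the cap of 10, and the helper names are all made up for the example.

```python
def take_capped(pages, cap):
    """Return only the first `cap` results, mimicking a server-side cutoff."""
    return pages[:cap]


def only_namespace(pages, namespace):
    """Keep the pages whose 'ns' value matches the requested namespace."""
    return [page for page in pages if page['ns'] == namespace]


# Hypothetical data: 20 unconnected pages, every other one in namespace 0.
all_pages = [{'title': f'Page {i}', 'ns': 0 if i % 2 == 0 else 14}
             for i in range(20)]

reachable = take_capped(all_pages, cap=10)  # what the capped list hands out
found = only_namespace(reachable, 0)        # what the bot ends up with
expected = only_namespace(all_pages, 0)     # what the bot wanted

print(len(found), 'of', len(expected), 'namespace-0 pages found')
```

The pages past the cutoff are never seen, no matter which namespace they are in.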