
pagegenerators.PreloadingGenerator should be disabled for -liverecentchanges
Closed, Resolved (Public)

Description

pagegenerators.PreloadingGenerator (as well as Site.preloadpages) should always be disabled for the -liverecentchanges option. A bulk load is pointless for -liverecentchanges, which waits for events: with preloading enabled, the process waits until the buffer is full before it starts working. This is counterproductive because the option is meant to process each event as soon as possible.

Event Timeline

Xqt triaged this task as Medium priority. May 15 2016, 10:37 AM

Change 288919 had a related patch set uploaded (by Xqt):
[IMPR] Disable PreloadingGenerator when -liverecentchanges is selected

https://gerrit.wikimedia.org/r/288919

For the general problem in PreloadingGenerator, using groupsize=1 is a solution: pages will be emitted immediately. This parameter was previously called step, but was renamed in 65b75413.
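To illustrate why the batch size matters for a live source, here is a minimal sketch. It does not use pywikibot: preloading_sketch and live_events are hypothetical stand-ins, and the default of 50 is only an assumption for the example. It shows that a batching generator pulls groupsize items from upstream before it emits the first one.

```python
from itertools import islice


def preloading_sketch(pages, groupsize=50):
    """Hypothetical stand-in: buffer up to `groupsize` items, then emit them."""
    iterator = iter(pages)
    while True:
        batch = list(islice(iterator, groupsize))
        if not batch:
            return
        # A real preloader would bulk-load the page contents for `batch` here.
        yield from batch


def live_events(log):
    """Simulated live source that records when each event is pulled upstream."""
    for number in range(1, 5):
        log.append(f"received {number}")
        yield number


def run(groupsize):
    """Return the interleaving of 'received' and 'processed' steps."""
    log = []
    for event in preloading_sketch(live_events(log), groupsize=groupsize):
        log.append(f"processed {event}")
    return log


if __name__ == "__main__":
    print(run(2))  # events are processed only after two have been received
    print(run(1))  # each event is processed right after it is received
```

With a real event stream, the gap between "received" and "processed" is the time spent waiting for the buffer to fill, which is what groupsize=1 avoids.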

Also, any solution to this bug must not be designed only for -liverecentchanges; any page generator could be emitting changes 'live' from the wiki, and we already have another one, RepeatingGenerator.

The bugs we need to solve are:

  1. Within GeneratorFactory, PreloadingGenerator is used even when -liverecentchanges (or similar) has been specified.

E.g. there is a very specific problem in getCombinedGenerator, where

if self.articlefilter_list:
    dupfiltergen = RegexBodyFilterPageGenerator(
        PreloadingGenerator(dupfiltergen), self.articlefilter_list)

We need to not call PreloadingGenerator there if -liverecentchanges is activated, or call it with groupsize=1.
IMO the simplest solution is for GeneratorFactory to have an attribute groupsize, initially None, which -liverecentchanges would set to 1.

  2. Scripts often call PreloadingGenerator after the factory has given them a page generator, and thus they will inadvertently be trying to preload liverecentchanges.

My preferred solution for that is that the script tells the GeneratorFactory, in the constructor, that the script believes preloading would be beneficial, and the script does not add its own PreloadingGenerator.
(which is already implemented in https://gerrit.wikimedia.org/r/#/c/172577/2 )
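The two proposals could fit together roughly as follows. This is only a sketch under stated assumptions, not the code from either Gerrit change: FactorySketch, preload_wanted, add_source, effective_groupsize and DEFAULT_GROUPSIZE are hypothetical names, and preload is a simplified stand-in for PreloadingGenerator.

```python
from itertools import chain, islice

DEFAULT_GROUPSIZE = 50  # assumed default batch size for this sketch


def preload(pages, groupsize):
    """Simplified stand-in for PreloadingGenerator: emit items in batches."""
    iterator = iter(pages)
    while batch := list(islice(iterator, groupsize)):
        yield from batch


class FactorySketch:
    """Hypothetical stand-in for GeneratorFactory."""

    def __init__(self, preload_wanted=False):
        # The script states up front whether preloading would help it.
        self.preload_wanted = preload_wanted
        # None means "no live source seen, use the default batch size".
        self.groupsize = None
        self.sources = []

    def add_source(self, pages, live=False):
        # A live source (such as -liverecentchanges) forces a batch size of 1.
        if live:
            self.groupsize = 1
        self.sources.append(pages)

    def effective_groupsize(self):
        return DEFAULT_GROUPSIZE if self.groupsize is None else self.groupsize

    def get_combined_generator(self):
        combined = chain.from_iterable(self.sources)
        if not self.preload_wanted:
            return combined
        # Only the factory wraps the generator; the script never does.
        return preload(combined, self.effective_groupsize())


factory = FactorySketch(preload_wanted=True)
factory.add_source(iter(["Page A", "Page B"]), live=True)
pages = list(factory.get_combined_generator())
```

Because the factory is the only place that adds preloading, it can apply groupsize=1 for any live source, and scripts do not need to know which generator they were given.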

Change 288919 merged by jenkins-bot:
[pywikibot/core@master] [IMPR] Provide preloading via GeneratorFactory.getCombinedGenerator()

https://gerrit.wikimedia.org/r/288919