API requests when processing pages with XMLDumpPageGenerator
Open, Needs Triage · Public · BUG REPORT


Steps to replicate the issue:

  • iterate pages with XMLDumpPageGenerator()

What happens?:
When a page is constructed from an XML entry, an API request is made for each page to check whether the bot may edit its text:
982 -> page.text = entry.text

This defeats the purpose of off-line processing.

> /home/pc/python/core/pywikibot/pagegenerators/
969  	    def __next__(self) ->
970  	        """Get next Page."""
971  	        while True:
972  	            entry = next(self.parser)
973  	            if self.skipping:
974  	                if entry.title < self.start:
975  	                    continue
976  	                self.skipping = False
977  	            page = pywikibot.Page(, entry.title)
978  	            if page.namespace() not in self.namespaces:
979  	                continue
980  	            if not self.text_predicate or self.text_predicate(entry.text):
981  	                if self.content:
982  ->	                    page.text = entry.text
983  	                return page

528  	    @text.setter
529  	    def text(self, value: str | None):
530  	        """Update the current (edited) wikitext.
532  	        :param value: New value or None
533  	        """
534  	        try:
535  ->	            self.botMayEdit()  # T262136, T267770
536  	        except Exception as e:
537  	            # dry tests aren't able to make an API call
538  	            # but are rejected by an Exception; ignore it then.
539  	            if not str(e).startswith('DryRequest rejecting request:'):
540  	                raise
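One way to confirm where the network access comes from is to make any socket creation fail loudly while iterating the dump. The following is only a sketch: `forbid_network` and `NetworkAccessError` are hypothetical helper names, not part of pywikibot, and it assumes the HTTP layer ends up calling `socket.socket`. Intermediate layers may wrap or retry the error, so the traceback can look different in practice.

```python
import socket
from contextlib import contextmanager


class NetworkAccessError(RuntimeError):
    """Raised when code tries to create a socket inside the guard."""


@contextmanager
def forbid_network():
    """Temporarily replace socket.socket so any attempt to open one fails."""
    original = socket.socket

    def blocked(*args, **kwargs):
        raise NetworkAccessError('network access attempted during offline run')

    socket.socket = blocked
    try:
        yield
    finally:
        # Always restore the real socket class, even if the body raised.
        socket.socket = original
```

Wrapping the loop over the generator in `with forbid_network():` should then produce a traceback that passes through the `text` setter shown above, if the setter is indeed what triggers the request.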

What should have happened instead?:
No network access until a page is saved.
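The expected behaviour can be illustrated with a minimal, self-contained sketch. `OfflinePage` and `fake_check` are hypothetical stand-ins, not pywikibot classes or functions; the point is only that assigning text does no permission check, and the check runs once when the page is saved.

```python
class OfflinePage:
    """Hypothetical stand-in for a page object; not the pywikibot class."""

    def __init__(self, title, may_edit_check):
        self.title = title
        self._text = None
        # Callable that would perform the (network) permission check.
        self._may_edit_check = may_edit_check

    @property
    def text(self):
        return self._text

    @text.setter
    def text(self, value):
        # Only store the value; no permission check, no network access.
        self._text = None if value is None else str(value)

    def save(self):
        # The check runs at the moment the edit would be written.
        if not self._may_edit_check(self.title):
            raise PermissionError(f'bot may not edit {self.title!r}')
        return True


calls = []


def fake_check(title):
    """Record each simulated API request and allow the edit."""
    calls.append(title)
    return True


# Simulate building pages from dump entries, then saving one of them.
pages = []
for title, text in [('A', 'foo'), ('B', 'bar'), ('C', 'baz')]:
    page = OfflinePage(title, fake_check)
    page.text = text
    pages.append(page)

requests_while_iterating = len(calls)
pages[0].save()
requests_after_save = len(calls)
```

In this sketch no simulated request is recorded while the pages are built; the only one comes from the single `save()` call.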

Event Timeline

I think whether the bot may edit should be checked only in the get() method, or against latest_revision. It should be ensured that the check uses the latest revision and not the current Page.text.
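To make the concern concrete, here is a simplified sketch of a check that looks only at the latest saved revision and ignores the edited text. `EditedPage`, `bot_may_edit`, and `may_save` are hypothetical names, and recognising only a bare `{{nobots}}` marker is an assumption for illustration; it is not how pywikibot implements `botMayEdit()`.

```python
import re

# Simplified assumption: only the bare {{nobots}} form is recognised.
NOBOTS = re.compile(r'\{\{\s*nobots\s*\}\}', re.IGNORECASE)


def bot_may_edit(latest_revision_text):
    """Return False if the saved revision carries a nobots marker."""
    return NOBOTS.search(latest_revision_text) is None


class EditedPage:
    """Hypothetical page holding the saved revision and the edited text."""

    def __init__(self, latest_revision_text):
        self.latest_revision_text = latest_revision_text
        # The edited text starts as a copy of the saved revision.
        self.text = latest_revision_text

    def may_save(self):
        # Deliberately ignores self.text: removing the marker in the
        # edited text must not unlock the page.
        return bot_may_edit(self.latest_revision_text)


protected = EditedPage('{{nobots}}\nSome content')
protected.text = 'Some content'  # the bot strips the marker locally

unprotected = EditedPage('Some content')
```

Here `protected.may_save()` stays false even after the marker has been removed from the edited text, which is the property the comment above asks for.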

The way it is checked today seems OK to me: it is done via template checking, and I did not see any reference to the current text.

I do not understand why it is checked there, as it is also checked in
It seems redundant to me, but maybe there is some edge case I cannot see.