MemoryError when replacing using xml dump
Closed, Invalid (Public)

Description

When executing

py pwb.py replace -exceptinsidetag:ref -exceptinsidetag:hyperlink -file:"cswiki-20161020-pages-articles.xml" -summary:"Náhrada za správný tvar dle újč příručky. Viz ŽOPP z 18.10.2016" "Himaláj" "Himálaj" "Himáláj" "Himálaj"

replace.py crashes with

Traceback (most recent call last):
  File "pwb.py", line 255, in <module>
    if not main():
  File "pwb.py", line 249, in main
    run_python_file(filename, [filename] + args, argvu, file_package)
  File "pwb.py", line 121, in run_python_file
    main_mod.__dict__)
  File ".\replace.py", line 1154, in <module>
    main()
  File ".\replace.py", line 1145, in main
    bot.run()
  File ".\replace.py", line 706, in run
    for page in self.generator:
  File "  \core\pywikibot\pagegenerators.py", line 1886, in PreloadingGenerator
    for page in generator:
  File "  \core\pywikibot\pagegenerators.py", line 1309, in TextfilePageGenerator
    for linkmatch in pywikibot.link_regex.finditer(f.read()):
  File "C:\Python3\lib\codecs.py", line 698, in read
    return self.reader.read(size)
  File "C:\Python3\lib\codecs.py", line 493, in read
    newdata = self.stream.read()
MemoryError
<class 'MemoryError'>

Running on:

Pywikibot: [https] r-pywikibot-core.git (d1e637c, g7542, 2016/10/17, 17:53:14, n/a)
Release version: 3.0-dev
requests version: 2.11.1
  cacerts: C:\Python3\lib\site-packages\requests\cacert.pem
    certificate test: ok
Python: 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:01:18) [MSC v.1900 32 bit (Intel)]
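
Note on the traceback: the failing call is f.read() inside TextfilePageGenerator, so the -file: option was loading the whole file into memory as a list of page links. On a 32-bit Python build this can plausibly run out of address space for a large dump. As far as I know, replace.py has a separate -xml: option meant for dumps, which may be what was intended here. The sketch below only illustrates the general idea of scanning a file for links line by line instead of reading it all at once. LINK_REGEX is a simplified stand-in for pywikibot.link_regex and iter_link_titles is a hypothetical helper; neither is Pywikibot's actual code.

```python
import io
import re

# Simplified stand-in for pywikibot.link_regex (assumption, not the real pattern).
LINK_REGEX = re.compile(r'\[\[(?P<title>[^\]|\[]+)(?:\|[^\]]*)?\]\]')


def iter_link_titles(stream):
    """Yield wiki link titles one line at a time.

    Iterating over the stream keeps only the current line in memory,
    unlike stream.read(), which loads the whole file at once.
    """
    for line in stream:
        for match in LINK_REGEX.finditer(line):
            yield match.group('title').strip()


# Small in-memory sample standing in for a text file opened with open(..., encoding='utf-8').
sample = io.StringIO("[[Himálaj]] and [[Nepál|Nepal]]\nno links here\n[[Tibet]]\n")
titles = list(iter_link_titles(sample))
print(titles)
```

A line-based scan like this would miss a link that spans a line break, so it is a trade-off rather than a drop-in replacement.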

Event Timeline

I haven't seen such an error for a long time.

How big is your XML file?
Was the bot working for a while, or did it break right at the beginning?

It was one of the regular cswiki dumps, so around 600 MB. I think it crashed right at the start. I can only say "I think" because I did a complete switch from Windows to Linux a day ago, including reformatting the drive, so I can't reproduce it right away.

It did not happen again. If someone else gets this error, feel free to reopen.