
importdump.php regular expression is too large
Closed, Declined, Public

Description

Author: radek.marik

Description:
Running on XAMPP 2.5.
I tried to import 6,500 pages exported from MediaWiki 1.9. After roughly 3,500 pages, importdump.php repeatedly reports:
Warning: preg_match(): Compilation failed: regular expression is too large at offset 29149 in C:\xampp\htdocs\wiki\includes\Preprocessor_DOM.php on line 205.

Additional notes:
The issue can be traced to the variable $xmlishElements, which receives its value from $this->parser->getStripList() on line 81. The parser accumulates more and more hooks with the same key ("ask") and keeps all of them, because the strip list is implemented as a plain indexed array rather than as an associative array.
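The following is a minimal Python sketch of the growth described above. It is not MediaWiki's PHP code. The names build_tag_pattern, strip_list and strip_map are hypothetical, and the 1,000 re-registrations are an arbitrary stand-in for repeated hook registration during a long import. A plain list gains a duplicate "ask" entry each time, while a keyed map overwrites the existing entry.

```python
import re

def build_tag_pattern(tags):
    # Hypothetical helper: join tag names into one alternation, roughly the
    # way a preprocessor might build a regex matching any registered tag.
    return "<(" + "|".join(re.escape(t) for t in tags) + ")"

# Plain list: re-registering the same hook appends a duplicate every time.
strip_list = []
for _ in range(1000):  # arbitrary number of re-registrations
    strip_list.append("ask")

# Keyed map: re-registering the same name just overwrites the entry.
strip_map = {}
for _ in range(1000):
    strip_map["ask"] = True

list_pattern = build_tag_pattern(strip_list)
map_pattern = build_tag_pattern(strip_map)  # iterating a dict yields its keys

print(len(strip_list), len(list_pattern))  # 1000 4002
print(len(strip_map), len(map_pattern))    # 1 6
```

Both patterns match the same input, for example "<ask>". However, only the list-based pattern grows with every re-registration. In the PHP code, growth of this kind would eventually hit PCRE's compiled-pattern size limit, which is what the "regular expression is too large" warning reports.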


Version: 1.12.x
Severity: major
OS: Windows XP
Platform: PC

Details

Reference
bz14961

Event Timeline

bzimport raised the priority of this task to Medium. (Nov 21 2014, 10:11 PM)
bzimport set Reference to bz14961.

I've hit this before. Extensions that add tags to the strip list there (e.g. Cite with <ref>) tend to make that regex too large. It might be worth separating the parsing of core tags from that of extension tags.

Hmmmm, I thought we fixed this sort of problem previously? (Maybe that was hooks?) Perhaps some state isn't getting cleared properly...

Assigned to Tim, current expert on wiki dumps.

This code is no longer present in the newer 1.13/1.14a.

Was fixed by Brion in r32133. The reporter is using 1.12 which was branched at r31056. Please update to 1.13.