PHP segmentation fault when analyzing Afghanistan on English Wikipedia
Closed, Resolved, Public

Description

When IABot is analyzing the page Afghanistan on the English Wikipedia, PHP experiences a segmentation fault.

This fault occurs on Linux, Mac, and Windows.

Event Timeline

Cyberpower678 triaged this task as Unbreak Now! priority. (Sep 10 2016, 3:46 PM)

It appears to be caused by

$text = preg_replace( '/\<\s*nowiki(?:.|\n)*?\<\/nowiki\s*\>/i', "", $text );

in the filterText function.

Regex 101 is reporting that some kind of catastrophic backtracking is occurring during the matching process. It appears to be a "runaway regex".
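A common alternative, which is not the fix adopted in this task, is to drop the `(?:.|\n)` alternation and use the `s` (dotall) modifier, so that `.` also matches newlines. The sketch below assumes PHP 7 or later. `stripNowikiBlocks()` is a hypothetical helper and not IABot's `filterText()`. It also checks for `null`, which `preg_replace()` returns when PCRE reports an error such as hitting its backtrack limit. A hard crash like the segfault described here would not be caught this way.

```php
<?php
// Hypothetical helper for illustration; not IABot's actual filterText().
function stripNowikiBlocks(string $text): string
{
    // The `s` modifier lets `.` match newlines, replacing the `(?:.|\n)` alternation.
    // The `i` modifier keeps the original case-insensitive tag matching.
    $result = preg_replace('/<\s*nowiki.*?<\/nowiki\s*>/is', '', $text);
    if ($result === null) {
        // PCRE reported an error (for example PREG_BACKTRACK_LIMIT_ERROR).
        throw new RuntimeException('preg_replace failed with PCRE error code ' . preg_last_error());
    }
    return $result;
}
```

For example, `stripNowikiBlocks("a<nowiki>x\ny</nowiki>b")` removes the whole block, including the newline inside it.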

Looks like a spurious </nowiki> tag in the article with no matching opening <nowiki>

That alone should cause a no match. I'm debugging the regex now.

The regex itself is doing its intended job, but the article is too large for it to handle. It picks up the spurious tag in the first place because that tag uses invalid syntax, and then it tries to find a closing tag. At some point the match simply runs too long and PHP segfaults.

I think this can easily be fixed by verifying the presence of both tags before running this regex.
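A minimal sketch of that guard, assuming PHP 7 or later. `filterNowikiGuarded()` is a hypothetical name, and the actual change to `filterText()` may look different. Each pre-check searches for a single tag only, so neither has to scan from an opening tag to a distant closing tag the way the full pattern does.

```php
<?php
// Hypothetical stand-in for the nowiki-stripping step of filterText().
function filterNowikiGuarded(string $text): string
{
    // Pre-checks: look for an opening tag and a closing tag independently.
    $hasOpen = preg_match('/<\s*nowiki/i', $text) === 1;
    $hasClose = preg_match('/<\/nowiki\s*>/i', $text) === 1;
    if (!$hasOpen || !$hasClose) {
        // Without both tags there is no pair to strip, so skip the expensive regex.
        return $text;
    }
    // The original pattern from this task, run only when both tags are present.
    $result = preg_replace('/\<\s*nowiki(?:.|\n)*?\<\/nowiki\s*\>/i', '', $text);
    // preg_replace() returns null on a PCRE error; this sketch keeps the text unchanged then.
    return $result ?? $text;
}
```

The check only confirms that both tags occur somewhere in the text, not that a closing tag follows the opening one. A stray closing tag that appears before an unclosed opening tag would still reach the full regex.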

Grunt, no more segfaults now, but I get a compilation error somewhere else. >:(