Author: nephele
Description:
Fix broken regexp in SearchUpdate.php (patch to r49794)
If an article contains a "<" symbol and there is no subsequent ">" symbol anywhere in the article, the si_text field for that article in the searchindex table ends up completely empty -- even the text in the article before the "<" symbol is wiped out. It is therefore impossible to search on any of the article's contents.
For example, http://www.uesp.net/wiki/UESPWiki:Mirror_Plan is currently triggering this bug; si_text is being set to ''. Although UESP is currently running MW1.10, the same bug occurs if the article is added to a test wiki running r49794.
The basic problem is an incorrect pair of parentheses in a preg_replace expression in SearchUpdate.php::doUpdate(). The attached patch file removes those parentheses. I also did some secondary cleanup of the expression by deleting redundant chunks: "[A-Za-z0-9]*\\s*" is covered equally well by "[^>]*?", and the simpler expression doesn't mislead editors. The revised regexp successfully processes UESPWiki:Mirror_Plan, and also successfully processes some test pages containing HTML tags.
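As a rough illustration of the redundancy claim, the following is a minimal Python sketch, not the PHP from the patch. The report quotes only fragments of the expression, so the leading `</?\s*[A-Za-z]` part, the names `LONG_FORM`, `SIMPLIFIED` and `strip_tags`, and the sample strings are all assumptions made for illustration. Python's `re` module also differs from PHP's PCRE, so this does not reproduce the empty si_text failure itself. It only checks that, on a few samples, the longer and shorter forms strip the same tags and leave text around an unmatched "<" intact.

```python
import re

# Hypothetical reconstruction: only "[A-Za-z0-9]*\s*" and "[^>]*?" are quoted
# in the report; the "</?\s*[A-Za-z]" prefix is an assumption.
LONG_FORM = re.compile(r"</?\s*[A-Za-z][A-Za-z0-9]*\s*[^>]*?>")
SIMPLIFIED = re.compile(r"</?\s*[A-Za-z][^>]*?>")

# Illustrative inputs, including a "<" with no later ">".
SAMPLES = [
    "before <b>bold</b> after",
    "x <div class='c'>y",
    "1 < 2 with no closing bracket",
    "text <unterminated tag at end of article",
]


def strip_tags(text, pattern):
    """Replace every tag-like match of `pattern` with a single space."""
    return pattern.sub(" ", text)


if __name__ == "__main__":
    for sample in SAMPLES:
        long_result = strip_tags(sample, LONG_FORM)
        short_result = strip_tags(sample, SIMPLIFIED)
        # Alphanumerics and whitespace are all "not >", so both forms
        # should match exactly the same spans.
        assert long_result == short_result
        print(repr(sample), "->", repr(short_result))
```

The two forms agree because every character that "[A-Za-z0-9]*\s*" can consume is also accepted by "[^>]*?", and neither can run past the first ">".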
Version: 1.16.x
Severity: normal
Attached: