MWException: PCRE needs to be compiled with --enable-unicode-properties in order for MediaWiki to function
Closed, Resolved · Public · Security

Description

Error
normalized_message
[{reqId}] {exception_url}   MWException: PCRE needs to be compiled with --enable-unicode-properties in order for MediaWiki to function
exception.trace
from /srv/mediawiki/php-1.40.0-wmf.6/includes/parser/Parser.php(2171)
#0 /srv/mediawiki/php-1.40.0-wmf.6/includes/parser/Parser.php(1629): Parser->handleExternalLinks(string)
#1 /srv/mediawiki/php-1.40.0-wmf.6/includes/parser/Parser.php(712): Parser->internalParse(string)
#2 /srv/mediawiki/php-1.40.0-wmf.6/includes/content/WikitextContentHandler.php(301): Parser->parse(string, Title, ParserOptions, boolean, boolean, NULL)
#3 /srv/mediawiki/php-1.40.0-wmf.6/includes/content/ContentHandler.php(1721): WikitextContentHandler->fillParserOutput(WikitextContent, MediaWiki\Content\Renderer\ContentParseParams, ParserOutput)
#4 /srv/mediawiki/php-1.40.0-wmf.6/includes/content/Renderer/ContentRenderer.php(47): ContentHandler->getParserOutput(WikitextContent, MediaWiki\Content\Renderer\ContentParseParams)
#5 /srv/mediawiki/php-1.40.0-wmf.6/includes/api/ApiParse.php(151): MediaWiki\Content\Renderer\ContentRenderer->getParserOutput(WikitextContent, Title, NULL, ParserOptions)
#6 /srv/mediawiki/php-1.40.0-wmf.6/includes/poolcounter/PoolCounterWorkViaCallback.php(69): ApiParse->{closure}()
#7 /srv/mediawiki/php-1.40.0-wmf.6/includes/poolcounter/PoolCounterWork.php(163): PoolCounterWorkViaCallback->doWork()
#8 /srv/mediawiki/php-1.40.0-wmf.6/includes/api/ApiParse.php(158): PoolCounterWork->execute()
#9 /srv/mediawiki/php-1.40.0-wmf.6/includes/api/ApiParse.php(422): ApiParse->getContentParserOutput(WikitextContent, Title, NULL, ParserOptions)
#10 /srv/mediawiki/php-1.40.0-wmf.6/includes/api/ApiMain.php(1900): ApiParse->execute()
#11 /srv/mediawiki/php-1.40.0-wmf.6/includes/api/ApiMain.php(844): ApiMain->executeAction()
#12 /srv/mediawiki/php-1.40.0-wmf.6/extensions/DiscussionTools/includes/ApiDiscussionToolsTrait.php(134): ApiMain->execute()
#13 /srv/mediawiki/php-1.40.0-wmf.6/extensions/DiscussionTools/includes/ApiDiscussionToolsPreview.php(65): MediaWiki\Extension\DiscussionTools\ApiDiscussionToolsPreview->previewMessage(string, Title, array, array)
#14 /srv/mediawiki/php-1.40.0-wmf.6/includes/api/ApiMain.php(1900): MediaWiki\Extension\DiscussionTools\ApiDiscussionToolsPreview->execute()
#15 /srv/mediawiki/php-1.40.0-wmf.6/includes/api/ApiMain.php(875): ApiMain->executeAction()
#16 /srv/mediawiki/php-1.40.0-wmf.6/includes/api/ApiMain.php(846): ApiMain->executeActionWithErrorHandling()
#17 /srv/mediawiki/php-1.40.0-wmf.6/api.php(90): ApiMain->execute()
#18 /srv/mediawiki/php-1.40.0-wmf.6/api.php(45): wfApiMain()
#19 /srv/mediawiki/w/api.php(3): require(string)
#20 {main}
Impact

40 occurrences over 15 days

Notes

Code from wmf/1.40.0-wmf.6

includes/parser/Parser.php
2157     /**
2158      * Replace external links (REL)
2159      *
2160      * Note: this is all very hackish and the order of execution matters a lot.
2161      * Make sure to run tests/parser/parserTests.php if you change this code.
2162      *
2163      * @param string $text
2164      * @throws MWException
2165      * @return string
2166      */
2167     private function handleExternalLinks( $text ) {
2168         $bits = preg_split( $this->mExtLinkBracketedRegex, $text, -1, PREG_SPLIT_DELIM_CAPTURE );
2169         // @phan-suppress-next-line PhanTypeComparisonFromArray See phan issue #3161
2170         if ( $bits === false ) {
2171             throw new MWException( "PCRE needs to be compiled with "
2172                 . "--enable-unicode-properties in order for MediaWiki to function" );
2173         }

This check comes from the 2012 commit 5a6f82c47f414e5f5a71f59ac49f840015d4ad71 for T40249.

Event Timeline

Some traces refer to DiscussionTools but I don't think it is always the case. Maybe preg_split returns false based on certain user input?

> Maybe preg_split returns false based on certain user input?

This would happen if the backtrack limit (match limit) is exceeded. See pcre2pattern (search for "Setting match resource limits") and pcre2api (search for "pcre2_set_match_limit" and "pcre2_set_depth_limit") manual pages, as well as php.net.

The regex needs to be changed to avoid running into the limits.
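
For anyone who wants to see this failure mode in isolation, here is a minimal sketch; it is not MediaWiki code. It assumes a simplified stand-in pattern rather than the real link regex, an artificially low pcre.backtrack_limit of 1000, and JIT switched off so that the interpreter's match limit is the one being hit. It inspects preg_last_error() instead of assuming that a false return means PCRE lacks Unicode property support.

```php
<?php
// Sketch only: simplified stand-in pattern, not MediaWiki's real regex.
// Assumption: JIT is turned off so pcre.backtrack_limit governs the match.
ini_set( 'pcre.jit', '0' );
// Assumption: an artificially low limit so that a short input is enough.
ini_set( 'pcre.backtrack_limit', '1000' );

$pattern = '/\[([^\]\s<]+) *([^\]\s]*?)\]/';
// Looks like a link, but a line break sits between the URL and the "]".
$text = '[http://' . str_repeat( 'a', 500 ) . "\n]";

$bits = preg_split( $pattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE );

if ( $bits === false && preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR ) {
	echo "preg_split() returned false: backtrack limit exceeded\n";
} else {
	echo "preg_split() did not hit the backtrack limit\n";
}
```

The limit and input length here are chosen only to make the effect easy to trigger; with a production-sized limit the input would need to be correspondingly longer.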

This doesn't seem related to DiscussionTools, we're just using the Parser normally to display some previews.

Out of 107 instances of this error in the last month: https://logstash.wikimedia.org/goto/baeba1b6a102ad4e094ba5d6137e519e
…only 2 involve DiscussionTools: https://logstash.wikimedia.org/goto/bab6a833298491ac38ada10f145b49db

The error can be triggered by just viewing some pages (maybe I shouldn't link them here).

The regexp being used looks like this:

/\[(((?i)bitcoin\:|ftp\:\/\/|ftps\:\/\/|geo\:|git\:\/\/|gopher\:\/\/|http\:\/\/|https\:\/\/|irc\:\/\/|ircs\:\/\/|magnet\:|mailto\:|mms\:\/\/|news\:|nntp\:\/\/|redis\:\/\/|sftp\:\/\/|sip\:|sips\:|sms\:|ssh\:\/\/|svn\:\/\/|tel\:|telnet\:\/\/|urn\:|worldwind\:\/\/|xmpp\:|\/\/)(?:[0-9.]+|\[(?i:[0-9a-f:.]+)\]|[^][<>"\x00-\x20\x7F\p{Zs}\x{FFFD}])[^][<>"\x00-\x20\x7F\p{Zs}\x{FFFD}]*)\p{Zs}*([^\]\x00-\x08\x0a-\x1F\x{FFFD}]*?)\]/Su
matmarex set Security to Software security bug. (Oct 31 2022, 4:21 AM)
matmarex added projects: Security, Security-Team.
matmarex changed the visibility from "Public (No Login Required)" to "Custom Policy".
matmarex changed the subtype of this task from "Production Error" to "Security Issue".

Not sure if this is a security issue, considering that it's failing "cleanly" without running out of memory, but just in case.

Example input causing the problem:

[http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
a]

(depending on your configuration, you may be able to reduce the input or you may need to extend it)

Note the line break after the URL; it is required to reproduce the problem.

The original regexp's runtime is quadratic. The following should be equivalent, but seems to run in linear time, by avoiding the lazy quantifier:

\[(((?i)bitcoin\:|ftp\:\/\/|ftps\:\/\/|geo\:|git\:\/\/|gopher\:\/\/|http\:\/\/|https\:\/\/|irc\:\/\/|ircs\:\/\/|magnet\:|mailto\:|mms\:\/\/|news\:|nntp\:\/\/|redis\:\/\/|sftp\:\/\/|sip\:|sips\:|sms\:|ssh\:\/\/|svn\:\/\/|tel\:|telnet\:\/\/|urn\:|worldwind\:\/\/|xmpp\:|\/\/)(?:[0-9.]+|\[(?i:[0-9a-f:.]+)\]|[^][<>"\x00-\x20\x7F\p{Zs}\x{FFFD}])[^][<>"\x00-\x20\x7F\p{Zs}\x{FFFD}]*)\p{Zs}*(?:\]|([^\]\x00-\x08\x0a-\x1F\x{FFFD}]+)\])

I feel like I don't completely understand the problem, though, so this solution might be wrong as well.

Thank you so much @PleaseStand for pointing out the backtracking limit and @matmarex for the reproducible use case. I tried the example on https://regex101.com/ and it fails with the message "catastrophic backtracking". The site has a regex debugger that lets you replay the matching steps one by one; it halts after 200k steps. It appears to get stuck backtracking on the last part of the regex: )\p{Zs}*([^\]\x00-\x08\x0a-\x1F\x{FFFD}]*?)\]

Without the extra newlines, it matches in 31 steps. I guess it is related to the \p{Zs}* part? I introduced that code back when I was a volunteer, in 2011 commit 176f91596c80cf5e41484c7377d809d09f0ecbf7, to recognize ideographic spaces used in Chinese (T21052). That change might also have made the parser slightly slower.

I have no idea how that can be fixed though.

What we could at least do is throw an exception carrying the output of preg_last_error_msg() when preg_split returns false. Something like:

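private function handleExternalLinks( $text ) {
    $bits = preg_split( $this->mExtLinkBracketedRegex, $text, -1, PREG_SPLIT_DELIM_CAPTURE );
    // @phan-suppress-next-line PhanTypeComparisonFromArray See phan issue #3161
    if ( $bits === false ) {
        // Report the actual PCRE error instead of assuming missing Unicode property support.
        // Note: preg_last_error_msg() requires PHP 8.0 or later.
        throw new MWException( "PCRE failure: " . preg_last_error_msg() );
    }
    // ... rest of the method unchanged ...
}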

> Not sure if this is a security issue, considering that it's failing "cleanly" without running out of memory, but [I'm making this task private] just in case.

I now think that this wasn't necessary. The regexp engine is hitting the safety limit intended to prevent site stability issues in this scenario, and it is failing exactly as it should. (We just have a misleading error message on the MediaWiki side.)

I was worried that it would allow hard-to-undo vandalism if you could just make any page unviewable, but it turns out that a) the interface is still shown despite the error, and b) on Wikimedia wikis, the error also prevents saving edits with the problematic patterns (because SpamBlacklist parses the page before it is allowed to be saved). I guess the limit was lowered when we upgraded to PHP 7.4, and the existing affected pages were created before that.

Parsoid doesn't seem affected either.

I'd like to make the task public again (I'm apparently not allowed to do it), because I have a very nice one-character fix for the problem :)

(Separately, we should also fix the error message to say something generic about a regular expression error.)

--- a/includes/parser/Parser.php
+++ b/includes/parser/Parser.php
@@ -499,7 +499,7 @@ class Parser {
 		$this->urlUtils = $urlUtils;
 		$this->mExtLinkBracketedRegex = '/\[(((?i)' . $this->urlUtils->validProtocols() . ')' .
 			self::EXT_LINK_ADDR .
-			self::EXT_LINK_URL_CLASS . '*)\p{Zs}*([^\]\\x00-\\x08\\x0a-\\x1F\\x{FFFD}]*?)\]/Su';
+			self::EXT_LINK_URL_CLASS . '*)\p{Zs}*([^\]\\x00-\\x08\\x0a-\\x1F\\x{FFFD}]*)\]/Su';
 
 		$this->magicWordFactory = $magicWordFactory;

In external link syntax like [http://example.org Example], the space between link target and label is technically optional when the label starts with characters not allowed in the URL, such as [http://example.org<b>Example</b>].

This is done with a regexp that matches a required opening bracket, a required URL, optional spaces, an optional label, and a required closing bracket. The real regexp is messy because it has to handle the various characters allowed in each part, but for illustration purposes, it's basically the same as \[([^\]\s\<]+) *([^\]\s]*?)\].

When given input that looks like a link, but doesn't have the closing bracket, the regexp engine (PCRE) would therefore attempt matching every possible combination of target and label lengths before failing:

Input: [http://example.org

Target | Label
http://example.org | (empty)
http://example.or | (empty)
http://example.or | g
http://example.o | (empty)
http://example.o | r
http://example.o | rg
http://example. | (empty)
http://example. | o
http://example. | or
http://example. | org
http://example | (empty)
http://example | .
http://example | .o
http://example | .or
http://example | .org

…and so on. This would take (1 + 2 + 3 + … + 18) = 171 steps to fail in this example, or N * (N+1) / 2 steps in general. For sufficiently large inputs this hits a limit designed to protect against exactly this situation, and the whole wikitext parser crashes.

(To hit the pathological case, it's also required for a ] to appear somewhere later in the input, otherwise PCRE would detect that a match is never possible and exit before doing any of the above.)
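
The arithmetic above can be checked with a small counting model. This is only a sketch of the retry pattern as described in this comment, not a reimplementation of PCRE; countLazyAttempts() is a hypothetical helper name, and it assumes every target/label combination costs exactly one failed attempt.

```php
<?php
// Counting model of the retry pattern described above; not PCRE itself.
// countLazyAttempts() is a hypothetical helper name.
function countLazyAttempts( int $n ): int {
	$attempts = 0;
	// The engine first tries the longest target, then gives back one character at a time.
	for ( $targetLen = $n; $targetLen >= 1; $targetLen-- ) {
		// For each target, the lazy label grows from empty up to the rest of the run.
		$maxLabelLen = $n - $targetLen;
		for ( $labelLen = 0; $labelLen <= $maxLabelLen; $labelLen++ ) {
			// Each combination fails because no "]" follows the label.
			$attempts++;
		}
	}
	return $attempts;
}

$n = strlen( 'http://example.org' );
echo $n, ' characters: ', countLazyAttempts( $n ), " attempts\n"; // 18 characters: 171 attempts
echo '1000 characters: ', countLazyAttempts( 1000 ), " attempts\n"; // 1000 characters: 500500 attempts
```

Under this model, a run of only a few thousand characters already needs millions of attempts, which is why a long unterminated link can exhaust the limit.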

Live example: https://regex101.com/debugger/?regex=%5C%5B%28%5B%5E%5C%5D%5Cs%5C%3C%5D%2B%29%20%2A%28%5B%5E%5C%5D%5Cs%5D%2A%3F%29%5C%5D&testString=%5Bhttp%3A%2F%2Fexample.org%0A%5D

We can fix it by changing the lazy quantifier *? to the greedy *. This is correct for this regexp only because the label isn't allowed to contain ']' (otherwise, the first external link on the page would consume all of the content until the last external link as its label).

This allows PCRE to only consider the cases where the label has the maximum possible length:

Target | Label
http://example.org | (empty)
http://example.or | g
http://example.o | rg
http://example. | org
http://example | .org

…and so on. Only 18 steps, or N steps in general.

Live example: https://regex101.com/debugger/?regex=%5C%5B%28%5B%5E%5C%5D%5Cs%5C%3C%5D%2B%29%20%2A%28%5B%5E%5C%5D%5Cs%5D%2A%29%5C%5D&testString=%5Bhttp%3A%2F%2Fexample.org%0A%5D
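
As a quick sanity check that swapping *? for * does not change what gets matched on ordinary input, here is a minimal sketch using the simplified illustrative pattern from this comment, not the real MediaWiki regex. The sample strings are made up for the purpose, and a handful of samples is of course not a proof of equivalence.

```php
<?php
// Simplified illustrative patterns from the discussion, not the real MediaWiki regex.
$lazy = '/\[([^\]\s<]+) *([^\]\s]*?)\]/';
$greedy = '/\[([^\]\s<]+) *([^\]\s]*)\]/';

// Hypothetical sample inputs.
$samples = [
	'See [http://example.org Example] for details.',
	'[http://example.org<b>Example</b>]',
	'[http://example.org]',
	'[http://a.example A] and [http://b.example B]',
	'no links here',
];

$allEqual = true;
foreach ( $samples as $text ) {
	$a = preg_split( $lazy, $text, -1, PREG_SPLIT_DELIM_CAPTURE );
	$b = preg_split( $greedy, $text, -1, PREG_SPLIT_DELIM_CAPTURE );
	if ( $a !== $b ) {
		$allEqual = false;
		echo "Difference for: $text\n";
	}
}
echo $allEqual ? "Both patterns split every sample identically\n" : "Patterns disagree\n";
```

The two forms agree because the label class excludes "]", so the label can only ever end at the first closing bracket, whichever way the quantifier searches for it.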

I think this bug has been present since 2004, when external link parsing was rewritten in badf11ffe6 (SVN r4579).

Mstyles subscribed.

It is unclear to me why this has been tagged as a security issue. I don't see any factors that would cause the security team to be concerned.

@Mstyles Yes, I was overly cautious (see my previous comment T321467#8371826). Please make it public, as I do not have the permissions to do it.

Aklapper changed the visibility from "Custom Policy" to "Public (No Login Required)".

Change 855019 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Parser: Fix quadratic regexp edge case

https://gerrit.wikimedia.org/r/855019

Change 855021 had a related patch set uploaded (by Bartosz Dziewoński; author: Bartosz Dziewoński):

[mediawiki/core@master] Tweak misleading error message about PCRE

https://gerrit.wikimedia.org/r/855021

sbassett changed the task status from Open to In Progress. (Nov 9 2022, 4:34 PM)
sbassett triaged this task as Medium priority.
sbassett edited projects, added SecTeam-Processed; removed Security-Team.

Change 855021 merged by jenkins-bot:

[mediawiki/core@master] Tweak misleading error message about PCRE

https://gerrit.wikimedia.org/r/855021

Change 855019 merged by jenkins-bot:

[mediawiki/core@master] Parser: Fix quadratic regexp edge case

https://gerrit.wikimedia.org/r/855019

matmarex claimed this task.