Page MenuHomePhabricator

0001-Parser-Fix-quadratic-regexp-edge-case.patch

Authored By
matmarex
Nov 5 2022, 12:44 AM
Size
3 KB
Referenced Files
None
Subscribers
None

0001-Parser-Fix-quadratic-regexp-edge-case.patch

From 8b836ad04f5a2a0bb5510d57d9b1fbe49638cc2e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Bartosz=20Dziewo=C5=84ski?= <matma.rex@gmail.com>
Date: Sat, 5 Nov 2022 00:23:56 +0100
Subject: [PATCH] Parser: Fix quadratic regexp edge case
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In external link syntax like `[http://example.org Example]`,
the space between link target and label is technically optional
when the label starts with characters not allowed in the URL,
such as `[http://example.org<b>Example</b>]`.
This is done with a regexp that matches a required opening bracket,
a required URL, optional spaces, optional label, and a required
closing bracket. The real regexp is messy to handle various characters
allowed in each part, but for illustration purposes, it's basically
the same as `\[([^\]\s\<]+) *([^\]\s]*?)\]`.
When given input that looks like a link, but doesn't have the closing
bracket, the regexp engine (PCRE) would therefore attempt matching
every possible combination of target and label lengths before failing:
Input: [http://example.org
| Target | Label |
| ------------------ | ----- |
| http://example.or | g |
| http://example.o | r |
| http://example.o | rg |
| http://example. | o |
| http://example. | or |
| http://example. | org |
| http://example | . |
| http://example | .o |
| http://example | .or |
| http://example | .org |
| http://exampl | e |
| http://exampl | e. |
| http://exampl | e.o |
| http://exampl | e.or |
| http://exampl | e.org |
…and so on. This would take (1 + 2 + 3 + … + 18) = 171 steps to fail
in this example, or `N * (N+1) / 2` steps in general. For sufficiently
large inputs this hits a limit designed to protect against exactly
this situation, and the whole wikitext parser crashes.
(To hit the pathological case, it's also required for a `]` to appear
somewhere later in the input, otherwise PCRE would detect that a match
is never possible and exit before doing any of the above.)
Live example: https://regex101.com/debugger/?regex=%5C%5B%28%5B%5E%5C%5D%5Cs%5C%3C%5D%2B%29%20%2A%28%5B%5E%5C%5D%5Cs%5D%2A%3F%29%5C%5D&testString=%5Bhttp%3A%2F%2Fexample.org%0A%5D
We can fix it by changing the lazy quantifier `*?` to the greedy `*`.
This is correct for this regexp only because the label isn't allowed
to contain ']' (otherwise, the first external link on the page would
consume all of the content until the last external link as its label).
This allows PCRE to only consider the cases where the label has the
maximum possible length:
| Target | Label |
| ------------------ | ----- |
| http://example.or | g |
| http://example.o | rg |
| http://example. | org |
| http://example | .org |
| http://exampl | e.org |
…and so on. Only 18 steps, or `N` steps in general.
Live example: https://regex101.com/debugger/?regex=%5C%5B%28%5B%5E%5C%5D%5Cs%5C%3C%5D%2B%29%20%2A%28%5B%5E%5C%5D%5Cs%5D%2A%29%5C%5D&testString=%5Bhttp%3A%2F%2Fexample.org%0A%5D
I think this bug has been present since 2004, when external link
parsing was rewritten in badf11ffe6 (SVN r4579).
Change-Id: I993a10d9a90ab28cce61eba6beabee8a06a2d562
---
includes/parser/Parser.php | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/includes/parser/Parser.php b/includes/parser/Parser.php
index 1b0110d1ef6..42d796370b9 100644
--- a/includes/parser/Parser.php
+++ b/includes/parser/Parser.php
@@ -499,7 +499,7 @@ class Parser {
$this->urlUtils = $urlUtils;
$this->mExtLinkBracketedRegex = '/\[(((?i)' . $this->urlUtils->validProtocols() . ')' .
self::EXT_LINK_ADDR .
- self::EXT_LINK_URL_CLASS . '*)\p{Zs}*([^\]\\x00-\\x08\\x0a-\\x1F\\x{FFFD}]*?)\]/Su';
+ self::EXT_LINK_URL_CLASS . '*)\p{Zs}*([^\]\\x00-\\x08\\x0a-\\x1F\\x{FFFD}]*)\]/Su';
$this->magicWordFactory = $magicWordFactory;
--
2.28.0.windows.1

File Metadata

Mime Type
text/plain
Storage Engine
blob
Storage Format
Raw Data
Storage Handle
9872855
Default Alt Text
0001-Parser-Fix-quadratic-regexp-edge-case.patch (3 KB)

Event Timeline