Page MenuHomePhabricator

language converter can be tricked into replacing text inside tags by adding a lot of junk after the rule definition (CVE-2017-8814)
Closed, ResolvedPublic

Description

Found well testing a new solution to T119158. The regex in LanguageConverter::autoConvert(), which matches parts of the HTML document not to convert, can hit pcre.backtrack_limit on some input (default in php is 100,000 but it is configurable). The regex fails to match anything after the point it hits the backtrack limit, so language conversion substitutions are done on the entire document, instead of just the part that it is supposed to be done.

To exploit: Create a rule followed by slightly more than pcre.backtrack_limit characters (without any newlines). After that, the substitution rule will be applied everywhere, including within tags.

E.g. Create a page with the following form:

-{H|big=>sr-el:script}- foo <X repeated at least 100000 times>

<big>alert(1)</big>

And then view it in the sr-el variant

(For testing, its easier if you set pcre.backtrack_limit to a smaller number than default)

Patches

, , ,

Event Timeline

Bawolff raised the priority of this task from to Needs Triage.
Bawolff updated the task description. (Show Details)
Bawolff changed the visibility from "Public (No Login Required)" to "Custom Policy".
Bawolff changed the edit policy from "All Users" to "Custom Policy".
Bawolff changed Security from None to Software security bug.
Bawolff added subscribers: Bawolff, csteipp.

On second thought, I think its better in the case we do hit the backtrack limit, to simply not do language conversion on the page, and log the error with a high severity. [See new version of patch]

With the changes to the regex, it should be much more difficult for someone who is not malicious to trigger this error. The most likely way to trigger it, would be a very long <source> listing [Since <source> turns into a <pre> with a lot of children], or a <code> tag with like 50,000 children. Seems unlikely for a non-malicious person to do by accident.


To summarize, patch does 2 things:

  • Optimize regexes to minimize the amount of backtracking (Primarily by using possessive quantifiers where possible)
  • Add a check that regex matches to the end, so if we do hit the backtrack limit, we can detect the situation and respond appropriately.

I tested this patch on the contents of [[zh:贝拉克·奥巴马]] (Obama from zhwiki). No noticeable performance difference.

On second thought, I think its better in the case we do hit the backtrack limit, to simply not do language conversion on the page, and log the error with a high severity. [See new version of patch]

With the changes to the regex, it should be much more difficult for someone who is not malicious to trigger this error. The most likely way to trigger it, would be a very long <source> listing [Since <source> turns into a <pre> with a lot of children], or a <code> tag with like 50,000 children. Seems unlikely for a non-malicious person to do by accident.


To summarize, patch does 2 things:

  • Optimize regexes to minimize the amount of backtracking (Primarily by using possessive quantifiers where possible)
  • Add a check that regex matches to the end, so if we do hit the backtrack limit, we can detect the situation and respond appropriately.

When deploying this patch, should probably keep an eye on logstash for the languageconverter, just to make sure there is no sudden spike in "Hit pcre.backtrack_limit in..." messages. I don't think there should be, but just in case.

I think we should update the patch to also check for 0 vs. false, and handle the false/error case appropriately. From http://php.net/manual/en/function.preg-match.php, "preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred."

Ftr, @Bawolff seems to have found cases in older php versions where 0 was returned on errors also, so reducing the chances of hitting the backtrack limit is also useful.

Patches:

, , ,

Reedy lowered the priority of this task from High to Medium.Jul 10 2017, 9:25 PM




(As of nov 13)

[22:42] bawolff !log deployed patch T124404

MoritzMuehlenhoff renamed this task from language converter can be tricked into replacing text inside tags by adding a lot of junk after the rule definition to language converter can be tricked into replacing text inside tags by adding a lot of junk after the rule definition (CVE-2017-8814).Nov 14 2017, 9:30 AM
Reedy changed the visibility from "Custom Policy" to "Public (No Login Required)".Nov 15 2017, 12:06 AM
Reedy changed the edit policy from "Custom Policy" to "All Users".

Change 391409 merged by jenkins-bot:
[mediawiki/core@fundraising/REL1_27] XSS in langconverter when regex hits pcre.backtrack_limit

https://gerrit.wikimedia.org/r/391409

Change 391454 had a related patch set uploaded (by Reedy; owner: Brian Wolff):
[mediawiki/core@master] SECURITY: XSS in langconverter when regex hits pcre.backtrack_limit

https://gerrit.wikimedia.org/r/391454

Change 391454 merged by Reedy:
[mediawiki/core@master] SECURITY: XSS in langconverter when regex hits pcre.backtrack_limit

https://gerrit.wikimedia.org/r/391454

Change 391733 had a related patch set uploaded (by Chad; owner: Brian Wolff):
[mediawiki/core@wmf/1.31.0-wmf.8] SECURITY: XSS in langconverter when regex hits pcre.backtrack_limit

https://gerrit.wikimedia.org/r/391733

Change 391733 merged by jenkins-bot:
[mediawiki/core@wmf/1.31.0-wmf.8] SECURITY: XSS in langconverter when regex hits pcre.backtrack_limit

https://gerrit.wikimedia.org/r/391733