Page MenuHomePhabricator

Unknown modifier 'R': [/^page\-User\:BeneBot.+/RfD\-open/text$/] in /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/stringmangler/StringMatcher.php
Closed, ResolvedPublic

Description

Translate stringmangler::StringMatcher.php seem to be causing a warning and an exception:

Warnings:
https://logstash.wikimedia.org/goto/c47d729970d829dffc989425d5dae9e4
Exceptions:
https://logstash.wikimedia.org/goto/766eb6068b9b0b255ee1303dd024109a

Example:

{
  "_index": "logstash-2018.08.16",
  "_type": "mediawiki",
  "_id": "AWVC9LHGoOODFPKvE6HW",
  "_version": 1,
  "_score": null,
  "_source": {
    "exception": {
      "trace": "#0 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/stringmangler/StringMatcher.php(100): MWExceptionHandler::handleError(integer, string, string, integer, array, array)\n#1 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/MessageGroups.php(620): StringMatcher->match(string)\n#2 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/messagegroups/AggregateMessageGroup.php(51): MessageGroups::expandWildcards(array)\n#3 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/utils/MessageGroupStats.php(294): AggregateMessageGroup->getGroups()\n#4 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/utils/MessageGroupStats.php(376): MessageGroupStats::expandAggregates(AggregateMessageGroup)\n#5 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/utils/MessageGroupStats.php(280): MessageGroupStats::forItemInternal(array, AggregateMessageGroup, string)\n#6 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/utils/MessageGroupStats.php(110): MessageGroupStats::forLanguageInternal(string)\n#7 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/api/ApiQueryLanguageStats.php(24): MessageGroupStats::forLanguage(string)\n#8 /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/api/ApiStatsQuery.php(25): ApiQueryLanguageStats->getData()\n#9 /srv/mediawiki/php-1.32.0-wmf.16/includes/api/ApiQuery.php(249): ApiStatsQuery->execute()\n#10 /srv/mediawiki/php-1.32.0-wmf.16/includes/api/ApiMain.php(1579): ApiQuery->execute()\n#11 /srv/mediawiki/php-1.32.0-wmf.16/includes/api/ApiMain.php(533): ApiMain->executeAction()\n#12 /srv/mediawiki/php-1.32.0-wmf.16/includes/api/ApiMain.php(504): ApiMain->executeActionWithErrorHandling()\n#13 /srv/mediawiki/php-1.32.0-wmf.16/api.php(83): ApiMain->execute()\n#14 /srv/mediawiki/w/api.php(3): include(string)\n#15 {main}",
      "code": 0,
      "file": "/srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/stringmangler/StringMatcher.php:100",
      "message": "PHP Warning: Unknown modifier 'R': [/^page\\-User\\:BeneBot.+/RfD\\-open/text$/]",
      "class": "ErrorException"
    },
    "server": "www.wikidata.org",
    "unique_id": "W3V@IApAIDUAAI-DOD4AAAAO",
    "level": "WARNING",
    "ip": "10.64.16.22",
    "wiki": "wikidatawiki",
    "channel": "error",
    "exception_id": "W3V@IApAIDUAAI-DOD4AAAAO",
    "mwversion": "1.32.0-wmf.16",
    "message": "[W3V@IApAIDUAAI-DOD4AAAAO] /w/api.php?action=query&format=json&meta=languagestats&lslanguage=hu   ErrorException from line 100 of /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/stringmangler/StringMatcher.php: PHP Warning: Unknown modifier 'R': [/^page\\-User\\:BeneBot.+/RfD\\-open/text$/]",
    "type": "mediawiki",
    "normalized_message": "[{exception_id}] {exception_url}   ErrorException from line 100 of /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/stringmangler/StringMatcher.php: PHP Warning: Unknown modifier 'R': [/^page\\-User\\:BeneBot.+/RfD\\-open/text$/]",
    "url": "/w/api.php?action=query&format=json&meta=languagestats&lslanguage=hu",
    "caught_by": "mwe_handler",
    "tags": [
      "syslog",
      "es",
      "es"
    ],
    "reqId": "W3V@IApAIDUAAI-DOD4AAAAO",
    "referrer": "https://www.wikidata.org/w/index.php?title=Special:Translate&group=page-Help%3AContents&action=page&filter=&language=hu",
    "exception_url": "/w/api.php?action=query&format=json&meta=languagestats&lslanguage=hu",
    "@timestamp": "2018-08-16T13:37:37.000Z",
    "http_method": "GET",
    "@version": 1,
    "host": "mw1341",
    "shard": "s8"
  },
  "fields": {
    "@timestamp": [
      1534426657000
    ]
  },
  "sort": [
    1534426657000
  ]
}

Started on 14 august at 21h. Filing it as security in case it could cause issues due to improper escaped strings.

Event Timeline

jcrespo created this task.Aug 16 2018, 1:52 PM
Restricted Application added a subscriber: Aklapper. · View Herald TranscriptAug 16 2018, 1:52 PM

The cause is that a page which contains * in the title has been added to an aggregate message group. StringMatcher looks for * in middle of string and converts that to regular expressions. It uses preg_quote but fails to supply the second parameter that is required to quote /. Second issue is that page names coming from database should not be used as patterns at all, since patterns are not allowed there currently. Patterns should continue working when coming from yaml files (not used at Wikimedia production). The design decision of 'treating anything with a * in it as a pattern with no other way to differentiate' is now causing issues.

First issue is trivial to fix, just add '/' as second parameter to line 78 of StringMatcher.php. I tried to check with shell.php whether e modifier is supported, but it seems it has been disabled. In any case it is a good idea to patch this quickly. Since everything else is quoted, users are not able to build arbitrary regexps.

The second issue is not as easy. I could think of different options:

  1. When creating message groups, we could add a field/method source and have AggregateMessageGroup skip line 51 (MessageGroups::expandWildcards) if source is database.
  2. When creating message groups from yaml, mark the list of subgroups in a special way and have line 51 above executed only for those.
  3. When creating message groups from database, mark the list of subgroups in a special way which the StringMatcher will not look for patterns and treat them as exact matches.
  4. When creating message groups from yaml, expand the wildcards before creating the group
  5. Make a breaking change and require explicit marking of things that are to be considered patterns

Option 4 might have issues with recursion, so I discard that immediately. Options 2-3 are kind of similar, but I think 2 is better because then the default option is safe. 1 and 2 are basically the same thing but shifting the responsibility. Both feel a bit dirty, but I think 1 is of a lesser evil. Option 5 requires most work and would be a breaking change.

I would like advice how to proceed here. My guess would be to apply a private patch on production to mute the warnings and wait for next MLEB release in October for the public release.

jcrespo updated the task description. (Show Details)Aug 21 2018, 7:24 AM
zeljkofilipin triaged this task as High priority.Aug 21 2018, 2:58 PM

This is creating a lot of noise in fatal monitor. It's the cause of the first two errors by volume, the third being two orders of magnitude smaller.

For the last 24 hours:

  • 106,955 [{exception_id}] {exception_url} ErrorException from line 100 of /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/stringmangler/StringMatcher.php: PHP Warning: Unknown modifier 'R': [/^page\-User\:BeneBot.+/RfD\-open/text$/]
  • 106,954 Warning: Unknown modifier 'R': [/^page\-User\:BeneBot.+/RfD\-open/text$/] in /srv/mediawiki/php-1.32.0-wmf.16/extensions/Translate/stringmangler/StringMatcher.php on line 100
  • 1,440 [{exception_id}] {exception_url} JobQueueError from line 828 of /srv/mediawiki/php-1.32.0-wmf.16/includes/jobqueue/JobQueueDB.php: Wikimedia\Rdbms\DBQueryError: A database query error has occurred. Did you forget to run your application's database schem

First issue is trivial to fix, just add '/' as second parameter to line 78 of StringMatcher.php.[...] In any case it is a good idea to patch this quickly.

Let's do that first then. I just deployed two security patches earlier today, so I'd be happy to do this one too.

I tried to check with shell.php whether e modifier is supported, but it seems it has been disabled.

Yeah, it's been removed in PHP7 and was deprecated before then.

Since everything else is quoted, users are not able to build arbitrary regexps.

Does that mean that we can untag this as security and make the task public? Or only after we put in the slash escaping? More generally, I don't quite understand which parts of this task are sensitive and which ones aren't.

The preg_quote without second parameter is potentially exploitable using the e modifier for third party PHP5 users (latest and some upcoming MLEB releases still supports PHP 5.5.9 because of MediaWiki 1.30 and 1.31). Only if someone confirms that is not exploitable, then this bug could be made public.

With the second parameter in place, people are (only) able to put one or multiple .* in the pattern whereever they want by having * character in the page names. I am not a regexp expert, but I would guess that it is not possible to cause significant DoS with that kind of patterns. The second part deals with not even allowing that, and treating * as a literal as it should be.

Can we ask Bene* to fix their bot to use a different page, at least for now?

The preg_quote without second parameter is potentially exploitable using the e modifier for third party PHP5 users (latest and some upcoming MLEB releases still supports PHP 5.5.9 because of MediaWiki 1.30 and 1.31). Only if someone confirms that is not exploitable, then this bug could be made public.

I don't think this is exploitable, because as far as I can tell (it's a little tricky to trace the flow of data through this code as a glance, so I might be wrong) the user-controlled regex only ends up being fed to preg_match(), not preg_replace(). The /e modifier is not supported for preg_match() and never has been. The PHP documentation claims that it's only used by preg_replace() and ignored by everything else, which would seem to mean that preg_replace_callback() ignores it too. The code in StringMatcher.php only calls preg_match() and preg_replace_callback(), it doesn't call preg_replace().

That said we should still escape things properly, of course.

With the second parameter in place, people are (only) able to put one or multiple .* in the pattern whereever they want by having * character in the page names. I am not a regexp expert, but I would guess that it is not possible to cause significant DoS with that kind of patterns. The second part deals with not even allowing that, and treating * as a literal as it should be.

Patterns with lots of occurrences of .* (or .+ which seems to be what's actually injected) can perform badly, but you'd need long inputs to really cause trouble. What kinds of things do these regexes run on?

I don't think this is exploitable, because as far as I can tell (it's a little tricky to trace the flow of data through this code as a glance, so I might be wrong) the user-controlled regex only ends up being fed to preg_match(), not preg_replace().

You are correct, I had the same thought while I was trying to get sleep :) I guess that means this can be made public.

Patterns with lots of occurrences of .* (or .+ which seems to be what's actually injected) can perform badly, but you'd need long inputs to really cause trouble. What kinds of things do these regexes run on?

Just on page titles of translatable pages.

Just on page titles of translatable pages.

OK, then I'm not too worried, since you need to be a translation admin to control those page titles.

I'll make this task public and submit a patch for the preg_quote issue, leaving it to you to write a patch for the more complex "page titles should not be interpreted as patterns" issue.

Catrope changed the visibility from "Custom Policy" to "Public (No Login Required)".Aug 29 2018, 12:21 AM

Change 456054 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/Translate@master] Always pass delimiter ('/') to preg_quote()

https://gerrit.wikimedia.org/r/456054

Change 456054 merged by jenkins-bot:
[mediawiki/extensions/Translate@master] Always pass delimiter ('/') to preg_quote()

https://gerrit.wikimedia.org/r/456054

Change 456296 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/Translate@wmf/1.32.0-wmf.19] Always pass delimiter ('/') to preg_quote()

https://gerrit.wikimedia.org/r/456296

Change 456297 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/extensions/Translate@wmf/1.32.0-wmf.18] Always pass delimiter ('/') to preg_quote()

https://gerrit.wikimedia.org/r/456297

Change 456297 merged by jenkins-bot:
[mediawiki/extensions/Translate@wmf/1.32.0-wmf.18] Always pass delimiter ('/') to preg_quote()

https://gerrit.wikimedia.org/r/456297

Change 456296 merged by jenkins-bot:
[mediawiki/extensions/Translate@wmf/1.32.0-wmf.19] Always pass delimiter ('/') to preg_quote()

https://gerrit.wikimedia.org/r/456296

Krinkle closed this task as Resolved.Sep 11 2018, 10:20 PM
Krinkle assigned this task to Catrope.
Krinkle added a subscriber: Krinkle.

Thanks!

mmodell changed the subtype of this task from "Task" to "Production Error".Aug 28 2019, 11:09 PM