Page MenuHomePhabricator

[SW] [WBQC] shellbox-constraints returning 500 on preg_match error
Open, Needs TriagePublic

Description

mediawiki service mesh is reporting a constant stream of shellbox-constraints requests that fail due to exceeding retry limit:

https://grafana.wikimedia.org/d/b1jttnFMz/envoy-telemetry-k8s?orgId=1&var-datasource=thanos&var-site=eqiad&var-prometheus=k8s&var-app=mediawiki&var-kubernetes_namespace=All&var-destination=shellbox-constraints&from=now-1d&to=now&viewPanel=10

There is a constant stream of 500 errors reported by shellbox-constraints apache as well. I've captured 8 responses, all of them report errors of preg_match, some examples:

{
    "action": "call",
    "functionName": "preg_match",
    "sources": [],
    "binary": false,
    "params": [
        "/^(?:(people|institutions|sources)\\\\/[1-9]\\d*)$/u",
        "people/509522"
    ]
}
{
    "__": "Shellbox server error",
    "class": "Shellbox\\ShellboxError",
    "message": "PHP error in /srv/app/src/Action/CallAction.php line 30: preg_match(): Unknown modifier '['",
    "log": [
        {
            "level": 400,
            "message": "Exception of class Shellbox\\ShellboxError: PHP error in /srv/app/src/Action/CallAction.php line 30: preg_match(): Unknown modifier '['",
            "context": {
                "trace": "#0 [internal function]: Shellbox\\Server->handleError(2, 'preg_match(): U...', '/srv/app/src/Ac...', 30, Array)\n#1 [internal function]: preg_match('/^(?:(people|in...', 'people/509522')\n#2 /srv/app/src/Action/CallAction.php(30): call_user_func_array('preg_match', Array)\n#3 /srv/app/src/Action/MultipartAction.php(68): Shellbox\\Action\\CallAction->execute(Array)\n#4 /srv/app/src/Server.php(131): Shellbox\\Action\\MultipartAction->baseExecute(Array)\n#5 /srv/app/src/Server.php(72): Shellbox\\Server->guardedExecute('/srv/app/config...')\n#6 /srv/app/src/Server.php(61): Shellbox\\Server->execute('/srv/app/config...')\n#7 /srv/app/index.php(3): Shellbox\\Server::main('/srv/app/config...')\n#8 {main}"
            }
        }
    ]
}
{
    "action": "call",
    "functionName": "preg_match",
    "sources": [],
    "binary": false,
    "params": [
        "/^(?:[0-9A-Za-z\\-\\\\/\\(\\) ]*)$/u",
        "1990 SH2"
    ]
}
{
    "__": "Shellbox server error",
    "class": "Shellbox\\ShellboxError",
    "message": "PHP error in /srv/app/src/Action/CallAction.php line 30: preg_match(): Unknown modifier '\\'",
    "log": [
        {
            "level": 400,
            "message": "Exception of class Shellbox\\ShellboxError: PHP error in /srv/app/src/Action/CallAction.php line 30: preg_match(): Unknown modifier '\\'",
            "context": {
                "trace": "#0 [internal function]: Shellbox\\Server->handleError(2, 'preg_match(): U...', '/srv/app/src/Ac...', 30, Array)\n#1 [internal function]: preg_match('/^(?:[0-9A-Za-z...', '1990 SH2')\n#2 /srv/app/src/Action/CallAction.php(30): call_user_func_array('preg_match', Array)\n#3 /srv/app/src/Action/MultipartAction.php(68): Shellbox\\Action\\CallAction->execute(Array)\n#4 /srv/app/src/Server.php(131): Shellbox\\Action\\MultipartAction->baseExecute(Array)\n#5 /srv/app/src/Server.php(72): Shellbox\\Server->guardedExecute('/srv/app/config...')\n#6 /srv/app/src/Server.php(61): Shellbox\\Server->execute('/srv/app/config...')\n#7 /srv/app/index.php(3): Shellbox\\Server::main('/srv/app/config...')\n#8 {main}"
            }
        }
    ]
}

Event Timeline

This comment was removed by JMeybohm.
JMeybohm renamed this task from shellbox-constraints listener is constantly exceeding the retry limit to shellbox-constraints returning 500 on preg_match error.Apr 8 2024, 4:03 PM
JMeybohm updated the task description. (Show Details)
Reedy subscribed.

Pretty sure these examples are from Wikibase-Quality-Constraints:

	private function runRegexCheckUsingShellbox( string $text, string $format ): string {
		if ( !$this->shellboxClientFactory->isEnabled( 'constraint-regex-checker' ) ) {
			return CheckResult::STATUS_TODO;
		}
		try {
			$pattern = '/^(?:' . str_replace( '/', '\/', $format ) . ')$/u';
			$shellboxResponse = $this->shellboxClientFactory->getClient( [
				'timeout' => $this->config->get( 'WBQualityConstraintsSparqlMaxMillis' ) / 1000,
				'service' => 'constraint-regex-checker',
			] )->call(
				'constraint-regex-checker',
				'preg_match',
				[ $pattern, $text ]
			);
		} catch ( ShellboxError $exception ) {
			throw new ConstraintParameterException(
				( new ViolationMessage( 'wbqc-violation-message-parameter-regex' ) )
					->withInlineCode( $pattern, Role::CONSTRAINT_PARAMETER_VALUE )
			);
		}

		if ( $shellboxResponse ) {
			return CheckResult::STATUS_COMPLIANCE;
		} else {
			return CheckResult::STATUS_VIOLATION;
		}
	}

Which probably suggests it's regexes on wikidata items editable on the wiki

Last edit to any of this code was rEBQCd7564ee12be3: Fix regex checking using preg_match() in August 2021...

So depending on how long this has been going on, especially if it's recent, is probably due to wikidata edit(s)

As there's many parts to this...

Seemingly you can do "regex validation" in PHP by testing against null/'', and see if preg_match returns false.

I wonder if WDQC should be doing that before shelling out to shellbox, and display a nice user error somewhere along the line. Rather than using shellbox (or sparql?) to do regex validation...

CheckResult::STATUS_BAD_PARAMETERS looks like it may work "The constraint parameters are broken."; as could a more specific error.

Ref: https://stackoverflow.com/questions/4440626/how-can-i-validate-regex

https://www.wikidata.org/wiki/Property:P5504 is "/^(?:(people|institutions|sources)\\\\/[1-9]\\d*)$/u" but it hasn't been changed recently - https://www.wikidata.org/w/index.php?title=Property:P5504&action=history

It starts life on the wikipage as (people|institutions|sources)\/[1-9]\d*

The other is the text on https://www.wikidata.org/wiki/Q6739335, leading to https://www.wikidata.org/wiki/Property:P490, which has a regex on the wikipage as [0-9A-Za-z\-\/\(\) ]* which becomes "/^(?:[0-9A-Za-z\\-\\\\/\\(\\) ]*)$/u"

https://www.wikidata.org/w/index.php?title=Property:P490&action=history

Similarly not recently changed...

Change #1017926 had a related patch set uploaded (by Reedy; author: Reedy):

[mediawiki/extensions/WikibaseQualityConstraints@master] FormatCheckerTest: Add regexes and values that fail for T362084

https://gerrit.wikimedia.org/r/1017926

^ Sticking those two into a test... Lets see what happens

The tests fail as we'd expect/hope:

20:14:29 There were 2 errors:
20:14:29 
20:14:29 1) WikibaseQuality\ConstraintReport\Tests\Checker\FormatChecker\FormatCheckerTest::testT362084 with data set #0 ('(people|institutions|sources)...-9]\d*', 'people/509522')
20:14:29 preg_match(): Unknown modifier '['
20:14:29 
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/tests/phpunit/Checker/FormatChecker/FormatCheckerTest.php:396
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php:199
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php:181
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php:139
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/tests/phpunit/Checker/FormatChecker/FormatCheckerTest.php:117
20:14:29 phpvfscomposer:///workspace/src/vendor/phpunit/phpunit/phpunit:106
Logs generated by test

Show Details
20:14:29 
20:14:29 2) WikibaseQuality\ConstraintReport\Tests\Checker\FormatChecker\FormatCheckerTest::testT362084 with data set #1 ('[0-9A-Za-z\-\/\(\) ]*', '1990 SH2')
20:14:29 preg_match(): Unknown modifier '\'
20:14:29 
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/tests/phpunit/Checker/FormatChecker/FormatCheckerTest.php:396
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php:199
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php:181
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/src/ConstraintCheck/Checker/FormatChecker.php:139
20:14:29 /workspace/src/extensions/WikibaseQualityConstraints/tests/phpunit/Checker/FormatChecker/FormatCheckerTest.php:117
20:14:29 phpvfscomposer:///workspace/src/vendor/phpunit/phpunit/phpunit:106
Logs generated by test

Show Details
20:14:29 
20:14:29 ERRORS!
20:14:29 Tests: 24625, Assertions: 73774, Errors: 2, Skipped: 133.

The regex escaping is done in:

$pattern = '/^(?:' . str_replace( '/', '\/', $format ) . ')$/u';

Input: (people|institutions|sources)\/[1-9]\d*
Output: /^(?:(people|institutions|sources)\\/[1-9]\d*)$/u
Expected Output: /^(?:(people|institutions|sources)\/[1-9]\d*)$/

Input: [0-9A-Za-z\-\/\(\) ]*
Output: /^(?:[0-9A-Za-z\-\\/\(\) ]*)$/u
Expected Output: /^(?:[0-9A-Za-z\-\/\(\) ]*)$/u

The string replace is replacing (escaping) whether the / is already escaped or not...

I don't know whether the values from Wikidata should be escaped or not... https://www.wikidata.org/wiki/Q21502404 and https://www.wikidata.org/wiki/Property:P1793 helpfully don't mention, beyond:

When using on property constraints, ensure syntax is a PCRE

Either that should be enforced in software, or when we do the str_replace(), only do it if we need to... Or remove any escaping that is there to begin with, and then undo... But this feels similarly error prone.

Still. Better error handling is really needed anyway..

Can someone clarify what the problem here is? From WBQC’s perspective, it’s totally expected that some of these regex checks will fail (though there’s some confusion about which shellbox errors we should or shouldn’t try to catch, see T304084 and especially T304084#8561863). But we might need to make some changes to keep the service mesh monitoring happy? (“exceeding retry limit” also sounds concerning – we don’t really want these requests to be retried, I think.)

I don't know whether the values from Wikidata should be escaped or not... https://www.wikidata.org/wiki/Q21502404 and https://www.wikidata.org/wiki/Property:P1793 helpfully don't mention, beyond:

When using on property constraints, ensure syntax is a PCRE

There’s some more documentation in Help:Property constraints portal/Format, which also mentions that the pattern isn’t even interpreted as PCRE everywhere. I would also say that “PCRE” implies that forward slashes shouldn’t be backslash-escaped (since /format/flags is, as far as I’m aware, a PHP convention that’s not found in PCRE / PCRE2 itself), but I’ve tried to clarify that on the documentation page now.

Seemingly you can do "regex validation" in PHP by testing against null/'', and see if preg_match returns false.

That’s an interesting idea… do you know if that’s safe to do even with possibly malicious patterns? (Which are the reason that we check constraints in shellbox in the first place.)

Seemingly you can do "regex validation" in PHP by testing against null/'', and see if preg_match returns false.

That’s an interesting idea… do you know if that’s safe to do even with possibly malicious patterns? (Which are the reason that we check constraints in shellbox in the first place.)

Without digging into it... I'm not sure.

In theory, it should be "just" compiling/validating the regex, and then it woudn't surprise me if for say null, it then short circuits and doesn't actually execute said regex if it is valid (presumably it fails before running it, if the regex is invalid). An empty string feels different, as you could be using a regex to check for some string length (sure, there's better ways...).

If that works, that'd be pretty useful. I obviously don't think we should be trying to invent our own regex validator/checker (which could be an alternative solution)!

Are these "invalid regexes"/preg_match errors surfaced to users/editors at all? Or do they end up being silent failures behind the scenes?

Can someone clarify what the problem here is? From WBQC’s perspective, it’s totally expected that some of these regex checks will fail (though there’s some confusion about which shellbox errors we should or shouldn’t try to catch, see T304084 and especially T304084#8561863). But we might need to make some changes to keep the service mesh monitoring happy? (“exceeding retry limit” also sounds concerning – we don’t really want these requests to be retried, I think.)

The reason I've opened the task is that the stream of errors/retries it looked fishy and I wanted to make sure we have an understanding of what is going on. From my naive POV I would argue that a preg_match error should not be classified as a an internal server error, as it results from (invalid) user input. If this is the desired response in that case, we should probably, as you said, configure the service-mesh to not retry requests on 500 errors - as that does not make any sense then.

In theory, it should be "just" compiling/validating the regex, and then it woudn't surprise me if for say null, it then short circuits and doesn't actually execute said regex if it is valid (presumably it fails before running it, if the regex is invalid). An empty string feels different, as you could be using a regex to check for some string length (sure, there's better ways...).

I don’t see any special code for null in php_pcre.c, and in fact I think it just gets cast to the empty string under loose typing:

ERROR preg_match(): Passing null to parameter #2 ($subject) of type string is deprecated on line 2.

I also don’t see any special handling for the empty string before it gets passed to PCRE… but this early return looks interesting:

	if (start_offset2 > subject_len) {
		pcre_handle_exec_error(PCRE2_ERROR_BADOFFSET);
		RETURN_FALSE;
	}

If we set the offset parameter to be longer than the empty string, we’ll get false back before it tries to match the regex – but after compiling it.

Are these "invalid regexes"/preg_match errors surfaced to users/editors at all? Or do they end up being silent failures behind the scenes?

In principle, they should be shown to users; however, due to the error confusion seen in T304084, I’m not sure if it actually gets caught properly – it might also crash the whole constraint check request.

The reason I've opened the task is that the stream of errors/retries it looked fishy and I wanted to make sure we have an understanding of what is going on. From my naive POV I would argue that a preg_match error should not be classified as a an internal server error, as it results from (invalid) user input. If this is the desired response in that case, we should probably, as you said, configure the service-mesh to not retry requests on 500 errors - as that does not make any sense then.

Maybe we could also change the way we make the call in WBQC – currently we directly pass preg_match as the function to call over RPC, but I think we could also create a wrapper function that catches these errors a bit better (RpcClient::call() apparently supports sending additional sources files to the remote side).

Lucas_Werkmeister_WMDE renamed this task from shellbox-constraints returning 500 on preg_match error to [SW] [WBQC] shellbox-constraints returning 500 on preg_match error.Apr 11 2024, 12:05 PM

Prio Notes:

Impact AreaAffected
production / end users½
monitoring
development efforts
onboarding efforts
additional stakeholders

The main impact seems to be logspam.

Side note: I added an invalid RISM ID statement to the sandbox item, and a constraint check for it returns:

{
  "status": "bad-parameters",
  "property": "P5504",
  "constraint": {
    "id": "P5504$8E96BAD6-6F32-445B-9BCC-D8C1FD4E6F64",
    "type": "Q21502404",
    "typeLabel": "format constraint",
    "link": "//www.wikidata.org/wiki/Property:P5504#P5504$8E96BAD6-6F32-445B-9BCC-D8C1FD4E6F64",
    "discussLink": "//www.wikidata.org/wiki/Property_talk:P5504"
  },
  "message-html": "<span class=\"wbqc-role wbqc-role-constraint-parameter-value\"><code>/^(?:(people|institutions|sources)\\\\/[1-9]\\d*)$/u</code></span> is not a valid regular expression.",
  "claim": "Q4115189$3f27f4ba-41a9-b216-abea-6103488bd41a"
}

So apparently we have some detection of invalid regexes already (I did not remember that). Will be worth looking into.