WDQS's REGEX() function is not subject to timeout
Closed, Resolved · Public

Description

The REGEX() function in the Wikidata Query Service is not subject to the timeout of 60 s, allowing an attacker to take up much more CPU time on the service than should be allowed.

One example of a pathological regular expression is (x+x+)+y (from CodingHorror), which takes exponential time (twice as long per extra input character) to verify that xxxxx… does not match the regular expression.

The following query returned false after 90 s:

SELECT (REGEX("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "(x+x+)+y") AS ?b) {}

The following query resulted in a 504 Gateway Time-out error after an unknown duration (I didn’t check, probably between 3 and 15 minutes):

SELECT (REGEX("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx", "(x+x+)+y") AS ?b) {}

BlazeGraph implements the REGEX() function directly with java.util.regex regexes, whose matching loop never checks for thread interruption (which is why the timeout has no effect). There is a workaround (custom CharSequence implementation), but the real fix should be for BlazeGraph to implement regular expressions properly – SPARQL’s regular expression language is that of XQuery 1.0 and XPath 2.0, which AFAIU is not identical to Java regexes anyway (for instance, I’m pretty sure it shouldn’t support Java’s named capturing group syntax, i.e. SELECT (REGEX("aa", "(?<name>a)\\k<name>") AS ?b) WHERE {} should result in an error).
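A minimal sketch of the custom CharSequence workaround mentioned above, assuming the query timeout is delivered as a thread interrupt. The class and method names (InterruptibleRegex, InterruptibleCharSequence, regex) are made up for illustration and are not taken from the Blazegraph source. The idea is that the matcher reads its input through charAt(), so a wrapper can check the interrupt flag there and abort a runaway match.

```java
import java.util.regex.Pattern;

public class InterruptibleRegex {
    /** Wraps a CharSequence and checks the thread's interrupt flag on every charAt() call. */
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override
        public char charAt(int index) {
            // The matcher calls charAt() constantly while backtracking,
            // so this is where a pending interrupt gets noticed.
            if (Thread.currentThread().isInterrupted()) {
                throw new RuntimeException("regex matching interrupted");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Like SPARQL REGEX(): true if the pattern matches anywhere in the text. */
    public static boolean regex(String text, String pattern) {
        return Pattern.compile(pattern)
                .matcher(new InterruptibleCharSequence(text))
                .find();
    }

    public static void main(String[] args) {
        System.out.println(regex("xxxxy", "(x+x+)+y")); // prints "true"
    }
}
```

This only helps if whatever enforces the timeout actually interrupts the thread that runs the match; the exception then unwinds out of Matcher.find() and the caller has to translate it into a query timeout.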

Details

	Subject	Repo	Branch	Lines +/-
	[WIP] Make regex() interruptible	wikidata/query/rdf	master	+460 -5


Related Objects

Mentioned In: T236884: [Recurring task] Upstream changes to Blazegraph that we have in our own fork
Mentioned Here: T168965: Why don't timeouts work during long regular expression matching?

Event Timeline

Lucas_Werkmeister_WMDE created this task. · Jul 6 2017, 10:13 AM

Restricted Application added a subscriber: Aklapper. · Jul 6 2017, 10:13 AM

Lucas_Werkmeister_WMDE added projects: Wikidata, Wikidata-Query-Service. · Jul 6 2017, 10:14 AM

Lucas_Werkmeister_WMDE added a subscriber: Smalyshev.

Restricted Application added a project: Discovery-ARCHIVED. · Jul 6 2017, 10:14 AM

Lucas_Werkmeister_WMDE added a subscriber: • Jonas. · Jul 6 2017, 1:17 PM

We may consider using a different regex engine, such as:

I’m pretty sure it shouldn’t support Java’s named capturing group syntax

While this support is not required, I don't think it hurts anything, so any engine that supports basic capabilities should be fine.

TCL/HSRE doesn’t seem to support the \p{Lu} syntax (for Unicode categories).

I also disagree that support for nonstandard features is okay – the SPARQL spec says that REGEX is exactly XPath fn:matches (and REPLACE is exactly XPath fn:replace). And according to the XPath spec, those functions should raise errors if the pattern is invalid.

More practically speaking, supporting nonstandard features means that users will have no indication that they’re using a nonstandard feature, and that we’ll break queries whenever we switch regex engines.
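To make the concern concrete, here is a small sketch showing that java.util.regex compiles and matches the named-group pattern from the task description without complaint, while still rejecting a pattern that is malformed by its own rules. The helper name javaAccepts is invented for this example. Whether an XPath-conformant engine would reject the named group is the claim made in this thread; the code only shows the Java side.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class NamedGroupCheck {
    /** Returns true if java.util.regex compiles the pattern, false if it rejects it. */
    public static boolean javaAccepts(String pattern) {
        try {
            Pattern.compile(pattern);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Java-specific named group plus named back-reference.
        String named = "(?<name>a)\\k<name>";

        System.out.println(javaAccepts(named));                           // prints "true"
        System.out.println(Pattern.compile(named).matcher("aa").find()); // prints "true"

        // An unclosed group is invalid in Java too, so this one is rejected.
        System.out.println(javaAccepts("(a"));                            // prints "false"
    }
}
```

A user who writes such a query against an engine backed by java.util.regex gets a result rather than an error, which is exactly why there is no indication that the feature is nonstandard.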

Legoktm renamed this task from REGEX() function is not subject to timeout to WDQS's REGEX() function is not subject to timeout. · Jul 7 2017, 9:20 AM

Lucas_Werkmeister_WMDE added a subscriber: thiemowmde. · Jul 7 2017, 10:14 AM

the SPARQL spec says that REGEX is exactly XPath fn:matches (and REPLACE is exactly XPath fn:replace). And according to the XPath spec, those functions should raise errors if the pattern is invalid.

Frankly, I see little use in this requirement. Who ever needs their query to fail?

More practically speaking, supporting nonstandard features means that users will have no indication that they’re using a nonstandard feature, and that we’ll break queries whenever we switch regex engines.

We already support many features not present in standard SPARQL. Yes, they won't work if we ever switch engines. But that's not something we do frequently, only when really necessary, and with proper announcements, etc. If we find an RE engine that matches the XPath requirements exactly, fine. But if not, that should not be a deal-breaker by itself.

Smalyshev added a subscriber: Gehel. · Jul 10 2017, 8:57 PM

Smalyshev moved this task from Incoming to Current work on the Wikidata-Query-Service board. · Jul 13 2017, 12:55 AM

Bawolff moved this task from Backlog / Other to Other WMF team on the acl*security board. · Jul 17 2017, 5:03 AM

If I might add my two cents:

Sure, having any regex engine is better than nothing. But a regex engine that supports so much more than expected from the SPARQL standard is a problem, and that should not be dismissed so easily. Users will not read what the standard says, but just play around with the service and try what works and what doesn't. They will write all kinds of advanced regular expressions that make use of the undocumented features and start relying on them. This is when it becomes a problem. But this is not what this ticket is about.
This ticket is about this feature possibly being used as a DoS vector, either intentionally or unintentionally when more users use more complicated queries that contain regular expressions. Is this an actual problem? Do we need to do something about this in advance? I can't tell.

thiemowmde moved this task from incoming to monitoring on the Wikidata board. · Jul 19 2017, 4:29 PM

Relevant ORES task to look at and compare: T168965: Why don't timeouts work during long regular expression matching?

It looks like there are actually two issues here:

1. regex is not interrupted by the regular query timeout handling in Blazegraph.
2. Constant expressions seem to be handled by the main service thread and not a query executor thread, and the former is not subject to query executor timeouts.

(1) can be fixed/mitigated by the InterruptibleCharacterSequence hack linked above. I'm still looking into the fix for (2); it may be trickier.
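For the second issue, where the match may run on a thread that nothing interrupts, one conceivable mitigation is a CharSequence that enforces its own wall-clock deadline instead of relying on an interrupt. This is only a sketch under that assumption, not a description of the patches that were eventually applied; DeadlineRegex, DeadlineCharSequence, RegexTimeoutException and the budgetMillis parameter are all hypothetical names.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

public class DeadlineRegex {
    /** Hypothetical exception type, thrown when matching runs past its budget. */
    public static final class RegexTimeoutException extends RuntimeException {
        RegexTimeoutException(String message) {
            super(message);
        }
    }

    /** CharSequence wrapper that enforces a wall-clock deadline from inside charAt(). */
    static final class DeadlineCharSequence implements CharSequence {
        private final CharSequence inner;
        private final long deadlineNanos;
        private long calls;

        DeadlineCharSequence(CharSequence inner, long deadlineNanos) {
            this.inner = inner;
            this.deadlineNanos = deadlineNanos;
        }

        @Override
        public char charAt(int index) {
            // Check the clock on the first call and then on every 1024th call,
            // to keep the overhead of System.nanoTime() low.
            if ((calls++ & 1023) == 0 && System.nanoTime() - deadlineNanos >= 0) {
                throw new RegexTimeoutException("regex matching exceeded its time budget");
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            // Sub-sequences share the same deadline.
            return new DeadlineCharSequence(inner.subSequence(start, end), deadlineNanos);
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /** Like SPARQL REGEX(), but gives up once budgetMillis of matching time has elapsed. */
    public static boolean regex(String text, String pattern, long budgetMillis) {
        Pattern compiled = Pattern.compile(pattern);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(budgetMillis);
        return compiled.matcher(new DeadlineCharSequence(text, deadline)).find();
    }

    public static void main(String[] args) {
        System.out.println(regex("xxxxy", "(x+x+)+y", 1000)); // prints "true"
    }
}
```

The trade-off is that the time limit lives in the regex call rather than in the query engine, so the budget has to be chosen by whoever calls regex() and is not automatically tied to the remaining query time.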

Smalyshev claimed this task. · Jul 19 2017, 7:23 PM

Smalyshev added a project: Discovery-Wikidata-Query-Service-Sprint.

Smalyshev added a subscriber: • EBjune. · Jul 20 2017, 11:31 PM

Smalyshev removed a subscriber: • EBjune. · Jul 20 2017, 11:34 PM

Smalyshev added a project: User-Smalyshev. · Aug 4 2017, 9:45 PM

Smalyshev moved this task from Backlog to Waiting/Blocked on the User-Smalyshev board.

Smalyshev moved this task from Backlog to In progress on the Discovery-Wikidata-Query-Service-Sprint board. · Aug 7 2017, 9:20 PM

Smalyshev moved this task from Waiting/Blocked to Doing on the User-Smalyshev board.

Smalyshev moved this task from Doing to Waiting/Blocked on the User-Smalyshev board. · Aug 8 2017, 11:54 PM

Smalyshev moved this task from Waiting/Blocked to Doing on the User-Smalyshev board. · Aug 9 2017, 11:28 PM

Smalyshev moved this task from Doing to Waiting/Blocked on the User-Smalyshev board. · Aug 30 2017, 12:06 AM

Smalyshev moved this task from In progress to Needs review on the Discovery-Wikidata-Query-Service-Sprint board. · Aug 30 2017, 10:19 PM

This is subject to regular timeouts now. The Blazegraph issue is still not fully resolved, but I made some quick patches in the meantime.

Is the fix for this issue public somewhere? I can’t find any commits in the wikidata/query/rdf repository that look related.

And by the way, do we still need a custom visibility policy for this task or can it be public now?

• chasemp added a project: Security. · Feb 10 2020, 10:55 PM

• chasemp removed a project: acl*security. · Feb 20 2020, 8:15 PM

Lucas_Werkmeister_WMDE mentioned this in T236884: [Recurring task] Upstream changes to Blazegraph that we have in our own fork. · Mar 30 2020, 3:37 PM

Lucas_Werkmeister_WMDE added a subscriber: dcausse. · Mar 30 2020, 6:35 PM

I think we can make this task public, the workaround has been shipped to blazegraph 2.1.5 via https://github.com/blazegraph/database/commit/d13f320ffefc90d668a3411c466ec3c1adf00e10

And (sorry, I forgot the fix for the 2nd issue): https://github.com/blazegraph/database/commit/3ddea636ee6e2b6ea0c3fdcae41b36d8ff93e3e6 which is also shipped with blazegraph 2.1.5

I can’t figure out how to make this task public. @chasemp can you please do it?

In T169862#6013736, @dcausse wrote:

I think we can make this task public, the workaround has been shipped to blazegraph 2.1.5 via https://github.com/blazegraph/database/commit/d13f320ffefc90d668a3411c466ec3c1adf00e10

In T169862#6466230, @Lucas_Werkmeister_WMDE wrote:

I can’t figure out how to make this task public. @chasemp can you please do it?


• chasemp changed the visibility from "Custom Policy" to "Public (No Login Required)". · Sep 16 2020, 1:45 PM

Thank you :)
