See the parent task for a more general problem statement – in short, we need to check user-provided input against user-provided regexes, which is generally unsafe.
However, the majority of the user-provided regexes we’re interested in (76%) do not contain any parentheses. That means they cannot contain any groups, so a quantifier can only apply to a single character, character class or escape sequence and quantifiers cannot be nested; their star height must therefore be 0 or 1. Unless I’m mistaken, this should mean we can safely evaluate them via preg_match without the evaluation time growing exponentially.
Since we’re currently increasing the number of constraint checks being run in general (T204031), which results in an increased query volume (T204031#4898189), I think we should consider checking such regexes in PHP, removing the overhead of talking to the query server.
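To make the proposal concrete, here is a rough sketch of what the local check could look like. The function names `isParenthesisFree` and `matchLocally` are made up for illustration and are not taken from any existing code. The sketch ignores how the regex should be anchored, and it treats every parenthesis as disqualifying, even an escaped one or one inside a character class. Because the regex is known to contain no parentheses, it uses `(` and `)` as the PCRE delimiters, which avoids having to escape a delimiter inside the user-provided pattern.

```php
<?php
// Illustrative sketch only; the helper names are hypothetical.

/** True if the regex contains no parentheses at all (deliberately conservative). */
function isParenthesisFree(string $regex): bool {
    return strpbrk($regex, '()') === false;
}

/**
 * Returns true or false for match / no match, or null if the regex was not
 * evaluated locally (it contains parentheses, is invalid, or hit a PCRE limit).
 */
function matchLocally(string $regex, string $text): ?bool {
    if (!isParenthesisFree($regex)) {
        return null; // caller falls back to the existing check
    }
    // "(" and ")" are valid bracket-style delimiters in PHP, and the
    // regex itself contains neither, so no escaping is needed here.
    $result = @preg_match('(' . $regex . ')', $text);
    if ($result === false) {
        return null; // compile error or a PCRE limit was reached
    }
    return $result === 1;
}
```

A `null` result would mean "not decided here", so the caller could still hand those regexes to the query server as before.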
Are there any objections to this? Does anyone have an example regex that causes exponential runtime in PCRE without containing parentheses? Or are there other security problems with checking user-provided regexes besides runtime concerns?