As is, the CSV value can be used as a DoS vector, or in the worst case to trigger exploits like http://www.openwall.com/lists/oss-security/2015/06/01/6. The regex either needs to be sanitized to a known-good expression, or this check needs to be removed.
|Resolved||Lydia_Pintscher||T99354 Review and deploy Wikibase-Quality-Constraints on wikidata.org|
|Resolved||csteipp||T99355 Security review of Wikibase-Quality-Constraints - v1 branch|
|Resolved||Jonaskeutel||T101467 Ex: WikibaseQualityConstraints - remove or sanitize regex for FormatChecker|
|Resolved||Jonas.keutel||T102892 Collect all regular expressions used in Wikidata's Template:Constraint:Format|
We read about this and understand it's problematic, but we still have no idea how to fix this issue. For the runtime concern we could add a timeout, which in the worst case would lead to a false negative, but what about the exploit case?
Do you have any advice for us? We were curious how online regex tools handle this problem...
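To make the timeout idea concrete, here is a minimal sketch in Python (not the extension's PHP) that runs the match in a child process and gives up after a deadline. The names `match_with_timeout` and `_CHILD` are hypothetical. It only addresses the runtime concern: a plain child process is not a sandbox, so it does nothing about the exploit case.

```python
import subprocess
import sys

# Code executed in the child process. Exit code 0 means "matched",
# 3 means "did not match"; anything else (e.g. an invalid pattern
# raising re.error) is treated as "unknown" by the parent.
_CHILD = (
    "import re, sys\n"
    "pattern, value = sys.argv[1], sys.argv[2]\n"
    "sys.exit(0 if re.fullmatch(pattern, value) else 3)\n"
)


def match_with_timeout(pattern, value, timeout=2.0):
    """Return True/False for match/no match, or None if the check
    timed out or failed (the caller decides how to report that)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", _CHILD, pattern, value],
            timeout=timeout,          # child is killed when this expires
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode == 0:
        return True
    if result.returncode == 3:
        return False
    return None
```

A pattern with nested quantifiers such as `(a+)+$` against a long run of `a` followed by `b` would hit the deadline and come back as `None`, which is the false negative mentioned above.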
I'm not sure what kinds of regexes are expected here, so I can't give great guidance on the best solution. Theomowmde's solution of only allowing admins to add them will prevent mass exploitation, but would still allow admins to attack the server in the case of another PCRE exploit. So I'd prefer not to rely on that.
How important is this feature?
Assuming it's really needed, you could probably do a couple of things:
- Only allow a subset of regex syntax. If all you need is for people to say "\w+" or "[0-9]*", then that should be possible
- Have a sandboxed service (shell out to a confined binary, or make it a web service) that does the regex processing
- Implement a descriptive language that always generates safe regexes
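As a rough illustration of the first option, the sketch below (in Python, not the extension's PHP) accepts a pattern only if it is built from a small set of atoms and quantifiers. Everything here is an assumption made for the example: the names `is_allowed_pattern` and `MAX_PATTERN_LENGTH`, the length cap, and the exact subset chosen. Groups, alternation, backreferences and lookarounds are all rejected, which rules out nested quantifiers. Adjacent quantified atoms can still backtrack polynomially, so this reduces the risk rather than eliminating it.

```python
import re

MAX_PATTERN_LENGTH = 100  # arbitrary cap chosen for this sketch

# One atom: an escaped class (\d, \w, \s), a simple bracket class
# such as [0-9] or [A-Za-z], or a literal alphanumeric/space/hyphen.
_ATOM = r"(?:\\[dws]|\[[A-Za-z0-9\-]+\]|[A-Za-z0-9 \-])"
# An optional quantifier: *, +, ?, {m} or {m,n} with small bounds.
_QUANT = r"(?:[*+?]|\{\d{1,3}(?:,\d{1,3})?\})?"
# A safe pattern is a flat sequence of quantified atoms, nothing else.
_SAFE_PATTERN = re.compile(rf"(?:{_ATOM}{_QUANT})*")


def is_allowed_pattern(pattern):
    """Return True only if the pattern stays inside the allowed subset."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        return False
    if _SAFE_PATTERN.fullmatch(pattern) is None:
        return False
    try:
        # Catches leftovers the grammar lets through, e.g. the range [9-0].
        re.compile(pattern)
    except re.error:
        return False
    return True
```

With this subset, `\w+`, `[0-9]*` and `\d{4}-\d{2}` are accepted, while `(a+)+$` is rejected because parentheses and `$` are not part of the grammar.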