It is suggested to create a RegularExpression (or Pattern, or similar) Property type to better edit, monitor, manage, check and process patterns represented by regular expressions. Some technical issues that stem from the current inability to manage regular expressions, and that the current study does not address, are described in phab:T176312, phab:T214378, phab:T236150 and phab:T240884.
|Open||None||T244043 suggestions and possible decisions from the 2020 report on Property constraints|
|Open||None||T91505 [Epic] Adding new datatypes to Wikidata (tracking)|
|Open||None||T244046 Consider creating a Wikibase datatype for patterns (regex)|
Mentioned Here
- T176312: Don’t check format constraint via SPARQL (safely evaluating user-provided regular expressions)
- T214378: Check simple format constraints (no grouping) in PHP instead of SPARQL
- T236150: Write an RFC describing in detail possible solutions for checking user-provided regexes in constraints
- T240884: RFC: How to evaluate user-provided regular expressions
- T244043: suggestions and possible decisions from the 2020 report on Property constraints
How would any of these be affected by a separate data type for regular expressions / patterns? The problem is the same: we still need to evaluate user-specified (untrusted) regular expressions safely; I don’t see what a separate data type would change here.
A specific data/property type for regular expressions or patterns would:
- ensure that the stored regular expressions or patterns are syntactically correct, in the same way that quantity-type properties ensure that their values are not paragraphs of text; all implementations, tools and reusers would benefit from this;
- introduce a specific input (interface) control in Wikibase (not WikibaseQualityConstraints) that could make pattern editing more friendly: monospaced text, warnings, colored brackets, and all the amazing features that the developers want to implement. :-)
This does not solve the low-level security issues of each implementation (e.g., each one must still avoid blindly running a stored value such as "'; DROP TABLE important_things;").
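To make the distinction concrete, here is a minimal sketch of a save-time syntax check. It assumes Python's `re` module as a stand-in for whichever regex flavor a pattern datatype would adopt, and `validate_pattern` is a hypothetical helper, not an existing Wikibase function.

```python
import re
from typing import Optional


def validate_pattern(value: str) -> Optional[str]:
    """Return None if `value` compiles as a regex, else an error message.

    Hypothetical helper: Python's `re` stands in for the (unspecified)
    regex flavor that a pattern datatype would actually use.
    """
    try:
        re.compile(value)
    except re.error as exc:
        return f"invalid pattern: {exc}"
    return None


# A well-formed pattern is accepted.
print(validate_pattern(r"[1-9]\d*"))
# An unterminated character class is rejected at save time.
print(validate_pattern(r"[1-9"))
# These two compile without error, so a syntax check alone says nothing
# about evaluation cost or about how the value is used downstream.
print(validate_pattern(r"(a+)+$"))
print(validate_pattern("'; DROP TABLE important_things;"))
```

The last two calls are accepted because both strings are syntactically valid patterns, which is why safe evaluation remains a separate problem from validation.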
Would it also allow for reusing a regex?
See https://w.wiki/RK2 for stats regarding multiple properties using the same regex:
- [1-9]\d* is used 615 times
- \d+ is used 542 times
- [1-9][0-9]* is used 89 times
Also, having items for the regex patterns would mean that users could better monitor and optimize them (for example, [1-9][0-9]* is the same as [1-9]\d*).
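As a rough illustration of how such duplicates could be spotted, the sketch below groups patterns after a naive normalization. The sample list and the `normalize` helper are hypothetical, and the rewrite from `[0-9]` to `\d` is only valid when the engine restricts `\d` to ASCII digits, which is an assumption here.

```python
from collections import Counter

# Hypothetical sample of format constraint values, one entry per property.
patterns = [
    r"[1-9]\d*",
    r"\d+",
    r"[1-9][0-9]*",
    r"[1-9]\d*",
    r"\d+",
    r"[A-Z]{2}\d{4}",
]


def normalize(pattern: str) -> str:
    """Rewrite the literal class [0-9] as the shorthand digit class.

    Naive textual replacement for illustration only; it assumes the
    engine treats the shorthand as ASCII digits.
    """
    return pattern.replace("[0-9]", r"\d")


# Count how many properties share each normalized pattern.
usage = Counter(normalize(p) for p in patterns)
for pattern, count in usage.most_common():
    print(f"{pattern} is used {count} times")
```

With this normalization, the two spellings of the positive-integer pattern are counted together rather than as separate entries.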
@DannyS712: I think I don't understand your question. What do you mean by reusing a regex? What would you expect could be done on the UI that is not possible now? On https://www.wikidata.org/wiki/Wikidata:2020_report_on_Property_constraints#format there are also some general stats about our regular expressions.