I would like to know which regex features our community uses to validate the format of string property values. To find out, I would like a list of all 600+ regexes passed to http://www.wikidata.org/wiki/Template:Constraint:Format. One way to collect them is with Pywikibot (something I would love to dive into at a beer & cake event).
Here are good places to start:
- https://www.wikidata.org/wiki/Special:WhatLinksHere/Template:Constraint:Format
- https://www.wikidata.org/wiki/Special:Search/hastemplate:"Constraint:Format"%20insource:"pattern"?ns121=1
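A minimal sketch of the Pywikibot route, under a few assumptions that are not stated in this task: the regex is passed as the named parameter `pattern=` (as the `insource:"pattern"` search above suggests), it may be wrapped in `<nowiki>...</nowiki>` so that `|` inside the regex is not read as a parameter separator, and the template is transcluded on Property talk pages (namespace 121, as in the search URL). The helper names (`extract_patterns`, `fetch_all_patterns`) are hypothetical. The Pywikibot calls used (`pywikibot.Site`, `pywikibot.Page`, `Page.embeddedin`, `Page.text`) are part of its public API, but please check them against the installed version. The wikitext parsing is deliberately simple and works offline.

```python
import re
from typing import Iterator, List, Tuple

# Assumption: the regex is the named parameter `pattern=`, optionally
# wrapped in <nowiki>...</nowiki> so that '|' is not a parameter separator.
TEMPLATE_START = re.compile(r"\{\{\s*[Cc]onstraint:Format\s*\|")
NOWIKI = re.compile(r"<nowiki>(.*?)</nowiki>", re.DOTALL | re.IGNORECASE)
PLACEHOLDER = re.compile(r"\x00(\d+)\x00")


def _template_body(text: str, start: int) -> str:
    """Return the text from `start` up to the '}}' that closes the template."""
    depth, i = 1, start
    while i < len(text):
        if text.startswith("{{", i):
            depth, i = depth + 1, i + 2
        elif text.startswith("}}", i):
            depth -= 1
            if depth == 0:
                return text[start:i]
            i += 2
        else:
            i += 1
    return text[start:]  # unterminated template: use the rest of the page


def _split_params(body: str) -> List[str]:
    """Split parameters on '|' characters that are not inside {{...}} or [[...]]."""
    parts: List[str] = []
    current: List[str] = []
    depth, i = 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def extract_patterns(wikitext: str) -> List[str]:
    """Return the `pattern` values of all {{Constraint:Format}} calls in `wikitext`."""
    stash: List[str] = []

    def hide(match: "re.Match[str]") -> str:
        # Replace <nowiki> content with a placeholder so its '|' and braces are ignored.
        stash.append(match.group(1))
        return f"\x00{len(stash) - 1}\x00"

    masked = NOWIKI.sub(hide, wikitext)
    patterns: List[str] = []
    for start in TEMPLATE_START.finditer(masked):
        for param in _split_params(_template_body(masked, start.end())):
            name, sep, value = param.partition("=")
            if sep and name.strip() == "pattern":
                # Put the original <nowiki> content back into the value.
                patterns.append(PLACEHOLDER.sub(lambda m: stash[int(m.group(1))], value.strip()))
    return patterns


def fetch_all_patterns() -> Iterator[Tuple[str, str]]:
    """Yield (page title, pattern) for each use on Property talk pages (ns 121).

    Requires Pywikibot and a configured user-config.py; it is not called here.
    """
    import pywikibot  # third-party; imported lazily so the parser works without it

    site = pywikibot.Site("wikidata", "wikidata")
    template = pywikibot.Page(site, "Template:Constraint:Format")
    for page in template.embeddedin(namespaces=[121], content=True):
        for pattern in extract_patterns(page.text):
            yield page.title(), pattern
```

Keeping the parser separate from the network code means it can be tested against saved wikitext before running it over all pages. Patterns that use `{{!}}` or other escaping conventions instead of `<nowiki>` would need extra handling.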
Based on the outcome of this research, we can decide whether to:
- Create a validator that parses a given regex and rejects it if it contains features we do not want to allow.
- Restrict the feature set to the more limited regex syntax supported by JavaScript, see http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.
- Use Lua's pattern matching instead of PCRE (note that Lua patterns are not full regular expressions; for example, they have no alternation operator).
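A possible starting point for the validator idea above, which would also serve the feature survey itself: a heuristic scanner, sketched here under the assumption that the target is the JavaScript (ECMAScript 5.1) subset. It does not build a full parse tree. It walks the pattern, skips escapes and character classes, and records PCRE constructs that ECMA-262 5.1 section 15.10 does not define, such as lookbehind, named groups, inline flags, possessive quantifiers, atomic groups, `\A`/`\z` anchors and `\p{...}` properties. Lookahead, non-capturing groups, lazy quantifiers and numbered backreferences are part of ES5 and are not flagged. The function names, feature labels and the `allowed` parameter are hypothetical, and the list of detected constructs is not exhaustive.

```python
import re
from typing import AbstractSet, Dict, List, Set, Tuple

# Escapes that PCRE understands but ECMAScript 5.1 (ECMA-262 section 15.10) does not.
ESCAPE_FEATURES: Dict[str, str] = {
    "A": "\\A/\\Z/\\z anchors", "Z": "\\A/\\Z/\\z anchors", "z": "\\A/\\Z/\\z anchors",
    "G": "\\G anchor", "K": "\\K match reset",
    "Q": "\\Q...\\E quoting", "E": "\\Q...\\E quoting",
    "h": "\\h/\\H horizontal space", "H": "\\h/\\H horizontal space",
    "R": "\\R newline", "X": "\\X grapheme cluster",
    "p": "\\p/\\P Unicode properties", "P": "\\p/\\P Unicode properties",
    "k": "named backreference", "g": "\\g backreference",
}

# Group openers following '(' that ES5 lacks; (?:, (?= and (?! are allowed.
# Longer prefixes must come before shorter ones ("?<=" before "?<").
GROUP_FEATURES: List[Tuple[str, str]] = [
    ("?<=", "lookbehind"), ("?<!", "lookbehind"),
    ("?P", "named group"), ("?<", "named group"), ("?'", "named group"),
    ("?>", "atomic group"), ("?#", "comment group"), ("?(", "conditional"),
    ("?R", "recursion"),
]

QUANTIFIER = re.compile(r"[*+?]|\{\d+(?:,\d*)?\}")
POSIX_CLASS = re.compile(r"\[:\^?[A-Za-z]+:\]")


def find_features(pattern: str) -> Set[str]:
    """Heuristically list the non-ES5 constructs used in a PCRE `pattern`."""
    found: Set[str] = set()
    i, n = 0, len(pattern)
    in_class = False          # currently inside [...]
    after_quantifier = False  # previous token was a quantifier
    while i < n:
        ch = pattern[i]
        if ch == "\\" and i + 1 < n:
            label = ESCAPE_FEATURES.get(pattern[i + 1])
            if label:
                found.add(label)
            after_quantifier = False
            i += 2
        elif in_class:
            posix = POSIX_CLASS.match(pattern, i)
            if posix:
                found.add("POSIX class")
                i = posix.end()
            else:
                in_class = ch != "]"  # an unescaped ']' closes the class
                i += 1
        elif ch == "[":
            in_class, after_quantifier = True, False
            i += 1
            if pattern.startswith("^", i):
                i += 1
            if pattern.startswith("]", i):  # PCRE treats a leading ']' as literal
                i += 1
        elif pattern.startswith("(?", i):
            rest = pattern[i + 1:]
            for prefix, label in GROUP_FEATURES:
                if rest.startswith(prefix):
                    found.add(label)
                    break
            else:
                nxt = rest[1:2]
                if nxt.isdigit():
                    found.add("recursion")
                elif nxt.isalpha() or nxt == "-":
                    found.add("inline flags")
            after_quantifier = False
            i += 2
        else:
            quant = QUANTIFIER.match(pattern, i)
            if quant and after_quantifier and ch == "+":
                found.add("possessive quantifier")
                after_quantifier = False
            elif quant and after_quantifier and ch == "?":
                after_quantifier = False  # lazy modifier: allowed in ES5
            else:
                after_quantifier = quant is not None
            i = quant.end() if quant else i + 1
    return found


def validate(pattern: str, allowed: AbstractSet[str] = frozenset()) -> None:
    """Raise ValueError if `pattern` uses a detected feature that is not allowed."""
    disallowed = find_features(pattern) - allowed
    if disallowed:
        raise ValueError("unsupported regex features: " + ", ".join(sorted(disallowed)))
```

Counting the labels returned by `find_features` over all collected patterns (for example with `collections.Counter`) would show which features are actually in use. Calling `validate` with an empty `allowed` set corresponds to the JavaScript-subset option, and individual features can be whitelisted if the survey shows they are widely needed.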