Collect all regular expressions used in Wikidata's Template:Constraint:Format
Closed, Resolved, Public

Description

I would like to know which regex features our community uses to validate the format of string property values. To find out, I would like a list of all the 600+ regexes used in http://www.wikidata.org/wiki/Template:Constraint:Format. One possible way to collect them is via Pywikibot (something I would love to dive into at a beer & cake event).
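As a rough illustration of the collection step, here is a minimal sketch that pulls the regex out of template invocations in a page's wikitext. It assumes the template is called as `{{Constraint:Format|pattern=<nowiki>...</nowiki>}}`; that shape, the function name `extract_format_patterns`, and the sample text are assumptions for illustration only. Fetching the pages themselves (for example with Pywikibot) is not shown.

```python
import re

# Assumed invocation shape: {{Constraint:Format|pattern=<nowiki>REGEX</nowiki>}}
# The real template markup may differ; adjust this expression accordingly.
TEMPLATE_RX = re.compile(
    r"\{\{\s*Constraint:Format\s*\|\s*pattern\s*=\s*<nowiki>(.*?)</nowiki>",
    re.IGNORECASE | re.DOTALL,
)


def extract_format_patterns(wikitext: str) -> list[str]:
    """Return every regex found in Constraint:Format invocations."""
    return [m.group(1).strip() for m in TEMPLATE_RX.finditer(wikitext)]


# Hypothetical wikitext, made up for this example.
sample = (
    "{{Constraint:Format|pattern=<nowiki>[A-Z]\\d{4}</nowiki>}}\n"
    "{{Constraint:Single value}}\n"
)
print(extract_format_patterns(sample))
```

Running this over the wikitext of every page that transcludes the template would give the raw list of regexes to analyse.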

Here are good places to start:

Based on the outcome of this research we can check if:

  • We can create a validator that parses a given regex and fails if it contains features we do not want to allow.
  • We may want to restrict the feature set to the limited regex support in JavaScript, see http://www.ecma-international.org/ecma-262/5.1/#sec-15.10.
  • We may use Lua's pattern matching instead of PCRE.
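The validator idea from the first two points could be sketched roughly as follows. This is a heuristic scanner, not a real regex parser: it walks the pattern, skips escaped characters and character classes, and reports a few PCRE constructs that ECMAScript 5.1 regexes do not support. The deny-list and the name `find_disallowed` are hypothetical, and the list is deliberately incomplete (for example, it ignores possessive `{n}+` and POSIX classes).

```python
import re

# Hypothetical deny-list of PCRE constructs missing from ECMAScript 5.1.
# Each pattern is matched at the current scan position.
CONSTRUCTS = [
    ("lookbehind", re.compile(r"\(\?<[=!]")),
    ("named group", re.compile(r"\(\?P?<[A-Za-z_]")),
    ("atomic group", re.compile(r"\(\?>")),
    ("inline modifier", re.compile(r"\(\?[a-zA-Z-]+[:)]")),
    ("possessive quantifier", re.compile(r"[*+?]\+")),
]
ESCAPES = [
    ("unicode property", re.compile(r"\\[pP]\{")),
]


def find_disallowed(pattern: str) -> list[str]:
    """Return labels of disallowed features, in order of first appearance."""
    found: list[str] = []
    i, in_class = 0, False
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Check escape-based features, then skip the escaped character.
            for label, rx in ESCAPES:
                if rx.match(pattern, i) and label not in found:
                    found.append(label)
            i += 2
            continue
        if in_class:
            # Inside [...] only look for the closing bracket.
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        else:
            for label, rx in CONSTRUCTS:
                if rx.match(pattern, i) and label not in found:
                    found.append(label)
        i += 1
    return found


print(find_disallowed(r"(?<=Q)\d++"))
```

A pattern passes this check when the returned list is empty. A production validator would more likely parse the regex into a syntax tree rather than scan it character by character.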

Event Timeline

thiemowmde raised the priority of this task from to Needs Triage.
thiemowmde updated the task description.

Great idea overall. Since we are already parsing all constraints with their parameters, it should be really easy for us to create this list. I will try to bring it to the office today.

Tamslo moved this task from WBQC Backlog to TODO on the Wikibase-Quality board.
Tamslo moved this task from TODO to DONE on the Wikibase-Quality board.
Bene subscribed.