Small feature request: recognize bracketed IDs as valid IDs
Closed, Resolved, Public


When a property has 'item' datatype, this can be entered either by directly typing the item ID, eg Q123, or by typing the item label or alias, eg September.

Pasting in the Q-number from somewhere else is a pretty common practice when entering values, especially if the item has a very common name ("John Smith") or doesn't have a label in the currently used language.

Unfortunately, most uses of item IDs on Wikidata show the ID in brackets - September (Q123). This means that copying-and-pasting often picks up a surplus pair of brackets that have to be removed. Would it be possible for the system here to identify (id) as a valid ID and accept it? A number of external tools (eg Magnus's mix-and-match) can handle bracketed IDs and it would be good if Wikidata proper could as well.

Event Timeline

agray raised the priority of this task from to Needs Triage.
agray updated the task description.
agray added a project: Wikidata.
agray added a subscriber: agray.

It would be handy if an item's URL - e.g. - could also be parsed.

thiemowmde triaged this task as Medium priority. Aug 26 2016, 2:42 PM

The relevant code is in EntitySearchHelper.php, line 122, getExactMatchForEntityId. The easiest solution I can think of is a second $this->idParser->parse( … ) attempt when the first one fails, with a bit of trivial normalization applied. I would do it like this:

// Greedy .* backtracks to the last run of word characters in $term.
if ( preg_match( '/.*(\b\w+)/s', $term, $matches ) ) {
    $lastWord = $matches[1];
}

This searches for the last word (assuming all entity IDs are sequences of ASCII word characters) in the input string. It will find the ID in (Q42), in URLs, and in many other cases. It's compatible with all entity types we have.
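To make the two-step idea concrete, here is a minimal sketch of "parse as typed, then retry with the last word". It is not the code from the patch: parseId() is a hypothetical stand-in for the real ID parser and, as a simplification, only accepts IDs starting with P or Q. parseWithFallback() is likewise a made-up name.

```php
<?php
/**
 * Hypothetical stand-in for the real ID parser (an assumption for this
 * sketch): accepts only strings such as "Q42" or "P31", throws otherwise.
 */
function parseId( string $input ): string {
    if ( !preg_match( '/^[PQ][1-9]\d*$/i', $input ) ) {
        throw new InvalidArgumentException( "Not an entity ID: $input" );
    }
    return strtoupper( $input );
}

/**
 * Hypothetical helper: try the term as typed first; if that fails,
 * retry once with the last run of ASCII word characters in the term.
 * Returns the parsed ID, or null if neither attempt succeeds.
 */
function parseWithFallback( string $term ): ?string {
    try {
        return parseId( trim( $term ) );
    } catch ( InvalidArgumentException $ex ) {
        // Fall through to the normalized second attempt.
    }

    if ( preg_match( '/.*(\b\w+)/s', $term, $matches ) ) {
        try {
            return parseId( $matches[1] );
        } catch ( InvalidArgumentException $ex ) {
            return null;
        }
    }

    return null;
}
```

With this stub, parseWithFallback( '(Q42)' ) returns 'Q42', while parseWithFallback( 'Q42 (Douglas Adams)' ) returns null because the last word is "Adams".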

It will not find the ID in Q42 (Douglas Adams), which is no different from the status quo. It may behave a bit inconsistently, because it will find the ID when it's not followed by ASCII word characters, e.g. Q11738 (Ö).

The "worst" case I can think of is a weird input string like Q41 and Q42. It will only suggest the last ID, which is not a big deal in my opinion.
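The cases discussed above can be checked with a small standalone script. This is only an illustration: extractLastWord() is a hypothetical wrapper around the same regular expression, and the URL in the list is an example input added for this sketch, not one taken from the thread.

```php
<?php
/**
 * Hypothetical helper (not from the patch): returns the last run of
 * ASCII word characters in $term, or null if there is none.
 */
function extractLastWord( string $term ): ?string {
    // Greedy .* consumes the whole string, then backtracks to the last
    // position where a word boundary is followed by word characters.
    if ( preg_match( '/.*(\b\w+)/s', $term, $matches ) ) {
        return $matches[1];
    }
    return null;
}

$examples = [
    '(Q42)',
    'Q42 (Douglas Adams)',
    'Q41 and Q42',
    // Illustrative URL, an assumption for this sketch.
    'https://www.wikidata.org/wiki/Q42',
];

foreach ( $examples as $term ) {
    echo $term, ' => ', extractLastWord( $term ) ?? '(none)', "\n";
}
```

This should print Q42 for the bracketed ID, Adams for the label case, Q42 for "Q41 and Q42", and Q42 for the URL. The pattern has no /u modifier, so non-ASCII bytes are normally not treated as word characters, which is why the Q11738 (Ö) case mentioned above is expected to yield Q11738.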

WARNING: Make sure this plays nicely with PropertySuggester. I had a quick look and believe it does, but this needs testing.

Change 308181 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Allow to search for entity IDs in URLs, brackets and similar

Change 308181 merged by jenkins-bot:
Allow to search for entity IDs in URLs, brackets and similar