
Add support for 'external identifier' data type (top-level and qualifiers)
Closed, Resolved, Public

Description

Add support for the data type 'external identifier' for statements and qualifiers on Commons.

An external identifier data type is essentially a string with constraints: on Wikidata it's entered as a string, but it appears to be validated against the property's constraints (see https://www.wikidata.org/wiki/Property:P269 under constraints).

Users need to be able to add an external ID for any property that takes external identifiers as a value, both on the File page and in UploadWizard.

Examples

An external identifier can be, for example, an IMDb ID, an ISBN, or a Library of Congress authority ID (see the full list of Wikidata properties that use the external identifier data type).

User stories

  • As a Commons editor, I want to add an external identifier to a media file’s metadata, so that GLAMs can connect this ID to other databases for archival purposes.
  • As a Commons editor, I want to add an external identifier to a media file’s metadata, so that other systems can find this Q item by looking up this ID.

Acceptance criteria:

  • can select a property with data type 'external identifier' and enter a value for it, on File page and UploadWizard
    • on File page
    • in UploadWizard
  • incorrect values return an error

Details

Related Gerrit Patches:
[mediawiki/extensions/WikibaseMediaInfo@master] Add support for properties with URL datatypes

Event Timeline

Cparle created this task. Sep 4 2019, 2:22 PM
Restricted Application added a subscriber: Aklapper. Sep 4 2019, 2:22 PM
Ramsey-WMF triaged this task as High priority. Sep 10 2019, 4:25 PM

Change 535975 had a related patch set uploaded (by Eric Gardner; owner: Eric Gardner):
[mediawiki/extensions/WikibaseMediaInfo@master] Add support for properties with URL datatypes

https://gerrit.wikimedia.org/r/535975

Change 535975 abandoned by Eric Gardner:
Add support for properties with URL datatypes

Reason:
Better to handle this systematically than doing each data type piecemeal; abandoned in favor of this patch: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/537208

https://gerrit.wikimedia.org/r/535975

External Identifier property datatypes correspond to string value types, so (basic) support for these types is enabled via: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikibaseMediaInfo/+/542253. Some UI work will be necessary to improve the experience.

PDrouin-WMF updated the task description. Nov 14 2019, 10:47 PM
PDrouin-WMF updated the task description. Nov 14 2019, 10:49 PM
PDrouin-WMF updated the task description. Nov 14 2019, 10:54 PM

Here's a question about this datatype RE: acceptance criteria.

Wikidata allows Properties to specify constraints (the WikibaseQualityConstraints extension seems to be what powers this). These constraints can control what values are allowed as well as how values get displayed using regular expressions. I'm assuming that we should honor these constraints when they exist, as part of the acceptance criteria for this feature.

Here's a concrete example of this feature on Wikidata. The property Instagram username has the "external identifier" data type. It has some additional rules in place so that when users type in a string username, the displayed value becomes a link to the appropriate instagram page using the "formatter URL" rule: https://www.instagram.com/$1/

So a user inputs this:

But sees this once they publish:


(this is a working link to Obama's IG page).
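To make the "formatter URL" rule concrete, here is a minimal sketch of the substitution it implies: the entered identifier replaces the `$1` placeholder. The function name `expand_formatter_url` is made up for illustration, and the percent-encoding step is an assumption rather than a description of what Wikibase actually does internally.

```python
from urllib.parse import quote


def expand_formatter_url(formatter_url: str, identifier: str) -> str:
    """Replace the $1 placeholder in a formatter URL with the identifier.

    Hypothetical helper for illustration only; not part of Wikibase.
    """
    # Assumption: percent-encode the identifier so that characters such as
    # "/" or "?" cannot change the structure of the resulting URL.
    return formatter_url.replace("$1", quote(identifier, safe=""))


if __name__ == "__main__":
    print(expand_formatter_url("https://www.instagram.com/$1/", "barackobama"))
```

With the Instagram formatter URL quoted above, the input "barackobama" becomes https://www.instagram.com/barackobama/.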

I assume that if Instagram disallowed certain characters or patterns, Wikidata could also validate and inform the user.
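As a rough sketch of what that kind of validation could look like, assuming the constraint is expressed as a regular expression that the whole value must match: the pattern below is a made-up placeholder, not the real constraint on any Wikidata property, and `validate_identifier` is a hypothetical name.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; NOT the actual constraint on P2003.
EXAMPLE_PATTERN = r"[a-z0-9_.]{1,30}"


def validate_identifier(value: str, pattern: str) -> Optional[str]:
    """Return an error message if value does not match pattern, else None."""
    # fullmatch requires the entire value to match, not just a prefix.
    if re.fullmatch(pattern, value) is None:
        return f"'{value}' does not match the expected format {pattern}"
    return None


if __name__ == "__main__":
    for candidate in ("barackobama", "barack obama!"):
        error = validate_identifier(candidate, EXAMPLE_PATTERN)
        print(candidate, "->", error or "ok")
```

Returning a message instead of raising keeps it easy for a UI to show the error next to the input field.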

Fortunately for us, it looks like we can get Wikidata to do most of this work for us by using the wbformatvalue API. The key is that instead of providing a datatype for formatting, you can provide the specific property that a given value will be formatted against. The API params look like this:

{
	"action": "wbformatvalue",
	"format": "json",
	"datavalue": "{\"value\":\"barackobama\",\"type\":\"string\"}",
	"property": "P2003"
}

Here's a link to the API sandbox for further exploration.
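For anyone who wants to try this outside the sandbox, here is a minimal sketch that builds the same parameters in Python. It only assembles the request and prints the URL; it does not send it. The helper name `build_format_request` is hypothetical, and falling back to a `datatype` parameter when no property is given is an assumption based on the description above of formatting against a datatype instead of a property.

```python
import json
from typing import Dict, Optional
from urllib.parse import urlencode

API_URL = "https://www.wikidata.org/w/api.php"


def build_format_request(
    value: str,
    property_id: Optional[str] = None,
    datatype: Optional[str] = None,
) -> Dict[str, str]:
    """Build wbformatvalue parameters for a plain string value."""
    params = {
        "action": "wbformatvalue",
        "format": "json",
        # The datavalue is itself a JSON document passed as a string.
        "datavalue": json.dumps({"value": value, "type": "string"}),
    }
    if property_id is not None:
        # Format against the rules of one specific property.
        params["property"] = property_id
    elif datatype is not None:
        # Assumed fallback: format against a generic datatype.
        params["datatype"] = datatype
    return params


if __name__ == "__main__":
    params = build_format_request("barackobama", property_id="P2003")
    print(API_URL + "?" + urlencode(params))
```

The printed URL can be pasted into a browser or passed to any HTTP client.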

So, long story short, it looks like we can rely on existing APIs to do most of this. The question is, how do we determine when to format against a specific property rather than against a datatype? Can we use a specific property in all cases (assuming the default rules for datatypes will be used if a property has no constraints of its own)? And it's nice we can rely on Wikidata in production, but for local development and testing it looks like we're going to need to set up some of these rules locally – something I don't really have any idea how to do yet.

@PDrouin-WMF here's what I was thinking for an updated input UI here (this would actually apply to all "string" types: URL, external-identifier, plain string, maybe monolingual-text).

I've introduced one additional element, a button that says "add" (exact language TBD). Users can either hit enter or use this button to insert the text in the field as a new item below. Server-side validation and re-formatting would take place at this time, based on the rules of the property in question. So a user may see error text above the input field if they hit enter/"add" with some invalid data.

Initial state:

Edit state:

Let me know what you think.

Change 551299 had a related patch set uploaded (by Eric Gardner; owner: Eric Gardner):
[mediawiki/extensions/WikibaseMediaInfo@master] Improved support for string input types

https://gerrit.wikimedia.org/r/551299

@egardner I feel you're mixing two things. An external ID is an identifier that gets expanded to a URL (and sometimes a URI too). For example, on https://www.wikidata.org/wiki/Property:P269 we have the formatter URL at https://www.wikidata.org/wiki/Property:P269#P1630 . The formatter URL takes care of the expansion.

egardner removed egardner as the assignee of this task. Dec 1 2019, 10:43 PM
egardner moved this task from Doing to To Do on the Structured-Data-Backlog (Current Work) board.
AnneT claimed this task. Dec 2 2019, 5:50 PM
Ramsey-WMF added a subscriber: AnneT.
Ramsey-WMF closed this task as Resolved. Dec 16 2019, 10:23 PM

Tested with multiple properties (Twitter Username, WTA Player ID, BBC News Live Democracy, etc). All worked.