
Wikidata should offer BEACON files to improve GLAM connections
Open, Low, Public, Feature

Description

BEACON files are an easy-to-create and easy-to-use means for connecting websites. The format was invented by German Wikipedians some years ago. It is used by libraries and institutional websites, mainly in the German-speaking world.

A German introduction can be found here:
http://de.wikipedia.org/wiki/Wikipedia:BEACON

An English introduction can be found here:
http://meta.wikimedia.org/wiki/BEACON

For examples of the possibilities of the format, see:
http://www.bmlo.lmu.de/Q/PND=118539841
http://beacon.findbuch.de/seealso/pnd-aks?format=sources&id=118515055

Most existing BEACON files are based on GND authority data, but it's perfectly possible to use any other identifier.

As we try to convince the GLAM world to use this format, I think Wikidata in particular, with its wealth of authority data fields connected to each other and to Wikimedia project websites, should offer such BEACON files.

Those BEACON files could be updated by a bot on a regular basis (e. g. weekly), and be hosted on some official Wikimedia server.

Right now, the most important Wikidata BEACON file would probably be:

GND -> Wikidata ID

(including all Wikidata items that have a "GND identifier" field, and only those). For this one we don't even need a resolver tool, since the Wikidata URL format already works as a valid link resolver.

Others could be:
GND -> en.wikipedia
GND -> de.wikisource
GND -> Commons Category
VIAF -> en.wikipedia
VIAF -> Commons Category
SUDOC -> fr.wikipedia
etc. etc.

Also, BEACON files like
Wikidata ID -> wikimedia projects
or even
Wikidata ID -> some external identifier
could be useful.

I think retrieving the necessary data should be very simple (it takes only a basic SQL query), and the resolver tool with its 1:1 functionality doesn't seem to be rocket science either. Something like https://tools.wmflabs.org/wdrdr/cgi-bin/index.cgi (with some tweaks, such as adding the project) might already do the trick.


Version: wmf-deployment
Severity: enhancement
Whiteboard: u=dev c=infrastructure p=0

Details

Reference
bz60366

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 2:57 AM
bzimport set Reference to bz60366.
bzimport added a subscriber: Unknown Object (MLST).

Maybe a first step would be to implement some resolving functionality in Wikidata. I think any property with "single value" and "unique value" constraints (especially those under the section "Authority Control") is a candidate for that. I imagine that http://www.wikidata.org/resolver/P214/131280745 would lead directly to Q1218.

[For some identifier systems, notably ISBN, there may be no universally agreed syntax, so for some properties a normalization stage could be desirable]

A (simple) BEACON file as proposed by Andreas above would then be a plain dump of all values of the given property, preceded by some header lines saying "Wikidata", a verbalization of the property name etc., and most importantly the pattern for the resolving URL, i.e.

#TARGET: http://www.wikidata.org/resolver/P214/{ID}

BEACON files are aimed at human-readable representations, which here means some kind of HTML page. So for the moment the regular item display should be the result of the resolution process.
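As a rough illustration of the "simple dump" described above, the following Python sketch renders a single-column BEACON file. The resolver URL is the hypothetical one proposed in this comment (no such endpoint exists yet), the function name `render_beacon` is made up for the example, and the header fields other than #TARGET are an assumption about what a consumer expects.

```python
def render_beacon(target_pattern, identifiers, description=""):
    """Render a single-column BEACON file as a string.

    target_pattern: URL template containing the placeholder {ID}.
    identifiers:    iterable of property values (e.g. all P214 values).
    """
    lines = ["#FORMAT: BEACON"]  # assumed header line announcing the format
    if description:
        lines.append(f"#DESCRIPTION: {description}")
    lines.append(f"#TARGET: {target_pattern}")
    lines.append("")  # blank line between header and link lines
    # One identifier per line; duplicates removed, sorted for stable output.
    lines.extend(sorted(set(identifiers)))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical resolver pattern taken from the comment above.
    print(render_beacon(
        "http://www.wikidata.org/resolver/P214/{ID}",
        ["131280745"],
        description="Wikidata items with a VIAF identifier (P214)",
    ))
```

A bot regenerating the file weekly would only need to feed the current list of property values into such a function.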

Another question is how to combine resolving with some kind of forwarding. The example item with P214 also has a linked coat of arms (P94), which opens a path from VIAF 131280745 to "Emblem of Jerusalem.svg" (unfortunately no URL is known or derivable without further knowledge). With respect to P237, a description of the coat of arms is available (unfortunately it is a Wikidata item, not the description itself). But some properties, like P646 ;-), can be transformed into external resources, and a BEACON file would list those P214 values pertaining to items that carry both P214 and P646.

Alas, the efficient way to encode this would be to list a concordance between P214 and P646 in the BEACON file (and to formulate the #TARGET in terms of the P646 number), but this is not specified yet. And of course this kind of BEACON file would be generated from Wikidata data, but that origin would not be visible at all in its applications, so implementation should probably be left to interested parties rather than Wikidata itself.
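To make the idea of a concordance concrete, here is a minimal Python sketch that keeps only the items carrying both properties and emits source|target pairs. The data layout (a dict of item ID to property values), the function name `concordance`, and the placeholder Freebase value are all assumptions for illustration, and the two-column output is the not-yet-specified form mentioned above.

```python
def concordance(items, source_prop, target_prop):
    """Return sorted (source_value, target_value) pairs for items that
    carry BOTH properties; items missing either one are skipped."""
    pairs = []
    for claims in items.values():
        if source_prop in claims and target_prop in claims:
            pairs.append((claims[source_prop], claims[target_prop]))
    return sorted(pairs)


if __name__ == "__main__":
    # Toy data: the P646 value is a placeholder, not a real Freebase ID.
    items = {
        "Q1218": {"P214": "131280745", "P646": "/m/EXAMPLE"},
        "Q_NO_FREEBASE": {"P214": "000000000"},  # hypothetical item, skipped
    }
    for source, target in concordance(items, "P214", "P646"):
        print(f"{source}|{target}")
```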

(Illustration to comment #1)

[...] a BEACON file would list those P214-values pertaining to items which simultaneously carry P214 and P646.

Alas, the efficient way to code them would be by listing a concordance between P214 and P646 in the BEACON file (and formulate the #TARGET by means of the P646 number) but this is not specified yet. And of course this kind of BEACON file would be generated from Wikidata data but this origin wouldn't be at all visible in its applications thus implementation probably should be left to interested parties and not Wikidata itself.

http://beacon.findbuch.de/downloads/wikidata/ contains BEACON files based on "GND Identifier" (P227) for several Wikidata properties, routinely extracted from the full dumps, in two flavors. E.g. http://beacon.findbuch.de/downloads/wikidata/wikidata_gnd-mgp-konkbeacon.txt contains the concordance from P227 to P549 in an unofficial convenience format, and http://beacon.findbuch.de/downloads/wikidata/mgp-gndbeacon.txt is the even simpler official format: so simple that it cannot even show the P549 identifiers or URLs and has to rely on an additional resolver, http://beacon.findbuch.de/gnd-resolver/wd_mgp , which I had to set up for the purpose. A third form would be the following, with complete URLs:

...
100055427|http://genealogy.math.uni-bielefeld.de/genealogy/id.php?id=65162
...

or

...
100055427|Heinrich Wilhelm Brandes|http://genealogy.math.uni-bielefeld.de/genealogy/id.php?id=65162
...

This form *is* permitted by the spec, but it is very verbose, obscures the identifiers used by the target application, and is still somewhat more complex than the one-column list form. Since this makes a difference for some tools (in particular, http://toolserver.org/~apper/pd/stat/beacon.php only supports the simple form), I usually don't publish this third variant.
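The difference between the variants can be seen in a small Python sketch that splits a link line into source, annotation and target. The function name `parse_link_line` is made up, and the rule for two-field lines (a second field starting with http:// or https:// is read as the target, anything else as an annotation) is my reading of the format, so treat it as an assumption rather than a statement of the spec.

```python
def parse_link_line(line):
    """Split one BEACON link line into (source, annotation, target).

    Fields that are absent are returned as None.
    """
    fields = line.strip().split("|")
    source = fields[0]
    annotation = None
    target = None
    if len(fields) == 2:
        # Assumed rule: a URL-like second field is a target, else an annotation.
        if fields[1].startswith(("http://", "https://")):
            target = fields[1]
        else:
            annotation = fields[1]
    elif len(fields) >= 3:
        annotation = fields[1] or None
        # Re-join in case the target itself contains a "|" character.
        target = "|".join(fields[2:]) or None
    return source, annotation, target


if __name__ == "__main__":
    url = "http://genealogy.math.uni-bielefeld.de/genealogy/id.php?id=65162"
    for sample in (
        "100055427",
        "100055427|" + url,
        "100055427|Heinrich Wilhelm Brandes|" + url,
    ):
        print(parse_link_line(sample))
```

A tool that only handles the first case needs none of this branching, which is why the one-column form is the easiest to support.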

Could you provide a use case of how our BEACON files would be consumed?

Well, actually Magnus Manske has set up most of the requested functionality at http://tools.wmflabs.org/wikidata-todo/beacon.php

If you select distinct WD properties for "Property" and "Source", it essentially generates a two-column table mapping the values of the "Source" property to values of the "Property" property, and the header field "#TARGET" provides a URL template for constructing an actionable URL from values taken from the right-hand column. Typical use cases have some "authority control" numbers in the left column (not necessarily web actionable) and identifiers for "interesting stuff on the web" in the right column.

Example: http://tools.wmflabs.org/wikidata-todo/beacon.php?prop=535&source=214 maps from VIAF (P214) "to" the Find-a-Grave site (P535).

In this example, a third-party (database-like) site that happens to know the VIAF identifiers for its records can consume the BEACON file provided by Wikidata and is then able to offer its users links to the Find A Grave website. In this scenario Wikidata is merely a provider of mapping data.
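From the consumer's side, this scenario might look like the Python sketch below. It assumes the two-column layout described above, with the right-hand value inserted into the #TARGET template; the sample identifiers, the example.org URL pattern, and the function names `read_beacon` and `link_for` are invented for illustration and are not taken from the actual tool output.

```python
def read_beacon(text):
    """Parse BEACON text into (target_pattern, {source_id: target_id}).

    Single-column lines map an identifier to itself.
    """
    target = None
    links = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header line of the form "#KEY: value".
            key, _, value = line[1:].partition(":")
            if key.strip().upper() == "TARGET":
                target = value.strip()
            continue
        source, _, rest = line.partition("|")
        # Use the last field as the value to insert; fall back to the source.
        links[source] = rest.rsplit("|", 1)[-1] if rest else source
    return target, links


def link_for(target, links, source_id):
    """Build the outgoing URL for a known identifier, or return None."""
    value = links.get(source_id)
    if target is None or value is None:
        return None
    return target.replace("{ID}", value)


if __name__ == "__main__":
    # Made-up sample: left column stands for VIAF, right for the target site.
    sample = "#FORMAT: BEACON\n#TARGET: http://example.org/memorial/{ID}\n\n1111|2222\n"
    target, links = read_beacon(sample)
    print(link_for(target, links, "1111"))
```

The consuming site never has to query Wikidata at display time; it only loads the file and looks up its own identifiers.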

Real example with GND identifiers (P227): http://www.deutsche-biographie.de/sfz118147.html is a biographical article from the "Neue Deutsche Biographie" and has a tab "Weitere Informationen" (further information): Any link there (with the exception of "GND" and "VIAF") is based on BEACON files they import into their web application.

IMHO Magnus' tool should suffice to exploit Wikidata's incidental ability to provide mappings between two "external" identifier systems.

Now, if Wikidata (or its Q-numbers) is one of the participants in a mapping, some enhancements could be beneficial:

E.g. VIAF provides resolving services, i.e. it supports URLs constructed from the identifiers of its constituents: viaf.org/viaf/sourceID/DNB|119186187 gives access to the VIAF cluster of the person above based on its GND identifier 119186187, thus obviating the need to first determine the corresponding VIAF identifier. User APPER on the German Wikipedia has set up a service to achieve the same for de.wikipedia.org, http://tools.wmflabs.org/persondata/redirect/gnd/de/{ID} . Consequently the BEACON file http://tools.wmflabs.org/persondata/beacon/dewiki.txt just lists the GND identifiers known to the target, making it a single-column table, since the task of identifier mapping is shifted from the BEACON file to the target resolver. Consumers of this BEACON file simply insert the appropriate identifiers into the target URL pattern to achieve a mapping to the German Wikipedia: http://tools.wmflabs.org/persondata/redirect/gnd/de/119186187 and users of that link are redirected to https://de.wikipedia.org/wiki/Heinrich_Sproemberg .

Now Wikidata could provide a similar kind of resolving functionality ( http://www.wikidata.org/resolver/P227/{ID} ) for any identifier-like property (most importantly those categorized under "authority control"). A corresponding BEACON file would give the resolver URL pattern as #TARGET and otherwise list all *values* of P227 (only for these would the resolver yield a non-empty result). Whether the resolver should resolve to the "raw" Wikidata item page or rather to a prettified Reasonator presentation remains to be settled.

More interesting perhaps would be a variant where sitelinks are exploited:
http://www.wikidata.org/resolver/P227/en/{ID} would redirect directly to the corresponding article on en.wikipedia.org, and the corresponding BEACON file would list the subset of P227 values occurring in Wikidata items that have a sitelink to en.wikipedia. This would be a symmetric counterpart to the use of Wikidata inside en.wikipedia: the authority control template there traditionally lists only the VIAF number but pulls in identifiers for a large number of different authority control systems from Wikidata.

For P227 and de.wikipedia (also de.wikisource and Commons) this setup would provide exactly the same functionality as APPER's on wmflabs. For LCCN, VIAF and other authority control properties, and/or for sites other than de.wikipedia, it would be something new: library catalogs using VIAF, LCCN or SUDOC numbers could provide links to en.wikipedia, fr.wikipedia and so on in a much more lightweight fashion than by explicitly storing the titles of the individual articles in their database.
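The core of such a resolver is small. The Python sketch below works on a toy in-memory index rather than real Wikidata data, uses the VIAF example from the first comment (131280745, Q1218), and handles only Wikipedia language editions; the function name `resolve` and the index layout are assumptions, and the proposed /resolver/ URLs do not exist.

```python
from urllib.parse import quote

# Toy index: (property, value) -> item. A real service would build this
# from a dump or query the live database instead.
ITEMS = {
    ("P214", "131280745"): {"id": "Q1218", "sitelinks": {"enwiki": "Jerusalem"}},
}


def resolve(prop, value, lang=None):
    """Return the redirect URL for an identifier, or None if unknown.

    Without lang, resolve to the item page; with lang, resolve to the
    article on that Wikipedia if the item has such a sitelink.
    """
    item = ITEMS.get((prop, value))
    if item is None:
        return None
    if lang is None:
        return f"http://www.wikidata.org/wiki/{item['id']}"
    title = item["sitelinks"].get(f"{lang}wiki")
    if title is None:
        return None
    # Page titles use underscores for spaces; other characters are escaped.
    return f"https://{lang}.wikipedia.org/wiki/{quote(title.replace(' ', '_'))}"


if __name__ == "__main__":
    print(resolve("P214", "131280745"))
    print(resolve("P214", "131280745", lang="en"))
```

The matching BEACON file for a given language would list exactly those identifiers for which `resolve(prop, value, lang)` returns a URL.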

Lydia_Pintscher removed a subscriber: Unknown Object (MLST).
Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:13 AM
Aklapper removed a subscriber: Tbayer.