
SWEPUB Sparql Wikidata
Open, Needs TriagePublic

Assigned To
None
Authored By
Salgo60
Feb 22 2019, 1:20 PM
Referenced Files
F31757676: image.png
Apr 15 2020, 2:09 PM
F28285545: image.png
Feb 26 2019, 3:26 AM
F28280912: image.png
Feb 25 2019, 11:01 AM
F28280853: image.png
Feb 25 2019, 10:54 AM
F28270946: image.png
Feb 24 2019, 10:10 AM
F28274354: image.png
Feb 24 2019, 10:10 AM
F28274369: image.png
Feb 24 2019, 10:10 AM
F28274454: image.png
Feb 24 2019, 10:10 AM

Description

Task: See whether connecting SWEPUB and Wikidata, for example through SPARQL federation, adds any value.
Status: Stalled, waiting on SWEPUB to decide on the license for its metadata.

Background: I was at the SWEPUB user day (link) and thought we could connect the two better. This task is a first attempt to find out how. Maybe we can use SPARQL endpoints (it looks like SWEPUB wants a username and password?).

Information needed to add SWEPUB as an endpoint in Wikidata

Compare T200668: Set up Nobel Data as federated search with Wikidata

From Wikidata:SPARQL_federation_input we need the following info

TODO: Check with SWEPUB

SPARQL Endpoint: http://virhp07.libris.kb.se/sparql/ (in the query UI, select "avancerad", i.e. advanced)
Documentation (?): http://www.kb.se/libris/SwePub/Format-och-praxis/
License / terms of use: CC-0 (?)
Background: SWEPUB info is a good match with Scholia, as SWEPUB has articles, conference papers, theses etc. published at Swedish higher education institutions and authorities, and it also has a SPARQL endpoint.

Links

Model

image.png (714×1 px, 583 KB)

image.png (714×1 px, 522 KB)

image.png (512×1 px, 107 KB)

image.png (524×397 px, 30 KB)

cc. @Fnielsen any thoughts?!?!?

Event Timeline


Test URL with the query below

PREFIX swpa_m: <http://swepub.kb.se/SwePubAnalysis/model#>
SELECT DISTINCT ?_onetitle as ?title ?_issn as ?printISSN ?_eissn as ?electronicISSN xsd:int(?_weight) as ?NorwegianLevel xsd:int(?_weight7) as ?FinnishLevel xsd:int(?_weight8) as ?DanishLevel ?SwedishLevel
WHERE {
  # Journals with a title and at least one ISSN (print or electronic)
  ?Journal a swpa_m:Journal .
  ?Journal swpa_m:onetitle ?_onetitle .
  OPTIONAL { ?Journal swpa_m:eissn ?_eissn . }
  OPTIONAL { ?Journal swpa_m:issn ?_issn . }
  FILTER ( bound(?_issn) || bound(?_eissn) )
  # The Swedish rank is required
  ?Journal swpa_m:hasRank ?SwedishRank .
  ?SwedishRank a swpa_m:SwedishRank .
  ?SwedishRank swpa_m:weight ?SwedishLevel .
  # The Norwegian, Finnish and Danish ranks are optional
  OPTIONAL {
    ?Journal swpa_m:hasRank ?NorwegianRank .
    ?NorwegianRank a swpa_m:NorwegianRank .
    ?NorwegianRank swpa_m:weight ?_weight .
  }
  OPTIONAL {
    ?Journal swpa_m:hasRank ?FinnishRank .
    ?FinnishRank a swpa_m:FinnishRank .
    ?FinnishRank swpa_m:weight ?_weight7 .
  }
  OPTIONAL {
    ?Journal swpa_m:hasRank ?DanishRank .
    ?DanishRank a swpa_m:DanishRank .
    ?DanishRank swpa_m:weight ?_weight8 .
  }
}
LIMIT 100000
# URL parameters appended after the query in the test URL: &format=text/html&timeout=0&debug=on

image.png (595×787 px, 116 KB)

# KB authority list of organizations
PREFIX swpa_m: <http://swepub.kb.se/SwePubAnalysis/model#>
PREFIX countries: <http://www.bpiresearch.com/BPMO/2004/03/03/cdl/Countries#>
SELECT DISTINCT
xsd:string(?_label) as ?organization
xsd:string(?_authority) as ?authority
xsd:string(?_id) as ?id
xsd:string(?_nameLocal) as ?country
xsd:string(?_countryCodeISO3166Alpha3) as ?countryCodeISO3166Alpha3
WHERE
{
?ResearchOrganization a swpa_m:ResearchOrganization .
?ResearchOrganization rdfs:label ?_label .
?ResearchOrganization swpa_m:hasIdentity ?Identity .
?ResearchOrganization swpa_m:locatedIn ?ISO3166DefinedCountry .
?Identity swpa_m:authority ?_authority .
?ISO3166DefinedCountry countries:countryCodeISO3166Alpha3 ?_countryCodeISO3166Alpha3 .
?ISO3166DefinedCountry countries:referencesCountry ?IndependentState .
?IndependentState countries:nameLocal ?_nameLocal .
?Identity swpa_m:id ?_id .
?Identity swpa_m:authority "kb.se"^^xsd:string .
FILTER(?_authority = "kb.se"^^xsd:string)
}

# Number of distinct works per organization (Swedish-language labels)
PREFIX bmc: <http://swepub.kb.se/bibliometric/model#>
PREFIX swpa_m: <http://swepub.kb.se/SwePubAnalysis/model#>
PREFIX mods_m: <http://swepub.kb.se/mods/model#> 
PREFIX outt_m: <http://swepub.kb.se/SwePubAnalysis/OutputTypes/model#>
PREFIX xlink: <http://www.w3.org/1999/xlink#> 
SELECT DISTINCT

xsd:string(?_orgName)
COUNT(DISTINCT ?_workID) as ?c
WHERE
{
?CreativeWork bmc:localID ?_workID .
?Publication bmc:localID ?_publicationID .

?Organization rdfs:label ?_orgName .
FILTER(lang(?_orgName) = 'sv' )

?CreativeWork bmc:reportedBy ?Record .
?CreativeWork a bmc:CreativeWork .
?CreativeWork bmc:publishedAs ?Publication .

#?CreativeWork bmc:publicationYearEarliest ?_pubYear .
?CreativeWork bmc:hasCreatorShip ?CreatorShip .
?CreatorShip bmc:hasAffiliation ?CreatorAffiliation .
?CreatorAffiliation bmc:hasOrganization ?Organization .

?CreatorShip bmc:hasCreator ?Creator . 


}
ORDER BY xsd:string(?_orgName)

@Salgo60 what are you trying to do here?

The SWEPUB SPARQL endpoint works great without auth and the data is indeed CC0...

@Abbe98 OK, do you have a link stating that the metadata is CC-0? I guess that is what we need to start the process of setting up SPARQL federation in Wikidata.

Todo list as I see it

image.png (74×1 px, 22 KB)

  • wait until this is approved and set up in Wikidata --> it will then appear on the page SPARQL_Federation_endpoints
  • understand how to write queries between Wikidata and SWEPUB; I guess we can use DOI and ORCID
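
A minimal sketch of what such a query could look like from the Wikidata Query Service side. The SWEPUB predicates for DOI and ORCID are not shown anywhere in this task, so the sketch joins on ISSN instead, using only the `swpa_m:Journal`, `swpa_m:onetitle` and `swpa_m:issn` terms from the test query above. Assumptions: the SWEPUB endpoint has been approved for federation (until then the query service will refuse the `SERVICE` call), and SWEPUB stores ISSNs in the same hyphenated form as Wikidata's ISSN (P236).

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX swpa_m: <http://swepub.kb.se/SwePubAnalysis/model#>

SELECT ?journal ?issn ?swepubTitle
WHERE {
  # Remote part: journals with a print ISSN from the SWEPUB endpoint
  SERVICE <http://virhp07.libris.kb.se/sparql/> {
    ?swepubJournal a swpa_m:Journal ;
                   swpa_m:onetitle ?swepubTitle ;
                   swpa_m:issn ?rawIssn .
  }
  # Assumption: dropping the datatype yields a plain string comparable to P236
  BIND(STR(?rawIssn) AS ?issn)
  # Local part: the Wikidata item carrying the same ISSN
  ?journal wdt:P236 ?issn .
}
LIMIT 100
```

The same pattern would apply to DOI (P356) and ORCID (P496) once the corresponding SWEPUB predicates are known.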

What are the benefits of Wikidata <-> SWEPUB

I guess both will benefit. I was at the SWEPUB user day 2019 and missed a holistic vision. Compare the Bologna workshop on Open Citations (hashtag #WOOC2018) and the presentation by Diego Chialva of ERCEA (European Research Council Executive Agency) (link presentation and link video): on slide 18, Wikibase (i.e. Wikidata) is mentioned as the tool they are talking about using.

At SWEPUB my feeling is that they are struggling with basic things, like getting Swedish research organizations that receive money to report back, and adding categories to classify the research so that some evaluation can be done in Sweden. I heard nothing about linked data or other more advanced concepts. I feel I see the same problems/patterns with silos that we see with cultural databases and museums. Maybe the European Research Council could be the driver. Maybe I am wrong, as I have no skills/experience in this area.

Task T216409: Nobelprize as part of evaluating research in different countries is also a spin-off from that Bologna workshop on Open Citations.

image.png (637×1 px, 190 KB)

The added value for SWEPUB is, I guess, that Wikidata/Wikipedia

  1. Has articles about a lot of researchers in 300+ languages
  2. Wikidata has a rich property model -->
    1. there are 100+ authority-control properties
    2. for academics/researchers alone there are 17+ properties, e.g. ORCID, ACM Digital Library author ID, Academic Tree ID (P2381), CONICET person ID, DBLP ID (P2456), Dialnet author ID (P1607), Dictionary Grierson ID (P3946), Google Scholar author ID (P1960), IPNI author ID (P586), MTMT author ID (P2492), ResearchGate profile ID (P2038), ResearcherID (P1053), ...
  3. A lesson learned from T200668: Set up Nobel Data as federated search with Wikidata is that Wikidata is updated quickly (not always correctly, but...) and also supports sources/references for statements

Today the Scholia project, which uses Wikidata to create open citation graphs, is more than interesting. See the tweet where we find the Swedish female researchers with the most-cited science articles BUT no sv:Wikipedia article --> this could be an indication that an sv:Wikipedia article should be written. I guess that with Wikidata <-> SWEPUB, Wikidata will get better quality for this data.

result from a search in tweet

image.png (589×1 px, 148 KB)

DOI/ORCID

My understanding is that Wikidata has 460,000+ researchers with an ORCID (P496), and if we check Scholia we have 16,527,450 objects with a DOI (P356).
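
A figure like the ORCID one can be re-checked on the Wikidata Query Service with a simple count. This is only a sketch: the result changes over time, and the same query with P356 (DOI) covers many more items and may hit the query timeout.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Count the Wikidata items that have an ORCID iD (P496)
SELECT (COUNT(DISTINCT ?person) AS ?researchersWithOrcid)
WHERE {
  ?person wdt:P496 ?orcid .
}
```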

image.png (622×1 px, 98 KB)

As SWEPUB also has DOI and ORCID, I guess they are good candidates for "same as" links and for SPARQL federated queries.
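
To illustrate the Wikidata half of such a match, here is a small sketch listing works together with the two candidate join keys: the DOI (P356) of the work and the ORCID (P496) of an author linked through P50. It only touches Wikidata; how SWEPUB exposes the same identifiers is not covered here.

```sparql
PREFIX wdt: <http://www.wikidata.org/prop/direct/>

# Works with a DOI whose author item also has an ORCID iD
SELECT ?work ?doi ?author ?orcid
WHERE {
  ?work wdt:P356 ?doi ;
        wdt:P50 ?author .
  ?author wdt:P496 ?orcid .
}
LIMIT 10
```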

The future will tell ;-)

@Abbe98 how do you understand this answer?

Is it a go or no-go for Wikidata <-> SWEPUB? My understanding is that they have a SPARQL database but no experience with federation or with sharing data.

image.png (225×672 px, 38 KB)

FYI: I started to look at how we could better populate Wikidata with information from other sources, like Karolinska Institutet (tweet).

Interesting to read the page Altmetrics – support for science communication

and the Wikipedia section....

image.png (318×772 px, 57 KB)

I started to connect researchers' YouTube videos at Karolinska with the YouTube presentation Karolinska has

report

I still do not see the problem; you can do federated SPARQL without the endpoint being CC0.

@Abbe98 yes, but inside Wikidata they say on Wikidata:SPARQL_federation_input:

image.png (51×1 px, 20 KB)

So I guess the best is, as with LIBRIS XL, a CC0 statement; see T200191: Document what license Libris XL data is available under. But we need a license statement from the SWEPUB people. The latest word is that they are redesigning the SWEPUB solution?! ==> I moved this task to Blocked (best would be if they start using Wikibase... ;-) )

As a person just trying to touch CC0 things, I can see other requests being denied, like "Persée ... a digital library of open access, mostly French-language scholarly journals", with the motivation:

the license seems to say "strictly for non-commercial purposes", and since we can not ensure that WDQS queries would not be used for commercial purposes, this may create a licensing problem.

My understanding is that as long as SWEPUB can't point to a license statement for its data, Wikidata is not interested in setting up SPARQL federation with them ==> so this task is stalled.