Support Wikimedia Commons Query Service
Open, Needs Triage, Public

Description

Unlike Wikidata Query Service, authentication is required.
There is a manual on how to connect: https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint.
Currently, human intervention is required to authenticate, so we may have to wait for better support.

Event Timeline

@matej_suchanek Is the manual at https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint outdated? A Pywikibot login seems to be enough.

I.e., this works:

import pywikibot
from pywikibot.data import sparql

# Log in to Pywikibot
site = pywikibot.Site('commons', 'commons')
site.login()

# Define the SPARQL query
query = """
SELECT ?item ?described_url WHERE {
  ?item wdt:P7482 wd:Q74228490 .       # P7482 (source of file) = Q74228490 (file available on the internet)
  ?item p:P7482 ?statement .
  ?statement pq:P973 ?described_url .
} LIMIT 5
"""

# Set up the SPARQL endpoint and entity URL
# Note: https://commons-query.wikimedia.org requires the user to be logged in

entity_url = 'https://commons.wikimedia.org/entity/'
endpoint = 'https://commons-query.wikimedia.org/sparql'

# Create a SparqlQuery object
query_object = sparql.SparqlQuery(endpoint=endpoint, entity_url=entity_url)

# Execute the SPARQL query and retrieve the data
data = query_object.select(query, full_data=True)

# Convert each SPARQL result MediaInfo URI to a pywikibot.Page object
# (select() can return None on failure, hence the "or []")
for row in data or []:
    page_id = int(row['item'].getID().replace('M', ''))
    pages = list(site.load_pages_from_pageids([page_id]))
    if len(pages) == 1:
        page = pages[0]
        print(page)

Works for me, provided the bot account has authenticated to the endpoint (in the browser).
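
The conversion from a MediaInfo ID to a page ID in the first script above (strip the leading M, keep the number) can be isolated in a small helper. This is only a sketch: `mediainfo_to_pageid` is a hypothetical name and not part of Pywikibot, and it assumes the value is either a bare ID such as M12345 or an entity URI ending in one.

```python
import re

# Matches a MediaInfo ID either on its own ("M12345") or as the last
# path segment of an entity URI (".../entity/M12345").
_MEDIAINFO_ID = re.compile(r'(?:^|/)M(\d+)$')


def mediainfo_to_pageid(value: str) -> int:
    """Return the numeric page ID encoded in a MediaInfo ID or entity URI.

    Hypothetical helper, not a Pywikibot function.
    """
    match = _MEDIAINFO_ID.search(value.strip())
    if match is None:
        raise ValueError(f'not a MediaInfo ID: {value!r}')
    return int(match.group(1))
```

With it, the loop body could read `page_id = mediainfo_to_pageid(row['item'].getID())`; unlike `replace('M', '')`, it rejects values such as Q-IDs instead of passing them on.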

I tried using WikidataSPARQLPageGenerator, but it doesn't work.

>>> import pywikibot
>>> site = pywikibot.Site('commons')
>>> repo = site.data_repository()
>>> repo
DataSite("wikidata", "wikidata")

The query gets redirected to Wikidata.

There is a bug (or missing code) in WikidataSPARQLPageGenerator. It works if entity_url and endpoint are passed as parameters:

import pywikibot
import pywikibot.pagegenerators as pg

query = """
        SELECT ?item ?value WHERE {
            ?item wdt:P9478 ?value.
        } LIMIT 5
"""

site = pywikibot.Site('commons', 'commons')
site.login()

endpoint = 'https://commons-query.wikimedia.org/sparql'
entity_url = 'https://commons.wikimedia.org/entity/'

generator = pg.WikidataSPARQLPageGenerator(query, endpoint=endpoint, site=site, entity_url=entity_url)

for item in generator:
    page_id = item.getID(numeric=True)
    result = list(site.load_pages_from_pageids([page_id]))
    print(result)
    print(page_id)

Note: the SPARQL query still sometimes gets an HTML response from the SPARQL endpoint saying that OAuth authorization is required. In that case, Pywikibot's SPARQL query function fails silently and returns None.
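
One way to keep the None result from going unnoticed is to wrap the call and raise after a few attempts. The sketch below is not Pywikibot code: `call_until_result` is a hypothetical helper, and retrying only helps under the assumption that the authorization failure is intermittent, as "sometimes" suggests.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar('T')


def call_until_result(run: Callable[[], Optional[T]],
                      attempts: int = 3,
                      delay: float = 1.0) -> T:
    """Call run() until it returns something other than None.

    Hypothetical helper: raises RuntimeError instead of handing None
    back to the caller once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(1, attempts + 1):
        result = run()
        if result is not None:
            return result
        if attempt < attempts:
            # Wait before the next try; no sleep after the final attempt.
            time.sleep(delay)
    raise RuntimeError(f'no result after {attempts} attempts '
                       '(the endpoint may be asking for authorization)')
```

In the first script this could be used as `data = call_until_result(lambda: query_object.select(query, full_data=True))`. An empty result list is not None, so it is returned as-is.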