Support Wikimedia Commons Query Service
Open, Needs Triage · Public

Assigned To

None

Authored By

	matej_suchanek
	Jan 11 2023, 9:57 PM

Description

Unlike the Wikidata Query Service, the Wikimedia Commons Query Service requires authentication.
There is a manual on how to connect: https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint.
Currently, connecting requires human intervention, so we may wait for better support.

Related Objects

		Status	Subtype	Assigned	Task
		Open	Feature	None	T223820 Properly implement structured data access on Commons in Pywikibot
		Open		None	T326762 Support Wikimedia Commons Query Service

Event Timeline

matej_suchanek created this task. Jan 11 2023, 9:57 PM

Restricted Application added subscribers: pywikibot-bugs-list, Aklapper. Jan 11 2023, 9:57 PM

@matej_suchanek Is the manual at https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint outdated? Logging in with Pywikibot seems to be enough.

For example, this works:

import pywikibot
from pywikibot.data import sparql

# Log in to Commons with Pywikibot
site = pywikibot.Site('commons', 'commons')
site.login()

# Define the SPARQL query
query = """
SELECT ?item ?described_url WHERE {
  ?item wdt:P7482 wd:Q74228490 .       # P7482 (source of file) = Q74228490 (file available on the internet)
  ?item p:P7482 ?statement .
  ?statement pq:P973 ?described_url .
} LIMIT 5
"""

# Set up the SPARQL endpoint and entity URL
# Note: https://commons-query.wikimedia.org requires the user to be logged in
entity_url = 'https://commons.wikimedia.org/entity/'
endpoint = 'https://commons-query.wikimedia.org/sparql'

# Create a SparqlQuery object
query_object = sparql.SparqlQuery(endpoint=endpoint, entity_url=entity_url)

# Execute the SPARQL query and retrieve the data
data = query_object.select(query, full_data=True)

# Convert each MediaInfo URI in the result to a pywikibot.Page object
for row in data:
    page_id = int(row['item'].getID().replace('M', ''))
    pages = list(site.load_pages_from_pageids([page_id]))
    if len(pages) == 1:
        page = pages[0]
        print(page)
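
The conversion step in the loop above relies on a Commons MediaInfo ID being the letter M followed by the page ID of the file page. A minimal sketch of that step as a standalone helper is below; the name `mediainfo_id_to_page_id` is hypothetical and not part of Pywikibot. Unlike a bare `replace('M', '')`, it rejects IDs that are not MediaInfo IDs instead of failing later in `int()`.

```python
import re

# A MediaInfo ID is "M" followed by a positive integer (the page ID).
_MEDIAINFO_ID = re.compile(r'M([1-9][0-9]*)')


def mediainfo_id_to_page_id(entity_id: str) -> int:
    """Return the page ID encoded in a MediaInfo ID such as 'M12345'.

    Hypothetical helper for illustration; not a Pywikibot function.
    """
    match = _MEDIAINFO_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f'not a MediaInfo ID: {entity_id!r}')
    return int(match.group(1))
```

In the loop, `int(row['item'].getID().replace('M', ''))` could then be written as `mediainfo_id_to_page_id(row['item'].getID())`.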

Works for me, provided the bot account has authenticated to the endpoint (in the browser).

I tried using WikidataSPARQLPageGenerator, but it doesn't work.

>>> import pywikibot
>>> site = pywikibot.Site('commons')
>>> repo = site.data_repository()
>>> repo
DataSite("wikidata", "wikidata")

The query gets redirected to Wikidata.

There is a bug (or missing code) in WikidataSPARQLPageGenerator. It works if entity_url and endpoint are passed as parameters:

import pywikibot
import pywikibot.pagegenerators as pg

query = """
        SELECT ?item ?value WHERE {
            ?item wdt:P9478 ?value .
        } LIMIT 5
"""

site = pywikibot.Site('commons', 'commons')
site.login()

# Pass the Commons endpoint and entity URL explicitly
endpoint = 'https://commons-query.wikimedia.org/sparql'
entity_url = 'https://commons.wikimedia.org/entity/'

generator = pg.WikidataSPARQLPageGenerator(query, endpoint=endpoint, site=site, entity_url=entity_url)

for item in generator:
    page_id = item.getID(numeric=True)
    result = list(site.load_pages_from_pageids([page_id]))
    print(result)
    print(page_id)

Note: a SPARQL query still sometimes gets an HTML response from the endpoint saying that OAuth authorization is required. Pywikibot's SPARQL query function then fails silently and returns None.
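
A script can guard against that silent failure with a thin wrapper. The sketch below assumes only what is reported above, namely that `select` returns None when the endpoint answers with the authorization page. The names `EmptySparqlResult` and `select_or_raise` are hypothetical and not part of Pywikibot.

```python
class EmptySparqlResult(RuntimeError):
    """Raised when a SPARQL select call returns None instead of rows."""


def select_or_raise(query_object, query, **kwargs):
    """Call query_object.select() and raise if it returns None.

    query_object is expected to be a pywikibot.data.sparql.SparqlQuery,
    but any object with a compatible select() method works.
    An empty list is a valid result (no matching rows) and is returned as is.
    """
    data = query_object.select(query, **kwargs)
    if data is None:
        raise EmptySparqlResult(
            'SPARQL query returned no data; the endpoint may have '
            'answered with an authorization page instead of JSON')
    return data
```

With the earlier example, `data = select_or_raise(query_object, query, full_data=True)` would replace the direct `select` call, so the script stops with a clear error instead of failing later when iterating over None.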
