Unlike the Wikidata Query Service, the Wikimedia Commons Query Service requires authentication.
There is a manual on how to connect: https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint.
Currently, authenticating requires human intervention, so we may have to wait for better support.
Description
| Status | Subtype | Assigned | Task |
|---|---|---|---|
| Open | Feature | None | T223820 Properly implement structured data access on Commons in Pywikibot |
| Open | | None | T326762 Support Wikimedia Commons Query Service |
Event Timeline
@matej_suchanek Is https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint outdated? A Pywikibot login seems to be enough.
I.e., this works:
import pywikibot
from pywikibot.data import sparql

# Log in with Pywikibot
site = pywikibot.Site('commons', 'commons')
site.login()

# Define the SPARQL query
query = """
SELECT ?item ?described_url WHERE {
  ?item wdt:P7482 wd:Q74228490 .  # P7482 (source of file) = Q74228490 (file available on the internet)
  ?item p:P7482 ?statement .
  ?statement pq:P973 ?described_url .
} LIMIT 5
"""

# Set up the SPARQL endpoint and entity URL
# Note: https://commons-query.wikimedia.org requires the user to be logged in
entity_url = 'https://commons.wikimedia.org/entity/'
endpoint = 'https://commons-query.wikimedia.org/sparql'

# Create a SparqlQuery object
query_object = sparql.SparqlQuery(endpoint=endpoint, entity_url=entity_url)

# Execute the SPARQL query and retrieve the data
data = query_object.select(query, full_data=True)

# Convert each MediaInfo URI in the result to a pywikibot.Page object
for row in data:
    page_id = int(row['item'].getID().replace('M', ''))
    pages = list(site.load_pages_from_pageids([page_id]))
    if len(pages) == 1:
        page = pages[0]
        print(page)
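The ID conversion in the loop above relies on MediaInfo entity IDs being the letter M followed by the file's page ID. A minimal sketch of that step as a standalone helper with input checking; the name mediainfo_id_to_page_id is hypothetical and not part of Pywikibot:

```python
def mediainfo_id_to_page_id(entity_id: str) -> int:
    """Convert a MediaInfo entity ID such as 'M12345' to its numeric page ID.

    Hypothetical helper for illustration; not a Pywikibot function.
    """
    number = entity_id[1:]
    # Reject anything that is not 'M' followed by ASCII digits (e.g. 'Q42').
    if not entity_id.startswith('M') or not (number.isascii() and number.isdigit()):
        raise ValueError(f'not a MediaInfo ID: {entity_id!r}')
    return int(number)
```

The returned integer can then be passed to site.load_pages_from_pageids() as in the loop above.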
Works for me, provided the bot account has authenticated to the endpoint (in the browser).
I tried using WikidataSPARQLPageGenerator, but it doesn't work.
>>> import pywikibot
>>> site = pywikibot.Site('commons')
>>> repo = site.data_repository()
>>> repo
DataSite("wikidata", "wikidata")
The query gets redirected to Wikidata.
There is a bug (or missing code) in WikidataSPARQLPageGenerator. It works if entity_url and endpoint are passed as parameters:
import pywikibot
import pywikibot.pagegenerators as pg

# Plain string: no f-string is needed because nothing is interpolated
query = """
SELECT ?item ?value WHERE {
  ?item wdt:P9478 ?value .
} LIMIT 5
"""

site = pywikibot.Site('commons', 'commons')
site.login()

# Pass the Commons endpoint and entity URL explicitly
endpoint = 'https://commons-query.wikimedia.org/sparql'
entity_url = 'https://commons.wikimedia.org/entity/'
generator = pg.WikidataSPARQLPageGenerator(query, endpoint=endpoint, site=site, entity_url=entity_url)

for item in generator:
    page_id = item.getID(numeric=True)
    result = list(site.load_pages_from_pageids([page_id]))
    print(result)
    print(page_id)
Note: the SPARQL query still sometimes gets an HTML response from the SPARQL endpoint saying that OAuth authorization is required. Pywikibot's SPARQL query function then fails silently and returns None.
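Until that is handled upstream, a caller can turn the silent None into an explicit error. A minimal sketch, assuming only that the query object has a select() method that returns None on failure, as described above; select_or_raise and SparqlNoResultError are hypothetical names, not part of Pywikibot:

```python
class SparqlNoResultError(RuntimeError):
    """Raised when the SPARQL query returned None instead of a result list."""


def select_or_raise(query_object, query, **kwargs):
    """Run query_object.select() and raise instead of returning None.

    Hypothetical wrapper for illustration. An empty list is a valid
    result (no rows matched) and is returned unchanged.
    """
    data = query_object.select(query, **kwargs)
    if data is None:
        raise SparqlNoResultError(
            'SPARQL query returned no data; the endpoint may have answered '
            'with an HTML page asking for OAuth authorization')
    return data
```

With the first example in this thread, this would be called as select_or_raise(query_object, query, full_data=True) in place of query_object.select(query, full_data=True).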