Unlike the Wikidata Query Service, the Commons Query Service requires authentication.
There is a manual on how to connect: https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint.
Currently, authenticating requires human intervention, so this may have to wait for better support.
Description
Status | Subtype | Assigned | Task
---|---|---|---
Open | Feature | None | T223820 Properly implement structured data access on Commons in Pywikibot
Open | | None | T326762 Support Wikimedia Commons Query Service
Event Timeline
Comment Actions
@matej_suchanek Is the manual at https://commons.wikimedia.org/wiki/Commons:SPARQL_query_service/API_endpoint outdated? A Pywikibot login seems to be enough.
I.e., this works:
```python
import pywikibot
import json
from pywikibot.data import sparql

# Login to pywikibot
site = pywikibot.Site('commons', 'commons')
site.login()

# Define the SPARQL query
query = """
SELECT ?item ?described_url WHERE {
  ?item wdt:P7482 wd:Q74228490 .  # P7482 (source of file) = Q74228490 (file available on the internet)
  ?item p:P7482 ?statement .
  ?statement pq:P973 ?described_url.
} LIMIT 5
"""

# Set up the SPARQL endpoint and entity URL
# Note: https://commons-query.wikimedia.org requires user to be logged in
entity_url = 'https://commons.wikimedia.org/entity/'
endpoint = 'https://commons-query.wikimedia.org/sparql'

# Create a SparqlQuery object
query_object = sparql.SparqlQuery(endpoint=endpoint, entity_url=entity_url)

# Execute the SPARQL query and retrieve the data
data = query_object.select(query, full_data=True)

# Convert SPARQL result mediainfo uri to Pywikibot.Page() object
for row in data:
    page_id = int(row['item'].getID().replace('M', ''))
    pages = list(site.load_pages_from_pageids([page_id]))
    if len(pages) == 1:
        page = pages[0]
        print(page)
```
Comment Actions
Works for me, provided the bot account has authenticated to the endpoint (in the browser).
I tried using WikidataSPARQLPageGenerator, but it doesn't work.
```python
>>> import pywikibot
>>> site = pywikibot.Site('commons')
>>> repo = site.data_repository()
>>> repo
DataSite("wikidata", "wikidata")
```
The query gets redirected to Wikidata.
Comment Actions
There is a bug (or missing code) in WikidataSPARQLPageGenerator. It works if entity_url and endpoint are passed as parameters:
```python
import pywikibot
import pywikibot.pagegenerators as pg

query = f"""
SELECT ?item ?value WHERE {{
  ?item wdt:P9478 ?value.
}} LIMIT 5
"""

site = pywikibot.Site('commons', 'commons')
site.login()

endpoint = 'https://commons-query.wikimedia.org/sparql'
entity_url = 'https://commons.wikimedia.org/entity/'

generator = pg.WikidataSPARQLPageGenerator(query, endpoint=endpoint, site=site, entity_url=entity_url)

for item in generator:
    page_id = item.getID(numeric=True)
    result = list(site.load_pages_from_pageids([page_id]))
    print(result)
    print(page_id)
```
Comment Actions
Note: the SPARQL query still sometimes gets an HTML response from the SPARQL endpoint saying that OAuth authorization is required. Pywikibot's SPARQL query function then fails silently and returns None.
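Until that is handled in Pywikibot itself, a caller could guard against the silent None. This is only a sketch: `select_or_raise` and `SparqlAuthError` are hypothetical names, not part of Pywikibot, and it assumes only what is described above, namely that `select()` returns None when the endpoint answers with the HTML authorization page.

```python
class SparqlAuthError(RuntimeError):
    """Hypothetical error: the endpoint returned no usable SPARQL result."""


def select_or_raise(query_object, query, full_data=True):
    """Run query_object.select() and raise instead of returning None.

    query_object is expected to be a pywikibot.data.sparql.SparqlQuery,
    but any object with a compatible select() method works.
    """
    data = query_object.select(query, full_data=full_data)
    if data is None:
        # Assumption: None means the response could not be parsed as a
        # SPARQL result, e.g. the HTML "OAuth authorization required" page.
        raise SparqlAuthError(
            'No SPARQL result returned; the account may need to '
            'authorize against the endpoint again.')
    # An empty list is a valid result (no matching rows), so return it as-is.
    return data
```

With the first script above, this would replace the direct call: `data = select_or_raise(query_object, query)`. The loop over `data` then never iterates over None.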