
Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry but for SPARQL)
Open, LowPublic

Description

Quarry (https://quarry.wmflabs.org/) is a web service where people can run SQL queries and share both the queries and their results. It's a really nice service for getting to know SQL.

Since ~2015 we have the Wikidata Query Service (https://query.wikidata.org/), which uses SPARQL. Not a lot of people know SPARQL well, so having some sort of service like Quarry, but for SPARQL, would make it a lot easier to use.
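For anyone following along who hasn't used WDQS: it's a plain HTTP endpoint that takes a SPARQL query as a request parameter. A minimal sketch in Python (the endpoint and the classic "house cats" example query are the standard public ones; actually sending the request is left to any HTTP client):

```python
from urllib.parse import urlencode

# A classic introductory WDQS query: five items that are instances of
# "house cat" (Q146), with English labels.
sparql = """
SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 5
"""

endpoint = "https://query.wikidata.org/sparql"
url = endpoint + "?" + urlencode({"query": sparql, "format": "json"})

# Fetching this URL with any HTTP client returns the result bindings as JSON.
print(url)
```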

Proposed name is sparqly, but we can always bikeshed over a better name.

Event Timeline

Multichill raised the priority of this task from to Needs Triage.
Multichill updated the task description.
Multichill added subscribers: Multichill, Nikki.
Restricted Application added a subscriber: Aklapper.

I think it'll be simpler to just have Quarry handle SPARQL as well.

Fine with me too :-)

A stupid question by a person with bad SQL (progressed from almost zero mostly thanks to Quarry) and almost no SPARQL knowledge: would it be possible to have part of a request made in SQL and part in SPARQL, and have it all output as one table? (Like select a list of articles on some wiki by some Wikidata-based criteria, then fetch their sizes, creators and stuff like this via SQL, and just have it all in one table.)

Quite off topic for this bug; another forum, like the Wikidata mailing list, is probably more suitable. Have a look at https://petscan.wmflabs.org/. With that tool you can combine queries from different sources.

More simply I guess that means the answer is no. Thanks, that's what I wanted to hear :)

With the current SPARQL setup it's easy to share queries either by full url or by short url. I think we can close this one.

Do I get it right that a query currently cannot be longer than the URL length limit? What exactly is that limit? I wonder if there have been cases of people needing to run longer queries. Is this investigable somehow?
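For a rough sense of the numbers: there isn't a single URL length limit. Roughly 2,000 characters is a commonly cited safe ceiling for older browsers and roughly 8 KB for many servers (both approximate figures), and WDQS, like any SPARQL 1.1 Protocol endpoint, also accepts POST requests, which sidesteps the URL entirely. A small sketch for measuring how long a query becomes once percent-encoded into a share URL (the limits in the comments are the approximations above, not hard WDQS numbers):

```python
from urllib.parse import quote

def encoded_query_length(sparql: str) -> int:
    """Length of the full share URL once the query is percent-encoded."""
    return len("https://query.wikidata.org/#" + quote(sparql))

short = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
print(encoded_query_length(short))  # comfortably under any common limit

# Repeating even a tiny query quickly blows past the ~2,000-character
# ceiling some older browsers enforce (servers often allow ~8 KB).
long_query = short * 100
print(encoded_query_length(long_query) > 2000)
```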

@Base, your questions are very interesting and you seem to have really nice suggestions, but I would suggest a mailing list or a wiki talk page (or, if it is a bug/feature request, a separate ticket) as the preferred way to communicate.

This ticket is probably going to be closed soon, and when that happens your questions will go unanswered and have little visibility here.

> With the current SPARQL setup it's easy to share queries either by full url or by short url. I think we can close this one.

I disagree: one important part of this task, saving results, isn’t served at all by this. We want to be able to save query results and share them, and unlike on Quarry, it shouldn’t be possible to change those results later, even for the query author (who, on Quarry, can re-run the query, changing the results without assigning a new ID). Other than when privacy or legal concerns require the results to be deleted, the pages should be immutable.

This should be an optional component, not the main interface for querying (as Quarry is for the SQL databases) – WDQS sees millions of queries every day (the exact number varies with each new Wikidata presentation), we can’t afford to save all those results.
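One way to get the immutability described here is content addressing: derive the saved page's ID from a hash of the query plus its results, so a re-run that produces different results necessarily gets a new ID and the old page stays untouched. A hypothetical sketch (none of these names exist in any current tool):

```python
import hashlib
import json

def result_id(query: str, bindings: list) -> str:
    """Derive an immutable ID from a query plus its results.

    Re-running the query with different results produces a different ID,
    so a saved result page can never be changed in place.
    """
    payload = json.dumps({"query": query, "bindings": bindings},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]

a = result_id("SELECT ...", [{"item": "Q42"}])
b = result_id("SELECT ...", [{"item": "Q42"}, {"item": "Q5"}])
print(a != b)  # changed results -> new ID, old page untouched
```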

> We want to be able to save query results and share them

Wouldn't tabular data on Commons be a good place for it? Do we need yet another place/way to store tabular data?

That seems like a good option indeed. In that case, we'd need a way to pull the data back into the WDQS for visualization.

IIRC https://www.mediawiki.org/wiki/Extension:Graph can work with tabular data. WDQS GUI can't export into graphs though, except for Graph Builder, so there's some improvement possible there.

I don’t think that’s a good fit. Query results aren’t necessarily notable for Commons, nor are they necessarily pure data (e. g. labels and descriptions, image links, or constructed result columns – the most extreme example would be the “cocktail recipes” query). Commons’ Tabular Data also imposes some restrictions which not all query results fulfill (e. g. strings cannot be longer than 400 characters), and unless we store tiny JSON blobs like {"type": "literal", "value": "foo"} inside the string values in the tabular data (storing objects directly is not allowed), we also lose some information about the data (the distinction between literals and IRIs).
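The JSON-blob workaround mentioned above would look roughly like this; a sketch assuming the standard SPARQL 1.1 JSON results binding format and the 400-character string limit cited for Commons tabular data (the helper names are invented for illustration):

```python
import json

MAX_CELL = 400  # Commons tabular data caps string cells at 400 characters

def binding_to_cell(binding: dict) -> str:
    """Serialize one SPARQL result binding (e.g. {"type": "literal",
    "value": "foo"}) into a string cell, preserving the literal-vs-IRI
    distinction that a bare string value would lose."""
    cell = json.dumps(binding, sort_keys=True)
    if len(cell) > MAX_CELL:
        raise ValueError("binding does not fit in a tabular-data cell")
    return cell

def cell_to_binding(cell: str) -> dict:
    return json.loads(cell)

b = {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"}
# Round-trips without losing the type information.
assert cell_to_binding(binding_to_cell(b)) == b
```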

Not exactly Quarry, but see https://commons.wikimedia.org/wiki/User:TabulistBot - this should be similar to Listeria and generate persistent reusable tabular data.

@Lucas_Werkmeister_WMDE I agree there are some downsides to this model, but I think it's the easiest and most natural option for now despite the limitations, so I'd like to see if we can make it work.

>> With the current SPARQL setup it's easy to share queries either by full url or by short url. I think we can close this one.
>
> I disagree: one important part of this task, saving results, isn’t served at all by this. We want to be able to save query results and share them, and unlike on Quarry, it shouldn’t be possible to change those results later, even for the query author (who, on Quarry, can re-run the query, changing the results without assigning a new ID). Other than when privacy or legal concerns require the results to be deleted, the pages should be immutable.

+1

valerio.bozzolan renamed this task from Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry) to Setup sparqly service at https://sparqly.wmflabs.org/ (like Quarry but for SPARQL). Aug 30 2022, 8:02 AM
valerio.bozzolan updated the task description.

The killer feature of this tool would be: fewer timeouts.

Lots of users have very interesting queries that sometimes just cannot be optimized any further and simply require more resources. Having said that, I understand we cannot just increase resources for every anonymous execution in the world (lots of people could abuse it), but I think this task could target a fix for that issue, since this tool would have a queue and no parallel execution.
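The queue idea above can be sketched as a single worker draining a FIFO queue: queries never run in parallel, so each one could be granted the full resources and a longer timeout than the anonymous web endpoint allows. A hypothetical sketch (all names invented for illustration; `run_query` stands in for actually calling the SPARQL endpoint):

```python
import queue
import threading

# Submitted queries wait in a FIFO queue; a single worker runs them
# one at a time, so there is no parallel execution.
jobs: "queue.Queue" = queue.Queue()
results = {}

def run_query(sparql: str) -> str:
    # Stand-in for sending the query to the SPARQL endpoint with a
    # generous timeout.
    return f"results of: {sparql}"

def worker():
    while True:
        sparql = jobs.get()
        if sparql is None:          # sentinel: shut down the worker
            break
        results[sparql] = run_query(sparql)
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()
for q in ["query A", "query B"]:
    jobs.put(q)
jobs.put(None)                      # no more work
t.join()
print(results)
```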