Example queries for Quarry
Closed, Invalid · Public

Description

As with the Wikidata Query Service, there should be a curated set of sample queries that show off the capabilities of Quarry.

Event Timeline

Harej created this task. Oct 15 2018, 10:39 PM
Restricted Application added a subscriber: Aklapper. Oct 15 2018, 10:39 PM

Opinions, please! :)

Is https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Example_queries sufficient? If not, any ideas what kind of queries (well, results, not the queries themselves) could be good samples?
And could writing/documenting some queries even be Google-Code-in-2018 tasks to mentor in the next five weeks, potentially? (A bit similar to https://phabricator.wikimedia.org/T193465 ?)

Hm. Ok, here's my take. We're about to release a new dataset, of denormalized monthly data for all wikis. It's going to revolutionize how some questions can be asked. For example, questions about reverted revisions, change in bytes from one revision to the next, editing sessions, etc. So when that's released, we'll need to explain it very carefully to people so they know what queries to use the mediawiki replicas for and what queries to use the analytics dataset for. I think a collection of well-written example queries is the best way to accomplish this. So, I'm interested in what happens here, and looking forward to adding my own efforts once we launch that data, early next year.

Base added a subscriber: Base. Nov 7 2018, 3:40 PM

Well, unlike the Wikidata Query Service, all queries run by others are visible in Quarry. I think it would be better to work on search and on encouraging better query descriptions than on this.

Framawiki added a comment. Edited Nov 7 2018, 5:48 PM

Hello, my opinion:
Quarry is "just" a client of the "replicas" databases, so I don't see why only Quarry should benefit from query examples.
If a new dataset is released, examples are welcome on https://wikitech.wikimedia.org/wiki/Help:MySQL_queries, the "official" example page, and not only on Quarry. Another option would be to move all these examples to Quarry, which I would find positive (in particular because you could see the results obtained and easily fork the queries). Or the example page could let you easily create new queries via a link that opens a new pre-filled query, like what wdsparql offers? These questions seem to fall outside the scope of this task.
Also, I wonder whether this new data will be available on the replicas, because in any case Quarry only allows queries against the replicas.

bd808 added a subscriber: bd808. Nov 7 2018, 6:09 PM

Also I wonder if this new data will be available on the replicas? Because in any case Quarry only allows replicas queries.

The new data that @Milimetric is hinting at will be portions of the "data lake" which are 100% safe for sharing with the public. These tables will not be located on the Wiki Replica servers, but will be queryable using SQL from some other service. More details on how that will work practically will be discovered and documented as the project progresses.

For this new data set to be exposed via Quarry, T76466 ("Add database selector") or similar work will need to be done so that a Quarry query can tell the backend Quarry workers which data storage system to connect to. My current understanding is that the Wikimedia Foundation's Analytics team will be doing or helping with that change as part of their larger plans to give Cloud Services/Toolforge users access to the public data lake.

Framawiki changed the task status from Open to Stalled. Edited Nov 11 2018, 2:33 PM
Framawiki moved this task from Feature request to Backlog on the Quarry board.

Thanks for the explanation.
Quarry only allows queries against the replicas. If the new data will not be on these servers, this task can't be resolved.
Feel free to file a new task with that tag when the new servers are available so we can see how to add them to Quarry.
Note that T76466 ("Add database selector") was only about adding a selector between wiki databases, not between servers.

Framawiki closed this task as Invalid. Feb 18 2019, 7:59 PM

There can be no usage examples in Quarry until access to the new data is open and support for the new servers has been added to the tool. I would like to close this task on that basis, for now.