As with the Wikidata Query Service, there should be a curated set of sample queries that show off the capabilities of Quarry.
Opinions, please! :)
Is https://wikitech.wikimedia.org/wiki/Help:MySQL_queries#Example_queries sufficient? If not, any ideas what kind of queries (well, results, not the queries themselves) could be good samples?
And could writing/documenting some queries even be Google-Code-in-2018 tasks to mentor in the next five weeks, potentially? (A bit similar to https://phabricator.wikimedia.org/T193465 ?)
Hm. Ok, here's my take. We're about to release a new dataset of denormalized monthly data for all wikis. It's going to revolutionize how some questions can be asked: for example, questions about reverted revisions, change in bytes from one revision to the next, editing sessions, etc. So when that's released, we'll need to explain it very carefully to people so they know which queries to use the MediaWiki replicas for and which queries to use the analytics dataset for. I think a collection of well-written example queries is the best way to accomplish this. So, I'm interested in what happens here, and looking forward to adding my own efforts once we launch that data, early next year.
Hello, my opinion:
Quarry is "just" a client of the "replicas" databases. So I don't see why only Quarry should benefit from query examples.
If a new dataset is released, examples are welcome on https://wikitech.wikimedia.org/wiki/Help:MySQL_queries, the "official" example page, not only on Quarry. Another option would be to move all these examples to Quarry, which I would find positive (in particular, to be able to see the results obtained and to fork queries easily). Or allow people to easily create new queries from the example page, via a link that creates a new pre-filled query, like what wdsparql offers? These questions seem to fall outside the scope of this task.
Also, I wonder whether this new data will be available on the replicas, because in any case Quarry only allows queries against the replicas.
The new data that @Milimetric is hinting at will be portions of the "data lake" which are 100% safe for sharing with the public. These tables will not be located on the Wiki Replica servers, but will be queryable using SQL from some other service. More details on how that will work practically will be discovered and documented as the project progresses.
For this new data set to be exposed via Quarry, T76466: Add database selector or similar work will need to be done so that a Quarry query can tell the backend Quarry workers which data storage system to connect to. My current understanding is that the Wikimedia Foundation's Analytics team will be doing/helping with that needed change as part of their larger plans to give Cloud Services/Toolforge users access to the public data lake.
Thanks for the explanation.
Quarry only allows queries against the replicas. If the new data will not be on these servers, this task can't be solved.
Feel free to file a new task with that tag when the new servers are available so we can see how to add them to Quarry.
Note that T76466: Add database selector was about adding a selector between wiki databases only, not servers.