create instance for people to toy with the query endpoint
Closed, Resolved (Public)

Description

Create an instance for people to toy with the query endpoint. It is meant first of all for the Lyon hackathon. It doesn't need to be hardened; we can just restart it when someone manages to break it. It would be nice if it were read-only, so that people don't accidentally drop the DB, see T91817. Then we could just make it public and give the URL only to people in Lyon, so they don't have any hassle with SSH forwarding.

Event Timeline

JanZerebecki raised the priority of this task to High.
JanZerebecki updated the task description.
JanZerebecki subscribed.

Some questions for this one:

  1. Do we want a specialized GUI or the Blazegraph workbench? Note that since we're making this instance read-only, most of the screens in the workbench would be useless. It wouldn't be super-hard to create a very simple GUI that just runs queries and displays results.
  2. We can either load the full data or limit the labels to one language and omit the sitelinks. Omitting the sitelinks makes the DB smaller and loading/updating much faster. The same goes for a single language, which also makes querying with labels much easier: if you have all labels, you get a separate result for each label, so you either get tons of duplicates or you need to filter labels by language. Eventually we'll have a function for that (T97079), but for now it may be easier to go with one language. I'd like to hear your opinion on this.
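To make the duplication point concrete, here is a toy simulation in Python (not the actual query service): joining items to all of their labels yields one row per label, while restricting to one language yields one row per item. The `LABELS` table and the `rows_with_labels` helper are made up for illustration.

```python
# Toy data: each item has labels in several languages (illustrative values only).
LABELS = {
    "Q1": {"en": "universe", "fr": "univers", "de": "Universum"},
    "Q2": {"en": "Earth", "fr": "Terre", "de": "Erde"},
}


def rows_with_labels(items, lang=None):
    """Return (item, label) rows, one per matching label, like a join on labels."""
    rows = []
    for item in items:
        for label_lang, label in LABELS[item].items():
            # With lang=None every label matches, so each item repeats per language.
            if lang is None or label_lang == lang:
                rows.append((item, label))
    return rows


all_rows = rows_with_labels(["Q1", "Q2"])            # one row per item per language
english_rows = rows_with_labels(["Q1", "Q2"], "en")  # one row per item
print(len(all_rows), len(english_rows))
```

With two items and three languages the unfiltered result has six rows, and the filtered one has two.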

1: If you/someone can do a simple GUI that'd be great.
2: I think it is fine for now but I'm CCing a few people who might want to work with the demo at the hackathon to see if they'd absolutely need it.

re omitting sitelinks: probably fine for most use cases, but there are probably some high profile cases that do need sitelinks.

re single language: I'm afraid that would give the wrong impression, showcasing the multi-lingual nature of wikibase is important. If I understand correctly, no duplication will take place if you don't ask for labels in the output, right?

You're running this on labs, right? Just filter to allow only private IP space; that way only people on labs can access (and break) your instance. What software is this based on? Any documentation? Any nice client libraries (Python?) we can (ab)use?
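As a rough illustration of the "private IP space only" idea, the check could look like the Python sketch below. In practice this would more likely live in a firewall or web server rule; the thread does not say which ranges labs uses, so the RFC 1918 ranges here are an assumption, and `is_private_client` is a hypothetical helper name.

```python
from ipaddress import ip_address, ip_network

# RFC 1918 private IPv4 ranges (assumed; the actual labs ranges are not given here).
PRIVATE_NETWORKS = [
    ip_network("10.0.0.0/8"),
    ip_network("172.16.0.0/12"),
    ip_network("192.168.0.0/16"),
]


def is_private_client(addr: str) -> bool:
    """Return True if `addr` falls inside one of the private ranges above."""
    ip = ip_address(addr)
    return any(ip in net for net in PRIVATE_NETWORKS)


print(is_private_client("10.68.16.1"), is_private_client("8.8.8.8"))
```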

Pywikibot already has wdq based query generators. We could do something like that with this query service.

@Multichill yes, it's in labs. I'll add the external URL soon (probably by tomorrow). This is based on Blazegraph. For the docs, see https://github.com/wikimedia/wikidata-query-rdf/tree/master/docs. No clients so far, though it has a REST API that you can query by sending it SPARQL.
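For anyone who wants to script against it, a minimal sketch of building such a request in Python follows. It assumes the endpoint accepts the standard SPARQL protocol form (an HTTP GET with a `query` parameter) and can return JSON results; the endpoint URL is a placeholder, and `build_sparql_request` is a hypothetical helper, not part of any existing client.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; substitute the real URL of the instance.
ENDPOINT = "http://example.org/sparql"


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a GET request carrying the SPARQL query."""
    params = urlencode({"query": query})
    return Request(
        f"{endpoint}?{params}",
        # Ask for SPARQL JSON results; assumes the server supports this format.
        headers={"Accept": "application/sparql-results+json"},
    )


req = build_sparql_request(ENDPOINT, "SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out here so the sketch runs without network access.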

I'll make a simple GUI client (Blazegraph has GUI but it does much more than just querying so I don't want to expose it for external people because most of it would be useless for them).

If you want, we can also hold a short pre-hackathon meeting (say, Thursday morning PDT) for a quick walkthrough of the setup for whoever is interested.

@daniel: true, if you ask for no labels, no duplication happens. It can be a bit boring without labels though :) We can make the Q/P URLs links to wikidata of course but still it may be a bit dry. It's not hard to write label limits (see e.g. https://github.com/wikimedia/wikidata-query-rdf/blob/master/docs/sparql-query-examples.md#who-discovered-the-most-asteroids) but can be annoying and slows things down a bit.
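A label limit of that kind might be assembled like this in Python. The sketch assumes labels are exposed as `rdfs:label` with language tags and uses the standard SPARQL `LANG()` filter; `label_pattern` is a hypothetical helper, and the query is only built as a string, not run.

```python
def label_pattern(item_var: str, label_var: str, lang: str) -> str:
    """Return a SPARQL graph pattern that binds only labels tagged with `lang`."""
    return (
        f"?{item_var} rdfs:label ?{label_var} . "
        f'FILTER(LANG(?{label_var}) = "{lang}")'
    )


# Assemble a small query that returns items with their English labels only.
query = "\n".join([
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>",
    "SELECT ?item ?label WHERE {",
    "  " + label_pattern("item", "label", "en"),
    "}",
    "LIMIT 10",
])
print(query)
```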

So check out this: http://wdqs-beta.wmflabs.org/ (or http://tinyurl.com/ld667eb)

For extra cool (courtesy of @Jdouglas) click on * next to some entities. The data behind that (unlike queries ;) come from api.php now but I plan to make them SPARQL-backed too eventually.

Please check it out and see if anything breaks. Queries are now limited to 30 seconds of runtime; if that's not enough I can make it longer.