
Support GraphQL Queries across Wikimedia
Closed, Resolved (Public)

Description

Problem
Getting data for a single item either gives you a huge (~47KB) response:
https://www.wikidata.org/wiki/Special:EntityData/Q817266.json
This response also doesn't tell you what the ids represent. For instance, there is no way to know what the P136 id represents without making another set of requests with equally huge responses.

Alternatively, querying for properties on a single item with SPARQL is a little complicated:

SELECT ?title (SAMPLE(?image) AS ?singleImage) (GROUP_CONCAT(?genreLabel; separator="; ") AS ?genres) WHERE {
  wd:Q817266 rdfs:label ?title FILTER (LANG(?title) = "en") .
  OPTIONAL { wd:Q817266 wdt:P18 ?image . }
  OPTIONAL { wd:Q817266 wdt:P136 ?genre . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". ?genre rdfs:label ?genreLabel. }
} GROUP BY ?title

I'm not sure this would return anything at all if the title is missing in English. And this is just three properties; it will get even more complicated with more. It would also be extremely difficult to get nested properties. For instance, if I wanted to get each genre's (P136) image (P18), I wouldn't be able to do that (at least not well) without a subsequent query. And adding another item id to the query would make it even more complex (etc. etc.)

SPARQL is fantastic at doing what it does: making complex queries and getting a list of items back. But it's not very good at getting a bunch of nested properties on a single item.

It would be really helpful if you could request data about a single item (or multiple by id) and get back a subset of nested items.

Solution
I think it would be really helpful if Wikidata supported GraphQL (spec), which is used not only by Facebook but is also now the exclusive method for GitHub's API.

The above SPARQL query could look like this (based on the work done in T173214#3865779):

{
  item(id: "Q817266") {
    label(language: "en") {
      text
    }
    images: statements(propertyIds: "P18", best: true) {
      ...StatementItemValue
    }
    genres: statements(propertyIds: "P136", best: true) {
      ...StatementItemValue
    }
  }
}

fragment StatementItemValue on Statement {
  data: mainsnak {
    ... on PropertyValueSnak {
      item: value {
        ... on StringValue {
          value
        }
        ... on Item {
          label(language: "en") {
            text
          }
        }
      }
    }
  }
}

which would give a response like this:

{
  "data": {
    "item": {
      "label": {
        "text": "Easy A"
      },
      "images": [
        {
          "data": {
            "item": {
              "value": "Easy A.svg"
            }
          }
        }
      ],
      "genres": [
        {
          "data": {
            "item": {
              "label": {
                "text": "comedy film"
              }
            }
          }
        },
        {
          "data": {
            "item": {
              "label": {
                "text": "teen film"
              }
            }
          }
        },
        {
          "data": {
            "item": {
              "label": {
                "text": "LGBT-related film"
              }
            }
          }
        }
      ]
    }
  }
}

This makes the query a whole lot smaller and easier to write, and GraphiQL also supports introspection (as seen in T173214#3865779), so the whole thing is self-documenting.

The query can get a lot more recursive data without a lot of additional complexity (example).
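For instance, a sketch reusing the fragment above to also fetch each genre's image:

{
  item(id: "Q817266") {
    genres: statements(propertyIds: "P136", best: true) {
      data: mainsnak {
        ... on PropertyValueSnak {
          item: value {
            ... on Item {
              label(language: "en") {
                text
              }
              # recurse: each genre item has statements of its own
              images: statements(propertyIds: "P18", best: true) {
                ...StatementItemValue
              }
            }
          }
        }
      }
    }
  }
}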

Implementation
This could exist as a new query service (graphql.wikidata.org?), as a new extension, or as part of the Wikibase repository.

Revisions and Commits

Event Timeline

I just started to implement a simple GraphQL wrapper on top of the Wikibase API in order to see how it could work in practice. It currently maps most of the PHP DataModel structures with an interface similar to that of the JSON API, and provides some demo queries and mutations.

Here are some samples (click the execute button to run the query):

mutation mut {
  setLabel(input: {
    id: "Q90"
    language: "en"
    value: "Paris"
  }) {
    clientMutationId
  }
}

Warning: there is no way to log in yet, so if you try to execute it, it is going to fail because Labs IPs are blocked.

Implementation related points:

  • Source code is here: https://phabricator.wikimedia.org/source/tool-tptools/browse/master/ (the interesting files are public_html/wdql.php, the entry point, and src/GraphQL/*).
  • I am using a GraphQL PHP library, https://webonyx.github.io/graphql-php/, that is fairly stable and provides advanced features like query complexity limitations. If we want to provide a GraphQL API in Wikibase in the future, it seems possible to use it with MediaWiki.
  • To make the data look more like a graph and less like a set of documents, each time a field in the JSON API has an EntityId, an EntityIdValue or an entity URI as its value, the GraphQL API returns an Entity instead.
  • The type system reuses the naming used in https://www.mediawiki.org/wiki/Wikibase/DataModel with a top interface called Value that is extended by both Entity and the different DataValues.
  • Instead of having dictionaries indexed on language codes, property ids or site ids, I provide an argument to the labels, descriptions, sitelinks, statements... fields that allows filtering. (Remark: GraphQL allows querying the same field multiple times under different aliases; see the example below.)
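For example, using the demo's field shapes, the same field can be requested twice under different aliases:

{
  item(id: "Q817266") {
    english: label(language: "en") { text }
    french: label(language: "fr") { text }
  }
}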

TODO:

  • Do a lot of optimizations.
  • Implement more mutations
  • Implement more lookups (by sitelink, by property/value...).
  • Could GraphQL and OAuth play together?
  • If everybody thinks it's great: should we implement this API as part of Wikibase, another MediaWiki extension, or a dedicated Wikimedia service?

@Tpt so it looks like right now you can't get a datavalue or recursively call item from a statement. I added a sample query to the task description.

Otherwise, this is incredible and I'm really excited about it. It makes developing applications that use Wikidata a whole lot easier. Thank you for the work you've done so far!

@Tpt so it looks like right now you can't get a datavalue or recursively call item from a statement. I added a sample query to the task description.

It's actually possible:

Example with the string data type: https://tools.wmflabs.org/tptools/wdql.html?query=%7B%0A%20%20champollion%3A%20item(id%3A%20%22Q260%22)%20%7B%0A%20%20%20%20statements(propertyIds%3A%20%22P214%22)%20%7B%0A%20%20%20%20%20%20id%0A%20%20%20%20%20%20rank%0A%20%20%20%20%20%20mainsnak%20%7B%0A%20%20%20%20%20%20%20%20...%20on%20PropertyValueSnak%20%7B%0A%20%20%20%20%20%20%20%20%20%20property%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20id%0A%20%20%20%20%20%20%20%20%20%20%20%20datatype%0A%20%20%20%20%20%20%20%20%20%20%20%20label%3A%20label(language%3A%20%22en%22)%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20text%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20value%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20type%0A%20%20%20%20%20%20%20%20%20%20%20%20...%20on%20StringValue%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20value%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%7D%0A%7D%0A
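Decoded for readability, the first query is:

{
  champollion: item(id: "Q260") {
    statements(propertyIds: "P214") {
      id
      rank
      mainsnak {
        ... on PropertyValueSnak {
          property {
            id
            datatype
            label: label(language: "en") {
              text
            }
          }
          value {
            type
            ... on StringValue {
              value
            }
          }
        }
      }
    }
  }
}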

Example with items: https://tools.wmflabs.org/tptools/wdql.html?query=%7B%0A%20%20champollion%3A%20item(id%3A%20%22Q260%22)%20%7B%0A%20%20%20%20statements(propertyIds%3A%20%22P31%22)%20%7B%0A%20%20%20%20%20%20id%0A%20%20%20%20%20%20rank%0A%20%20%20%20%20%20mainsnak%20%7B%0A%20%20%20%20%20%20%20%20...%20on%20PropertyValueSnak%20%7B%0A%20%20%20%20%20%20%20%20%20%20property%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20id%0A%20%20%20%20%20%20%20%20%20%20%20%20datatype%0A%20%20%20%20%20%20%20%20%20%20%20%20label%3A%20label(language%3A%20%22en%22)%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20text%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20value%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20type%0A%20%20%20%20%20%20%20%20%20%20%20%20...%20on%20Item%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20id%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20label%3A%20label(language%3A%20%22en%22)%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20text%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20references%20%7B%0A%20%20%20%20%20%20%20%20hash%0A%20%20%20%20%20%20%20%20snaks(propertyIds%3A%20%5B%22P143%22%5D)%20%7B%0A%20%20%20%20%20%20%20%20%20%20type%0A%20%20%20%20%20%20%20%20%20%20...%20on%20PropertyValueSnak%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20value%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20...%20on%20Entity%20%7B%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20id%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%20%20%7D%0A%20%20%20%20%20%20%7D%0A%20%20%20%20%7D%0A%20%20%7D%0A%7D%0A
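And the second, decoded:

{
  champollion: item(id: "Q260") {
    statements(propertyIds: "P31") {
      id
      rank
      mainsnak {
        ... on PropertyValueSnak {
          property {
            id
            datatype
            label: label(language: "en") {
              text
            }
          }
          value {
            type
            ... on Item {
              id
              label: label(language: "en") {
                text
              }
            }
          }
        }
      }
      references {
        hash
        snaks(propertyIds: ["P143"]) {
          type
          ... on PropertyValueSnak {
            value {
              ... on Entity {
                id
              }
            }
          }
        }
      }
    }
  }
}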

These examples use the GraphQL ... on Type feature. We could maybe avoid this dispatching for DataValues if we encoded each Wikidata property's datatype in the schema. But that would have the strong disadvantage of requiring the GraphQL schema to be rebuilt each time a Wikibase property is added or removed, or when its datatype changes. The GraphQL schema would also differ between Wikibase instances.

@Tpt so it looks like right now you can't get a datavalue or recursively call item from a statement. I added a sample query to the task description.

It's actually possible:

These examples use the GraphQL ... on Type feature. We could maybe avoid this dispatching for DataValues if we encoded each Wikidata property's datatype in the schema. But that would have the strong disadvantage of requiring the GraphQL schema to be rebuilt each time a Wikibase property is added or removed, or when its datatype changes. The GraphQL schema would also differ between Wikibase instances.

OMG, that is fantastic! It makes complete sense why you did it that way. I wasn't expecting it to work like that, so it's slightly more complicated for the user, but the trade-offs of the alternative (as you described) are worse, so I think this is perfect (and now that I understand it, it makes sense).

Thank you again for all your work. Is there any reason this can't be used where it is for now? (I mean ideally it would be in production somewhere).

Thank you!

Is there any reason this can't be used where it is for now? (I mean ideally it would be in production somewhere).

It could definitely be used now, but without any stability guarantees. The GraphQL query resolution is a bit heavy because it makes a lot of requests to the API, but for light usage it should definitely be possible. I am still not happy with everything, especially the data value access, which is indeed not very user friendly.

It could definitely be used now, but without any stability guarantees. The GraphQL query resolution is a bit heavy because it makes a lot of requests to the API, but for light usage it should definitely be possible. I am still not happy with everything, especially the data value access, which is indeed not very user friendly.

Yeah, there might be some opportunities to simplify/flatten the access a bit. I'm not sure how this could be done other than making the properties themselves types? There might be way too many properties for that though, and as you mentioned, the schema would have to be rebuilt every time a property changes. Although I kinda doubt that the type of a property changes very often (if at all?), so they could be cached for a long time, and it could be done dynamically within an extension or within Wikibase itself. I don't think it's a big deal that it would be different for every Wikibase instance; the API (Special:EntityData & SPARQL) is already different for each instance of Wikibase, so as long as the introspection works correctly, I think it's fine (or even preferred) if it doesn't try to be the same across instances.

I suppose if you were given the ability to access the values by property OR by base type (like you can now), it leaves the decision to the user: tie requests to a specific instance of Wikibase, or keep them abstract (and usable with any instance). Leaving that up to the user is, I think, a good idea. :)

@Tpt Here's a patch to enable CORS on the graphql endpoint. :) D1081

dbarratt added a revision: Restricted Differential Revision. Jul 21 2018, 10:36 AM
dbarratt added a revision: Restricted Differential Revision. Jul 21 2018, 2:10 PM

Thank you again for all your work. Is there any reason this can't be used where it is for now? (I mean ideally it would be in production somewhere).

We wanted to see if there is enough demand for it before we commit to offering and maintaining it long-term.

We wanted to see if there is enough demand for it before we commit to offering and maintaining it long-term.

That makes total sense. Thanks! As long as it's safe to use in Toolforge, I can use it there. :) I also noticed that it just uses the API so if needed I can always clone it and run it somewhere else. :)

It would be awesome if there was a way to filter the results of a property by a qualifier. For instance, in this query I'm only really interested in the publication date for a specific country (i.e. whatever country the user is located in). So it would be awesome if you could specify that somehow (by Q id?). But if not, it's not a huge deal to add it to the query, loop over the results, and pick the one I'm looking for.

Indeed, the set of properties and their datatypes is very static and cacheable, so we could use them as keys of a StatementByProperty object and then have StringStatement, StringSnak... types. The object would just be huge and might raise performance problems in the various GraphQL tools (we would have 4K+ keys).
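Sketched in SDL (every name here is hypothetical):

# One field per property id; the schema would need regeneration whenever
# a property is created, deleted, or changes datatype.
type StatementByProperty {
  P18: [CommonsMediaStatement]   # image
  P136: [ItemStatement]          # genre
  P214: [StringStatement]        # VIAF ID
  # ...4K+ more fields
}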

A simple simplification we could do is to add a nullable version of the "value" key to the Snak interface and keep the non-nullable version on the PropertyValueSnak type. It would allow avoiding most of the ... on PropertyValueSnak checks. But it has the disadvantages of making the output less safe (more possible nulls) and of not highlighting the fact that some snaks do not have a value.
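In SDL terms, the idea is (a sketch):

interface Snak {
  # nullable convenience accessor: null for somevalue/novalue snaks
  value: Value
}

type PropertyValueSnak implements Snak {
  # the non-null guarantee stays on the concrete type
  value: Value!
}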

For statement filtering by qualifier, a good solution might be to add an extra parameter "hasStatement" to the statements field and give it a value of type SnakInput (a GraphQL input type that would be used to encode a snak provided by the GraphQL client). What do you think about it?
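A sketch matching the publication date use case above (the SnakInput fields are hypothetical):

{
  item(id: "Q817266") {
    publicationDates: statements(
      propertyIds: "P577"                                  # publication date
      hasStatement: { propertyId: "P291", itemId: "Q30" }  # qualifier: place of publication = USA
    ) {
      id
      rank
    }
  }
}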

Smalyshev subscribed.

Sounds very interesting but probably not part of any work for Wikidata-Query-Service, so I am removing the tag.

dbarratt renamed this task from Support GraphQL Queries to Support GraphQL Queries on Wikidata. Jul 28 2018, 11:35 PM
dbarratt added a project: Developer-Wishlist.

For statement filtering by qualifier, a good solution might be to add an extra parameter "hasStatement" to the statements field and give it a value of type SnakInput (a GraphQL input type that would be used to encode a snak provided by the GraphQL client). What do you think about it?

I think that should work? Sorry, I'm not that familiar with the server side of GraphQL.

I've been thinking about Wikidata a lot, asking "What is the reading experience supposed to be like?" I thought about data websites that I read a lot, and one came to mind: IMDb. While I was at Wikimania-Hackathon-2018 I started work (ok, I haven't done much) on an IMDb clone that uses Wikidata:
running prototype: https://wikimdb.davidwbarratt.com/item/25188
code: https://github.com/davidbarratt/wikimdb

This is a trivial example, but I think the reading experience of Wikidata is actually many reading experiences. There isn't a single "one size fits all" way to browse/consume data. Perhaps there's a website for browsing books, and another for birds, and another for places, etc.

I think that's where tasks like this come in... there needs to be an easy way to rapidly create reading experiences built on Wikidata. I think GraphQL is perhaps one of those tools.

I thought of some more features and I thought I'd document them here:

  • Filter the sitelinks by site (i.e. Wikipedia). Also being able to filter them by language would be awesome; for instance, I'm trying to use this to get the summary of the article in the user's language. It would be incredible if you could perform actions against the REST API within the query, but at that point we might as well just add GraphQL to MediaWiki and allow Wikidata's GraphQL endpoint to call MediaWiki's. :)
  • Some sort of ability to specify a "fallback" language. For media, Property:P364 could be used, but perhaps more generally the ability to specify a fallback language by a property id? For instance, if a film's label is not available, I'd prefer for it to show in its original language (whatever that might be). I kinda wish that GraphQL had some sort of way to use values from one query in another, but as far as I can tell you can't do that. :/
  • The Commons Media data type could allow a function for generating a thumbnail url (if the image is a bitmap).
  • Ordering the property values by a qualifier (for instance, being able to order by Property:P1545 or Property:P1352). A sketch of a few of these features follows this list.
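Here's roughly what I mean (every argument name here is made up):

{
  item(id: "Q817266") {
    enwiki: sitelinks(site: "enwiki") {
      title
    }
    images: statements(propertyIds: "P18") {
      mainsnak {
        ... on PropertyValueSnak {
          value {
            ... on StringValue {
              value
              # hypothetical: a thumbnail(width: 320) field for bitmap Commons media
            }
          }
        }
      }
    }
    cast: statements(propertyIds: "P161", orderBy: "P1545") {
      id
    }
  }
}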

It's really awesome so far though!

For statement filtering by qualifier, a good solution might be to add an extra parameter "hasStatement" to the statements field and give it a value of type SnakInput (a GraphQL input type that would be used to encode a snak provided by the GraphQL client). What do you think about it?

So it looks like the proper way to do this is with a Directive. Then again, I'm not really sure I understand the difference between Directives and Arguments. Perhaps arguments can only be used on objects and Directives can be used on any type?

  • Some sort of ability to specify a "fallback" language. For media, Property:P364 could be used, but perhaps more generally the ability to specify a fallback language by a property id? For instance, if a film's label is not available, I'd prefer for it to show in its original language (whatever that might be). I kinda wish that GraphQL had some sort of way to use values from one query in another, but as far as I can tell you can't do that. :/

On second thought... I suppose the proper way to do this would be to first query for the "fallback" and then, in the main query, request the labels in both languages (the user's language and the fallback language).
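For example (a sketch; the client has to glue the two results together, since one GraphQL query cannot consume another's output):

# Step 1: fetch the original language of the work (P364)
{
  item(id: "Q817266") {
    statements(propertyIds: "P364") {
      mainsnak {
        ... on PropertyValueSnak {
          value {
            ... on Item { id }  # e.g. Q1860 (English); mapped to a language code client-side
          }
        }
      }
    }
  }
}

# Step 2: request both the user's language and the fallback
{
  item(id: "Q817266") {
    userLabel: label(language: "de") { text }
    fallbackLabel: label(language: "en") { text }
  }
}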

@Tpt

I've been making a somewhat complex query and I've run into a pretty big performance issue. The query takes about 12-13 seconds to execute. :(

I do have some time to work on this, so I'm really looking for what you think is the best option.

Here are some ideas I have on how to resolve this:

  1. Collect all of the concurrent requests and execute them simultaneously with Guzzle (or the like), which relies on curl_multi_exec() to execute the requests in parallel. The problem with doing this is that the requests have to be "collected" at each level of the hierarchy and then the results must be put back where they belong. I can't think of an easy way to do this with the GraphQL resolvers (unless the library can accept something like a Guzzle promise and do the collection for us?)
  2. Run Wikibase Client on Toolforge and use the Client to access Wikidata's database directly. I suppose the code would be moved into a MediaWiki extension so it could define the routes within MediaWiki. I don't know if this is actually possible. It doesn't bring concurrency, but it would speed up the requests substantially (to the point where concurrency is not needed). I'm not sure if Wikidata's repository is available in the replicas (I imagine it is?). This also doesn't fix anything with SPARQL (i.e. multiple SPARQL queries are still not going to run concurrently), although fetching the entities after the query would at least be quick. I suppose if there's a chance that this could one day operate on Wikidata then this is a good option; otherwise I don't really like it, because it requires the software to run on Toolforge (i.e. I can't just run the GraphQL server on my own server).
  3. Rewrite in JavaScript (Node.js) and use Apollo Server (or the like). This would naturally allow the requests to run concurrently and asynchronously (i.e. a group of requests would be resolved individually rather than as a whole group). As a plus, all of the requests would continue to execute against production. This seems like the most work, but has the biggest benefit. Also, if we wanted to run this as a production service, it could use the API over the local network. Or if someone wants to run it on their own server, they'd be able to do that as well.

What do you think? I'm leaning towards option 3 as it gives the most bang for the buck, but I wanted to make sure you were good with that option before I go rewriting everything. :)

@dbarratt Thank you for planning to work on Wikibase+GraphQL.

The performance problem we face seems very standard; I believe it is the N+1 problem.
The standard way to solve it is to use the DataLoader utility. The original Facebook implementation is here: https://github.com/facebook/dataloader but our PHP GraphQL library also provides an equivalent: https://webonyx.github.io/graphql-php/data-fetching/#solving-n1-problem

But indeed, rewriting in JS is a good option if we do not plan to integrate the GraphQL server inside MediaWiki. If we do, the PHP library we use actually allows using promises and does the collection for us: https://webonyx.github.io/graphql-php/data-fetching/#async-php

If you start working significantly on the GraphQL endpoint, we should probably move it out of tptools into a separate shared project.

For statement filtering by qualifier, a good solution might be to add an extra parameter "hasStatement" to the statements field and give it a value of type SnakInput (a GraphQL input type that would be used to encode a snak provided by the GraphQL client). What do you think about it?

So it looks like the proper way to do this is with a Directive. Then again, I'm not really sure I understand the difference between Directives and Arguments. Perhaps arguments can only be used on objects and Directives can be used on any type?

The values of the built-in directives @include and @skip are booleans, so they are only useful for building generic queries with options to enable or disable specific parts. Arguments can be complex types and are passed to the resolution code, so they seem to me the most efficient way to filter based on qualifiers.
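To illustrate the difference, using the field shapes from the examples above:

query ($withGenres: Boolean!) {
  item(id: "Q817266") {
    # argument: an arbitrary value handed to the resolver
    label(language: "en") { text }
    # directive: a boolean toggle handled by the executor
    genres: statements(propertyIds: "P136") @include(if: $withGenres) {
      id
      rank
    }
  }
}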

  • Some sort of ability to specify a "fallback" language. For media, Property:P364 could be used, but perhaps more generally the ability to specify a fallback language by a property id? For instance, if a film's label is not available, I'd prefer for it to show in its original language (whatever that might be). I kinda wish that GraphQL had some sort of way to use values from one query in another, but as far as I can tell you can't do that. :/

On second thought... I suppose the proper way to do this would be to first query for the "fallback" and then, in the main query, request the labels in both languages (the user's language and the fallback language).

Yes, that seems indeed the easiest way to do it client-side. But having proper language fallback would be very nice. We could add an option to the "label", "description"... fields to use the MediaWiki language fallback system.
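For example (the fallback argument is hypothetical, and I assume the label type exposes its language):

{
  item(id: "Q817266") {
    label(language: "pt-br", fallback: true) {
      language  # the language actually served, e.g. "pt" or "en"
      text
    }
  }
}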

@dbarratt Thank you for planning to work on Wikibase+GraphQL.

The performance problem we face seems very standard; I believe it is the N+1 problem.
The standard way to solve it is to use the DataLoader utility. The original Facebook implementation is here: https://github.com/facebook/dataloader but our PHP GraphQL library also provides an equivalent: https://webonyx.github.io/graphql-php/data-fetching/#solving-n1-problem

But indeed, rewriting in JS is a good option if we do not plan to integrate the GraphQL server inside MediaWiki. If we do, the PHP library we use actually allows using promises and does the collection for us: https://webonyx.github.io/graphql-php/data-fetching/#async-php

If you start working significantly on the GraphQL endpoint, we should probably move it out of tptools into a separate shared project.

Well that's fascinating, I'm glad the library provides a way to do that. That's something to keep in mind.

I've thought about this for a while and what the final product would look like.

It looks like we have three options:

  1. MediaWiki extension that would hook into Wikibase (like Special:EntityData).
  2. External Service connected to Wikibase (like SPARQL).
  3. Centralized service connected to Wikimedia

I was thinking about T173214#4511629 and the idea of querying other wikis, and I wondered why you wouldn't be able to query other wikis (REST API, Action API, etc.) on a single GraphQL server, from the top level or deeply nested. I was also thinking about SDC General: would we set up a new instance, or would you be able to query both from a single instance? Indeed, it seems that the more APIs you can query from a single endpoint, the better. With that, I think option 3 is the best option for being able to leverage all of the data that Wikimedia has to offer in an easily consumable fashion.

This will shift some things; for instance, at the top level you'd need some way to specify the wiki you want to query first (perhaps the code and the lang as arguments?). But it also lets us do some interesting things, like getting Wikidata entities from Wikipedia article titles, then returning data about those entities, and then data from other linked wikis (all in a single request).
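Something like this, where every field and argument name is hypothetical:

{
  enwiki: site(code: "wiki", language: "en") {
    page(title: "Easy A") {
      entity {                                      # jump from the article to its Wikidata item
        label(language: "en") { text }
        frwiki: sitelink(site: "frwiki") { title }  # and back out to a linked wiki
      }
    }
  }
}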

With that... I think it would be best if we moved your code to a new "graphql" repo, hosted the GraphQL server on a Cloud VPS, and then rewrote it in JavaScript (Node.js) so it's as fast as it can be. If it gets enough usage we can talk about moving it into production. :) What do you think? Does that sound like a plan?

With that... I think it would be best if we moved your code to a new "graphql" repo, hosted the GraphQL server on a Cloud VPS, and then rewrote it in JavaScript (Node.js) so it's as fast as it can be. If it gets enough usage we can talk about moving it into production. :) What do you think? Does that sound like a plan?

It sounds like a very good plan! Having a single GraphQL endpoint for the Wikimedia world is a great idea! Please ping me when you have set up the new JS repository.

dbarratt renamed this task from Support GraphQL Queries on Wikidata to Support GraphQL Queries across Wikimedia. Aug 23 2018, 4:46 AM

I haven't added Wikidata back yet, but this is what I have so far:
https://graphql.wmflabs.org/

Change 460546 had a related patch set uploaded (by Dbarratt; owner: Dbarratt):
[labs/tools/graphql@master] Add a language fallback system

https://gerrit.wikimedia.org/r/460546

Change 460546 merged by Dbarratt:
[labs/tools/graphql@master] Add a language fallback system

https://gerrit.wikimedia.org/r/460546

dbarratt claimed this task.

I'm going to mark this task as resolved since this now has its own project board here: GraphQL

Please file any issues there!

I just had a quick review of the current GraphQL structure for Wikibase entities. It looks great! Thank you!

  • I would switch the Entity type to an interface and have Item, Property, Lexeme... implementations
  • The type, datatype and snaktype field values should probably be enumerations to be more GraphQL-ish
  • I would merge the EntityLabel and SnakValueMonolingualTextValue types because they represent the same object according to the Wikibase DataModel.
  • I would remove the SnakValue* types (SnakValueString, SnakValueEntity...) because they only provide the "type" field, which seems unneeded since the feature it covers is already provided by the __typename introspection field
  • I would rename the Claim type to Statement. In the Wikibase DataModel a "claim" is an affirmation (i.e. a main snak and some qualifiers) and the (claim, references, rank) structure is a "statement"
  • I would make Snak an interface and have implementations for the different snak types (see the sketch after this list)
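Sketched in GraphQL SDL (field lists abridged; type names follow the Wikibase DataModel):

interface Entity {
  id: ID!
}

type Item implements Entity {
  id: ID!
  # item-specific fields: labels, sitelinks, claims...
}

type Property implements Entity {
  id: ID!
  datatype: String
}

interface Snak {
  property: Property!
}

type PropertyValueSnak implements Snak {
  property: Property!
  value: SnakValue
}

type PropertySomeValueSnak implements Snak {
  property: Property!
}

type PropertyNoValueSnak implements Snak {
  property: Property!
}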

I just had a quick review of the current GraphQL structure for Wikibase entities. It looks great! Thank you!

No problem. I'm learning a lot about GraphQL (and the Action API) so it's been fun. :)

  • I would switch the Entity type to an interface and have Item, Property, Lexeme... implementations

I went back and forth on this a whole lot. I started out with it just as you described, and I abandoned it. At a certain level, I completely agree this is the way it should be. The entities are different types, so GraphQL should expose them the same way. On the other side of this... it is a dark rabbit hole of despair. We have different types of things all over the place. We have different types of pages, we have different types of images, we have different types of blocks, we even have different types of sites (some have Wikibase, some do not).

Even within entities, it's not clear. You've listed the types of entities, but those are specific to Wikidata. Commons, for instance, will have a Media type whose ids are prefixed with M. Should we include that?

Eventually I came to this "rule of the road"

Nullable fields do not justify new types.

What I mean by this is that if the only difference between two types is that one has a field and the other does not, they should not become separate types. Avoiding new types also seems to be way easier (from a development perspective).

However, I realize that this is probably not ideal. Indeed, the typing can be used for conditions that I hadn't really thought about. For instance, MediaWiki has different image types; DRAWING and BITMAP are examples. I implemented the method to get thumbnail images, but without the typing, it makes requests for thumbnail images even when you don't want/need them (for instance, on DRAWING).

I also noticed that the Action API uses a lot of enums for values... should the server maintain a list of these values?

Having robust typing that is aware of the domain knowledge of the wiki (which is completely configurable) seems way outside the scope of the server. For one thing, half of the things you'd need to know (like which types of entities exist on a wiki and which fields they have) don't even have API endpoints where you can look up this information; it only exists in code or in configuration.

With all of that... I think GraphQL requests should be delegated to the individual sites (T209133). This way each site runs its own GraphQL server (as a MediaWiki extension), can be aware of its own extensions, and can allow each extension to implement its own typings. That way, the decision of which types should or should not exist is delegated to the extensions themselves rather than made for everyone.

  • The type, datatype and snaktype field values should probably be enumerations to be more GraphQL-ish

I'm not sure what that means? I'm just returning what comes back from the Action API, but yeah, if we implemented different types, that field would no longer be needed.

  • I would merge the EntityLabel and SnakValueMonolingualTextValue types because they represent the same object according to the Wikibase DataModel.

That's a great idea. T209439

  • I would remove the SnakValue* types (SnakValueString, SnakValueEntity...) because they only provide the "type" field, which seems unneeded since the feature it covers is already provided by the __typename introspection field

They actually implement a value key:

type SnakValueString implements SnakValue {
  value: String
  # SnakValue
  type: String
}
type SnakValueEntity implements SnakValue {
  value: Entity
  # SnakValue
  type: String
}

This is actually one of the reasons I'm not a huge fan of having multiple types, it's not obvious that you need to specify the types. I had the same problem in T173214#4442448 haha. :)

I would do something like this:

type SnakValue {
  value: Entity | String
  type: String
}

but GraphQL does not allow a union that includes a scalar. :( And I wanted it to be as close to the existing Action API as possible.

Ironically, this is the one place where I did implement different types (out of necessity), and both of us found it more difficult to use. :/ I can't imagine how hard it would be to use if there were different types everywhere, but then again, if you expect it, maybe it's not that bad?

  • I would rename the Claim type to Statement. In the Wikibase DataModel a "claim" is an affirmation (i.e. a main snak and some qualifiers) and the (claim, references, rank) structure is a "statement"

I decided that claims was more appropriate for two reasons:

  1. Wikibase uses claims in its API response and I wanted to stay as close to that as possible: https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q190050&props=claims&formatversion=2
  2. Wikibase exposes a getclaims action to get the claims: https://www.wikidata.org/w/api.php?action=wbgetclaims&entity=Q190050&formatversion=2

From the API response, it actually appears that a Statement is a type of Claim. So really, Claim should be an interface, but the field should remain claims, because I assume that Wikibase could allow some other form of claims (that perhaps Wikidata does not use?)

Regardless, this is probably further proof that the typings (or lack thereof) should be handled by MediaWiki-extensions-WikibaseRepository. I don't really want to be the one making this decision; I would much rather defer it to that extension.

  • I would make Snak an interface and have implementations for the different snak types

I don't disagree. I noticed that's how you had it before, but it violated my rule so I didn't implement it. But again, I think that it should be in the extension rather than a separate service. So perhaps T209133 is more important than I originally thought (even though it might be an uphill battle to get it onto production and enabled on every wiki).

I also noticed that the Action API uses a lot of enums for values... should the server maintain a list of these values?

You could also fetch them from the paraminfo API module if you don't mind the latency.
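For instance, the snaktype enumeration could look like this (the values mirror the Action API's snaktype strings):

enum SnakType {
  VALUE      # "value"
  SOMEVALUE  # "somevalue": the property has an unknown value
  NOVALUE    # "novalue": the property is known to have no value
}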