
Build lookup of items by external ID in GraphQL
Open, Needs Triage | Public | 5 Estimated Story Points

Description

We would like to allow looking up items by external identifier, i.e., requesting a single item or nothing (in contrast to searching). If multiple items match the property and external ID pair, a conflict object containing only the matching item IDs is returned.

Acceptance criteria:

  • The field is named "itemByExternalId" and takes the external ID and its corresponding property ID as input.

Schema:

type Query {
	# ...
	itemByExternalId(property: PropertyId!, externalId: String!): ItemByExternalIdResult
}

union ItemByExternalIdResult = Item | ExternalIdNonUnique

type ExternalIdNonUnique {
	items: [ItemId!]!
}

Task breakdown notes:

  • create a separate use case that performs the lookup
  • the lookup resolver will call the ItemResolver
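A minimal Python sketch of that split, assuming the use case returns the IDs of every item whose statements match the property/value pair and the resolver maps that list onto the ItemByExternalIdResult union. The class and function names (ItemByExternalIdUseCase, find_item_ids, resolve_item_by_external_id, resolve_item) are hypothetical, the in-memory index stands in for whatever search backend the real use case queries, and none of this mirrors the actual Wikibase PHP code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple, Union


@dataclass
class Item:
    # Stand-in for the full Item object that the ItemResolver would produce.
    id: str
    label: str


@dataclass
class ExternalIdNonUnique:
    # Conflict result: only the IDs of the matching items, no item data.
    items: List[str]


class ItemByExternalIdUseCase:
    """Hypothetical use case that finds the items using an external ID.

    Backed by an in-memory (property ID, external ID) -> item IDs index;
    a real implementation would query a search backend instead.
    """

    def __init__(self, index: Dict[Tuple[str, str], List[str]]) -> None:
        self._index = index

    def find_item_ids(self, property_id: str, external_id: str) -> List[str]:
        return list(self._index.get((property_id, external_id), []))


def resolve_item_by_external_id(
    use_case: ItemByExternalIdUseCase,
    resolve_item: Callable[[str], Item],
    property_id: str,
    external_id: str,
) -> Optional[Union[Item, ExternalIdNonUnique]]:
    """Maps the use case result onto the ItemByExternalIdResult union."""
    item_ids = use_case.find_item_ids(property_id, external_id)
    if not item_ids:
        # No match: the nullable field resolves to null.
        return None
    if len(item_ids) == 1:
        # Unique match: delegate to the existing item resolver.
        return resolve_item(item_ids[0])
    # Several matches: return only the IDs, as the conflict type requires.
    return ExternalIdNonUnique(items=item_ids)


# Usage with made-up data.
use_case = ItemByExternalIdUseCase({
    ("P345", "abc123"): ["Q100"],
    ("P999", "shared-id"): ["Q201", "Q202"],
})
items = {"Q100": Item(id="Q100", label="Example item")}
print(resolve_item_by_external_id(use_case, items.__getitem__, "P345", "abc123"))
```

Delegating the unique case to the item resolver means the Item branch of the union can serve exactly the same fields as a regular item query, while the lookup logic stays testable on its own.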

Open questions for Product:

  • How complex is this lookup, i.e., how many lookups are allowed in a single query?

Event Timeline

WMDE-leszek set the point value for this task to 5. (Thu, Mar 12, 11:27 AM)

@ECohen_WMDE and I think a slightly different version of option 3 is best: return all results with a warning saying "The external ID is in use by multiple items".

Is that doable?


It is doable, but to be honest returning all results is the worst option in my opinion. It would force us to make itemByExternalId return a list, in which case the name probably needs to change too, because "item (singular) by external id" returning a list looks silly. It also makes the complexity of the query less predictable because instead of allowing the user to query all fields of a single item, we're now querying all fields of potentially multiple items.

I thought of a fourth option, a compromise between the others, which is now my personal favorite. The field returns either the full item when there is a unique result, or a "conflict result" containing only the IDs of the matching items. In the schema, it would look like this:

type Query {
  # ...
  itemByExternalId(property: ID!, value: String!): ItemByExternalIdResult
}

union ItemByExternalIdResult = Item | ExternalIdConflict

type ExternalIdConflict {
  items: [ItemId!]!
}

The only small downside of this approach is that queries become slightly more complex for users, but I think this is still pretty standard GraphQL stuff. They would have to handle the two cases separately, or choose not to care about the conflict case:

{
  itemByExternalId(property: "P345", value: "abc123") {
    ... on Item {
      id
      label(languageCode: "en")
    }
    # the part below is optional
    ... on ExternalIdConflict {
      items # these are only IDs
    }
  }
}
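On the client side, handling the two cases could look like the following Python sketch. It assumes the query above is extended with the standard GraphQL __typename meta-field so the client can tell which union member it received, and that the response arrives as ordinary JSON; describe_result is a hypothetical helper, and the type and argument names follow this proposal (ExternalIdConflict, value) rather than the final schema.

```python
import json
from typing import Any, Dict

# Assumed query: the example above plus the __typename meta-field.
QUERY = """
{
  itemByExternalId(property: "P345", value: "abc123") {
    __typename
    ... on Item { id label(languageCode: "en") }
    ... on ExternalIdConflict { items }
  }
}
"""


def describe_result(response: Dict[str, Any]) -> str:
    """Turns a decoded GraphQL response into a short description."""
    result = response["data"]["itemByExternalId"]
    if result is None:
        # Nullable field: no item uses this property/value pair.
        return "no item uses this external ID"
    if result["__typename"] == "Item":
        return f"{result['id']}: {result.get('label')}"
    if result["__typename"] == "ExternalIdConflict":
        # Only IDs are available here; fetch details separately if needed.
        return "ambiguous, used by " + ", ".join(result["items"])
    raise ValueError(f"unexpected type {result['__typename']}")


raw = '{"data": {"itemByExternalId": {"__typename": "ExternalIdConflict", "items": ["Q201", "Q202"]}}}'
print(describe_result(json.loads(raw)))  # ambiguous, used by Q201, Q202
```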

I also looked into the numbers a bit. There are

  • 10,058 external ID properties (query)
  • 263,133,470 external ID values (query)
  • 262,488,083 distinct external ID property value pairs (query)

So only about 0.25% of external ID values repeat a property/value pair that is already in use.
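The percentage follows from the counts above, assuming "external ID values" counts every (property, value) statement and the distinct count collapses repeated pairs, so the difference between the two is the number of surplus statements:

```python
# Counts quoted above (from the linked queries).
external_id_values = 263_133_470
distinct_pairs = 262_488_083

# Every statement beyond the first for a given (property, value) pair is a duplicate.
duplicate_values = external_id_values - distinct_pairs
duplicate_share = duplicate_values / external_id_values

print(duplicate_values)          # 645387
print(f"{duplicate_share:.2%}")  # 0.25%
```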

Change #1259986 had a related patch set uploaded (by Dima koushha; author: Dima koushha):

[mediawiki/extensions/Wikibase@master] GQL: Add itemByExternalId use case

https://gerrit.wikimedia.org/r/1259986

Change #1259986 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] GQL: Add itemByExternalId use case

https://gerrit.wikimedia.org/r/1259986

Change #1262090 had a related patch set uploaded (by Dima koushha; author: Dima koushha):

[mediawiki/extensions/Wikibase@master] GQL: Check search availability in itemByExternalId resolver

https://gerrit.wikimedia.org/r/1262090

Change #1260740 had a related patch set uploaded (by Dima koushha; author: Dima koushha):

[mediawiki/extensions/Wikibase@master] GQL: Add validation for itemByExternalId lookup

https://gerrit.wikimedia.org/r/1260740

Change #1262090 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] GQL: Check search availability in itemByExternalId resolver

https://gerrit.wikimedia.org/r/1262090

Change #1260740 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] GQL: Add validation for itemByExternalId lookup

https://gerrit.wikimedia.org/r/1260740

This seems to work as expected. Scenarios tested:

  • getting an item by external ID property
  • attempting to query with a property of the wrong type (not an external ID property)
  • attempting to query by a non-existent external ID value
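The wrong-type scenario above exercises input validation. A minimal Python sketch of such a check follows; the ID format rule, the error class InvalidLookupError, the helper validate_lookup, and the get_data_type callback are all assumptions for illustration, not the checks added in the Wikibase change. Only "external-id", the Wikibase data type ID for external identifier properties, is taken from Wikibase itself.

```python
import re
from typing import Callable, Optional

# Assumed format check: "P" followed by a positive integer without leading zeros.
PROPERTY_ID = re.compile(r"P[1-9][0-9]*")


class InvalidLookupError(Exception):
    """Hypothetical error that would be reported to the client."""


def validate_lookup(
    property_id: str,
    get_data_type: Callable[[str], Optional[str]],
) -> None:
    """Rejects lookups whose property is not an existing external ID property."""
    if not PROPERTY_ID.fullmatch(property_id):
        raise InvalidLookupError(f"Not a valid property ID: {property_id}")
    data_type = get_data_type(property_id)
    if data_type is None:
        raise InvalidLookupError(f"Property not found: {property_id}")
    # "external-id" is the Wikibase data type of external identifier properties.
    if data_type != "external-id":
        raise InvalidLookupError(f"{property_id} is not an external ID property")


# Made-up data type table; P212 (ISBN-13) and P31 (instance of) mirror Wikidata.
data_types = {"P212": "external-id", "P31": "wikibase-item"}
validate_lookup("P212", data_types.get)  # passes without raising
```

A valid lookup whose value matches no item is not a validation error in this sketch; it simply resolves to null, which matches the third scenario.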

I've tested this on Wikidata, but I don't know which external IDs are shared by multiple items there, so I couldn't test the duplicate case. I didn't test on Beta, as my IP is apparently blocked there.

good stuff, thank you!

Thanks to @kimpham and @Jakob_WMDE, I was able to test the case of multiple items sharing the same external ID with:

{
  itemByExternalId(property:"P212", externalId: "978-3-440-09723-6") {
    ... on ExternalIdNonUnique { items }
  }
}

All looks good, thank you!