Include labels inside the statements alongside QIDs
Open, Needs Triage | Public | Feature

Description

Feature summary
Provide labels for each statement's property & value as strings (not just QIDs/PIDs), inside the JSON returned for each item or property GET request.

q42_json_with_qids.png (960×1 px, 101 KB)

Currently, to list every statement's property & value as strings (not QIDs/PIDs) for a single WD-item via the REST API, users must request the original item JSON, loop over the statements, and make a new GET request for each property ID and for each (wikibase-item) statement value to populate the VALUE of the tuples (ITEM, PROPERTY, VALUE).

Use case(s)
Avoid the need for batch GET requests "just to" populate the statements as human-readable strings.

Use case: Provide a full list of information about a single WD-item, as string statements, in one REST API GET request.

Use case: Produce a list of Wikidata statements as string text to inject into an LLM.
This currently requires:

  1. downloading the item JSON
  2. looping over each statement
  3. downloading the property JSON
  4. storing the property label
  5. downloading the value (QID) JSONs
  6. storing the value labels.

If the REST API included the labels in the JSON for the statements, the user could skip steps 2-6 in the list above and would only need to download one JSON.
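For illustration, the six steps listed above can be sketched in Python roughly as follows. This is a minimal sketch, not the reporter's actual script: the base URL and its version segment, the label routes, and the `statements`/`property`/`value` field names are assumptions about the Wikibase REST API response shape, and `statements_as_text`, `label_url`, and `http_get_json` are hypothetical helper names. The `fetch` parameter is injectable so the loop can be exercised without network access.

```python
import json
from urllib.request import Request, urlopen

# Assumed base URL of the Wikibase REST API on Wikidata; the version
# segment may differ on a given deployment.
BASE = "https://www.wikidata.org/w/rest.php/wikibase/v1"


def http_get_json(url):
    """Fetch one URL and decode the JSON body (raises on HTTP errors)."""
    req = Request(url, headers={"User-Agent": "label-sketch/0.1 (example)"})
    with urlopen(req) as resp:
        return json.load(resp)


def label_url(entity_id, lang):
    """Build the (assumed) label route for a property (P...) or item (Q...)."""
    kind = "properties" if entity_id.startswith("P") else "items"
    return f"{BASE}/entities/{kind}/{entity_id}/labels/{lang}"


def statements_as_text(item_id, lang="en", fetch=http_get_json):
    """Return (ITEM, PROPERTY, VALUE) label tuples for one item."""
    item = fetch(f"{BASE}/entities/items/{item_id}")  # step 1
    item_label = item.get("labels", {}).get(lang, item_id)
    cache = {}  # avoid re-downloading a label that was already fetched

    def label(entity_id):
        if entity_id not in cache:
            cache[entity_id] = fetch(label_url(entity_id, lang))
        return cache[entity_id]

    triples = []
    for pid, statements in item.get("statements", {}).items():  # step 2
        for statement in statements:
            prop = statement.get("property", {})
            value = statement.get("value", {})
            content = value.get("content")
            # The field name for the data type is an assumption; both
            # spellings are checked.
            dtype = prop.get("data_type") or prop.get("data-type")
            if value.get("type") != "value":
                value_text = value.get("type", "unknown")  # e.g. novalue
            elif dtype == "wikibase-item" and isinstance(content, str):
                value_text = label(content)  # steps 5-6
            elif isinstance(content, str):
                value_text = content
            else:
                value_text = json.dumps(content, ensure_ascii=False)
            triples.append((item_label, label(pid), value_text))  # steps 3-4
    return triples
```

Even with the cache, the number of requests is one plus the number of distinct properties and item-valued objects, which is the overhead this task asks to remove.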

Benefits
The benefit of this feature would be to spare the user from making 300+ GET requests to populate the property & value labels of the statements for a single WD-item (with 300+ statements).

The process currently takes approximately 33 seconds for 268 statements. If the REST API populated the labels alongside the QIDs in the statements, the process would take about 0.12 seconds, or roughly 260x faster for this use case.

Event Timeline

jdfraine updated the task description.

Change #1028517 had a related patch set uploaded (by Silvan Heintze; author: Silvan Heintze):

[mediawiki/extensions/Wikibase@master] REST: Introduce a label service

https://gerrit.wikimedia.org/r/1028517

Change #1028518 had a related patch set uploaded (by Silvan Heintze; author: Silvan Heintze):

[mediawiki/extensions/Wikibase@master] REST: Use label service in GetItemStatements

https://gerrit.wikimedia.org/r/1028518

Hi @jdfraine, just wanted to send you a quick message saying that we had a little discussion and unfortunately this won't be done anytime soon. If there is more of a need for this (and similar use cases for ML work) in the longer term, then we might as well discuss a new API instead of building on top of the current REST API.

Happy to discuss if you think that makes sense

Hello @Ifrahkhanyaree_WMDE, WPP, et al.

Thank you for taking the time to consider this suggestion. I understand the need to prioritise resources and align input with product vision, strategy, and roadmaps.

I wanted to provide one possible motivation for reconsidering this feature suggestion in future developments.
[tl;dr]: Including the GetItemStatements functionality in the REST API would significantly reduce the need for users to overwhelm the WDQS and would more clearly distinguish the functionality of the Action API from that of the REST API.

Background Assumptions: I may be missing context and am only focusing on search & re-user use cases.

The WD data access options:

  • The REST API provides full item data but does not provide search capabilities like the Action API and SPARQL endpoints.
  • The SPARQL endpoint does not provide full item data by default but does provide search and human-readable output functionalities, including NLP-required information.
    • With query optimisation, the WDQS(SPARQL) can provide complete knowledge per item.
    • SPARQL query optimisation requires extensive user learning on the order of a graduate-level course in graph data engineering.
  • The ActionAPI provides search capabilities that differ from the WDQS(SPARQL) endpoint and can provide full item data.
    • With query optimisation and significant user learning, the ActionAPI can provide the same data as the REST API.

Reviewing the API as an ecosystem for NLP use cases:

  • The Action API provides JSON-formatted data but not human-readable information without significant data use and many requests.
  • The WDQS (SPARQL) API can provide human-readable information with modifications to the SPARQL query, but acquiring full item data from the WDQS is challenging.
    • Moreover, it is incredibly difficult to use the SPARQL API itself (i.e., the SPARQL learning curve).
  • The REST API seems to serve a similar use case to the Action API, except that it operates within the REST framework and provides full item data or label/language/desc/etc by default.

My biased claim: Within the data access use case, if the user is savvy and does not require REST, then the Action API can fulfil the functionality of the REST API.

Providing NLP Required Human-Readable Claims via the REST API
I believe, with transparently self-serving interest, that providing the complete human-readable information required for NLP applications would add significant value to the REST API functionality. It would distinguish the REST API more clearly from the Action API functionality. Moreover, an NLP-functional REST API could become the primary data access method for complete knowledge per item (i.e., human-readable claim statements), equivalent to what the WDQS (SPARQL) API can provide.

Conclusion
In this ecosystem perspective, the WDQS (SPARQL) can provide explicit semantic knowledge, the ActionAPI can provide search capabilities across items, and the NLP-enabled REST API can provide complete information for NLP and human-readable data per item.

As such, the combination of the ActionAPI search and an NLP-enabled REST API would significantly reduce the need for NLP re-users to overwhelm the WDQS.

The GetItemStatements functionality (see above) would:

  • Enable the REST API to serve use cases that the Action API cannot
  • Avoid requiring hundreds of GET requests or batch operations per NLP use case
    • Provide NLP-required, human-readable data in one GET request.
  • Reduce the computational time and network activity on the WD servers
  • Avoid NLP use cases overloading the SPARQL endpoint
  • More clearly distinguish the REST API from the capabilities of the Action API.

Thank you again for providing this space to consider my suggestion.

Change #1028517 abandoned by Silvan Heintze:

[mediawiki/extensions/Wikibase@master] REST: Introduce a label service

Reason:

this is better solved with a GraphQL endpoint

https://gerrit.wikimedia.org/r/1028517

Change #1028518 abandoned by Silvan Heintze:

[mediawiki/extensions/Wikibase@master] REST: Use label service in GetItemStatements

Reason:

this is better solved with a GraphQL endpoint

https://gerrit.wikimedia.org/r/1028518