
Display labels for entity values at result visualisation step in Wikibase Query Engine
Open, Low, Public

Description

Replace wd:Q255 and similar entity values with their labels, in the user's language, in the query service result display for entity datatypes.

Rationale:
Labels for items or other entities can be computed through the « wikibase:label » service. This is useful, but it causes major performance issues: some queries time out when they use this service, yet finish in a handful of seconds when the service call is commented out.

An idea to solve this is to compute labels not at query time with wikibase:label but when the result set is displayed. This could be much more efficient in some cases: for example, when the result set is small but the service is called many times during the query, or when the result set is big but the user only views a tiny fraction of it. This applies especially when the ?xLabel variables generated by the service are used only in the « select … { » projection part of the query and never in the « { … } » part.
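A minimal sketch of the first step of display-time labeling, assuming the UI works from the standard SPARQL 1.1 JSON results format: collect the distinct entity IDs that appear as URI values, so they can later be resolved to labels outside the query. The Wikidata concept-URI prefix and the function name `entity_ids_in_results` are assumptions for illustration; other Wikibase installs use a different prefix.

```python
import re
from typing import Any, Dict, List

# Assumption: Wikidata-style concept URIs (http://www.wikidata.org/entity/Q255).
# Only plain item, property and lexeme IDs are matched; forms, senses and
# statement URIs are deliberately ignored in this sketch.
ENTITY_URI = re.compile(r"^http://www\.wikidata\.org/entity/([QPL]\d+)$")


def entity_ids_in_results(results: Dict[str, Any]) -> List[str]:
    """Return the distinct entity IDs found in URI cells of a SPARQL 1.1
    JSON result set, in the order they first appear."""
    seen: Dict[str, None] = {}  # dicts keep insertion order: stable dedupe
    for row in results["results"]["bindings"]:
        for cell in row.values():
            if cell.get("type") != "uri":
                continue  # literals and blank nodes never need a label lookup
            match = ENTITY_URI.match(cell["value"])
            if match:
                seen.setdefault(match.group(1), None)
    return list(seen)


sample = {
    "head": {"vars": ["item", "count"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q255"},
         "count": {"type": "literal", "value": "3"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q42"}},
        {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q255"}},
    ]},
}
print(entity_ids_in_results(sample))  # ['Q255', 'Q42']
```

Because the query itself no longer carries the label service, the cost of labeling scales with the number of distinct entities actually displayed, not with how often the service would have been invoked during query evaluation.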

The case where a human only looks at a tiny fraction of the result may be quite common: results in the table view are paged, and a full result set is often not meant to be read in full by a human but to be consumed by an external tool.
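To illustrate how paging limits the work, here is a small sketch that picks out only the entity IDs on the page currently shown. It assumes a simplified, hypothetical row representation in which entity cells have already been shortened to IDs such as "Q255"; the function name and the 0-based page index are also assumptions.

```python
import re
from typing import Dict, List

# Assumption: entity cells were already shortened to bare IDs like "Q255".
ENTITY_ID = re.compile(r"^[QPL]\d+$")


def ids_to_label_for_page(rows: List[Dict[str, str]], page: int,
                          page_size: int) -> List[str]:
    """Return the distinct entity IDs on one page (0-based) of an already
    fetched result table, so labels are requested only for visible rows."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    start = page * page_size
    visible = rows[start:start + page_size]
    ids: Dict[str, None] = {}
    for row in visible:
        for value in row.values():
            if ENTITY_ID.match(value):  # skip literals such as years
                ids.setdefault(value, None)
    return list(ids)


rows = [{"item": f"Q{i}", "year": "2018"} for i in range(1, 1001)]
print(ids_to_label_for_page(rows, 0, 3))  # ['Q1', 'Q2', 'Q3']
print(ids_to_label_for_page(rows, 2, 3))  # ['Q7', 'Q8', 'Q9']
```

Out of 1000 rows, only three IDs per page view need a label; an external tool that downloads the raw results never triggers a lookup at all.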

Event Timeline

TomT0m created this task.Feb 14 2018, 12:59 PM
Restricted Application added projects: Wikidata, Discovery. Feb 14 2018, 12:59 PM
Restricted Application added a subscriber: Aklapper.

I’m not convinced moving labels to the query service UI is the right solution here… most of the problems you describe can be solved instead by optimizing the query, using subqueries to only run the label service very late in the query execution. We even have a task for automating that optimization, T166139.
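For reference, the hand-written form of this optimization looks roughly like the query built below: the inner subquery does the real work (and the LIMIT), and the label service is applied only to the rows it returns. The helper `wrap_label_service_late` is hypothetical and only handles the easy case where the inner query never uses the ?xLabel variables; the generic automatic rewrite is what T166139 is about. The `[AUTO_LANGUAGE],en` default mirrors the usual query service examples.

```python
from typing import List


def wrap_label_service_late(inner_select: str, keep_vars: List[str],
                            label_vars: List[str],
                            languages: str = "[AUTO_LANGUAGE],en") -> str:
    """Hypothetical helper: wrap a complete inner SELECT as a subquery so the
    wikibase:label service only runs on the rows that subquery returns.
    Only valid when the inner query never refers to ?xLabel variables."""
    projection = " ".join([f"?{v}" for v in keep_vars] +
                          [f"?{v}Label" for v in label_vars])
    service = ('SERVICE wikibase:label { bd:serviceParam wikibase:language "'
               + languages + '". }')
    return (f"SELECT {projection} WHERE {{\n"
            f"  {{\n    {inner_select}\n  }}\n"
            f"  {service}\n"
            f"}}")


inner = "SELECT ?item WHERE { ?item wdt:P31 wd:Q5 . } LIMIT 10"
print(wrap_label_service_late(inner, ["item"], ["item"]))
```

The printed query labels at most ten items, however expensive the inner pattern is. It does not help when a label is needed inside the WHERE clause, for example in a FILTER.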

I think the simpler a query is to write, the better for the user. As a user, I’d prefer not to have to care about wikibase:label at all; having a human-readable label is such an important and basic thing… Most of the time it’s just annoying to have to add it to a query.

Automatically wrapping the query to solve a performance problem in a service I’d prefer not to have to care about seems like a weird complication to me (almost like a Rube Goldberg machine :). But I don’t care that much and you’re the boss :) I guess the service is needed in some cases anyway, and you’ll have to solve the performance issues in those cases as well. My personal opinion, however, is that it’s mostly useful when we need to use the label variables for some reason, for example to filter the results of the query, and in that case wrapping the query will not help, because the label has to be computed anyway.

T166139 (Query optimizer for labels should be using sub queries) seems to be pretty hard to do in the generic case (that is to say, I currently have no good idea how to do it; suggestions welcome). It is true that many of these issues can be solved by writing queries properly, but making the Blazegraph optimizer do it appears to be non-trivial. And relying on users to do it doesn't always work, as many users are not advanced enough in their knowledge of SPARQL and implementation details. OTOH, for small query results, fetching the labels client-side may be a viable solution. Labels should also be highly cacheable in the common use case (though the cache hit rate may not be very good). It seems to me like an idea worth at least trying out. Of course, we should lazy-load the labels and batch the loads.
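A minimal sketch of the lazy, batched and cached label loading described above. The fetcher is injected and hypothetical: a real one might call the wbgetentities API with props=labels, and the default batch size of 50 is assumed to match that API's per-request ID limit for ordinary clients. The `LabelCache` class and its fallback to the bare ID are illustrative choices, not an existing component.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical fetcher: (ids, language) -> {id: label}; IDs without a label
# in that language are simply absent from the returned dict.
Fetcher = Callable[[List[str], str], Dict[str, str]]


class LabelCache:
    """Lazily resolve entity labels in batches, caching them per language."""

    def __init__(self, fetch: Fetcher, batch_size: int = 50) -> None:
        self._fetch = fetch
        self._batch_size = batch_size
        self._cache: Dict[Tuple[str, str], str] = {}

    def labels(self, ids: Iterable[str], language: str) -> Dict[str, str]:
        wanted = list(dict.fromkeys(ids))  # dedupe while keeping order
        missing = [i for i in wanted if (language, i) not in self._cache]
        # Only uncached IDs are fetched, at most batch_size per request.
        for start in range(0, len(missing), self._batch_size):
            batch = missing[start:start + self._batch_size]
            fetched = self._fetch(batch, language)
            for entity_id in batch:
                # Fall back to the bare ID when no label exists, and cache
                # the fallback too so the miss is not fetched again.
                self._cache[(language, entity_id)] = fetched.get(entity_id,
                                                                 entity_id)
        return {i: self._cache[(language, i)] for i in wanted}


calls: List[List[str]] = []


def fake_fetch(ids: List[str], language: str) -> Dict[str, str]:
    calls.append(list(ids))
    return {i: f"label of {i}" for i in ids if i != "Q3"}


cache = LabelCache(fake_fetch, batch_size=2)
print(cache.labels(["Q1", "Q2", "Q3", "Q1"], "en"))
print(cache.labels(["Q2", "Q4"], "en"))
print(calls)  # [['Q1', 'Q2'], ['Q3'], ['Q4']]
```

The second call only fetches Q4, because Q2 is already cached; keying the cache on (language, ID) keeps labels in different user languages from clobbering each other.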

Smalyshev triaged this task as Low priority.May 2 2019, 6:59 AM