[timebox 8hrs] QueryService UI: Graph Builder result rendering option queries Wikidata instead of WBStack
Closed, Resolved | Public | Bug Report

Description

(Imported from GitHub issues)


GreenReaper:
When I ran this query in Firefox Nightly and selected Graph Builder from the 👁 display dropdown at the lower left, I got a "polestar" embedded UI which showed only #Count as a selectable field.

On reviewing this in Developer Tools, it is performing the desired query against query.wikidata.org, not the local endpoint.

This may be because within https://furry.wiki.opencura.com/query/js/wdqs.min.551842a5a1d280d61114.js is this fragment:

wikibase.queryService.ui.resultBrowser.PolestarResultBrowser = function (i, a, e) {
  'use strict';
  function t() {
  }
  return (t.prototype = new wikibase.queryService.ui.resultBrowser.AbstractResultBrowser).draw = function (e) {
    var t = {
      url: 'wikidatasparql://query.wikidata.org/?query=' + location.hash.substr(1),
      name: 'Imported from Wikidata Query Service',
      _directEmbed: !0
    },

This looks like it's the GRAPH_QUERY_PREFIX mentioned in wikibase/queryService/ui/resultBrowser/PolestarResultBrowser.js#L10. See also how 'wikidatasparql' is processed.
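As a rough illustration of what a non-hardcoded prefix could look like, here is a minimal sketch. The helper name `buildGraphQueryUrl` and the `DEFAULT_SPARQL_HOST` constant are hypothetical and do not exist in the query service UI; only the `wikidatasparql://` scheme and the `?query=` parameter come from the fragment quoted above. The query is assumed to be already URL-encoded, as `location.hash.substr(1)` is.

```javascript
// Hypothetical sketch: these names are NOT part of the query service UI.
// The default mirrors the host hardcoded in the minified fragment.
var DEFAULT_SPARQL_HOST = 'query.wikidata.org';

/**
 * Build the data source URL handed to the embedded Polestar UI.
 * @param {string} encodedQuery URL-encoded SPARQL, e.g. location.hash.substr(1)
 * @param {string} [sparqlHost] host to query; '' gives the relative form
 */
function buildGraphQueryUrl(encodedQuery, sparqlHost) {
  var host = sparqlHost === undefined ? DEFAULT_SPARQL_HOST : sparqlHost;
  return 'wikidatasparql://' + host + '/?query=' + encodedQuery;
}
```

Passing an empty host produces the relative form `wikidatasparql:///?query=...`. Either form still has to pass the host check in the bundled vendor.js.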

I tried changing the embed to use the local domain and I got Error: URL hostname is not whitelisted which was from the bundled https://furry.wiki.opencura.com/query/polestar/scripts/vendor.js

This *appears* to be part of TopoJSON and I think it requires that $wgGraphAllowedDomains be set to include an array item containing the domain with the wikidatasparql key, as well as having that in the http (and/or https?) array. This is set initially here from an empty default - see a custom setting for labs. There is some discussion in T145944 which may be helpful (or not...).

I tried debugging this a bit further by setting a breakpoint in sanitizeUrl() -> sanitizeHost() and found it using this as domains[]:

geoshape: Array [ "maps.wikimedia.org" ]
http: Array [ "wmflabs.org" ]
https: Array(12) [ "mediawiki.org", "wikibooks.org", "wikidata.org", "wikimedia.org", "wikimediafoundation.org", "wikinews.org", "wikipedia.org", "wikiquote.org", "wikisource.org", "wikiversity.org", "wikivoyage.org", "wiktionary.org" ]
wikidatasparql: Array [ "query.wikidata.org", "wdqs-test.wmflabs.org" ]
wikirawupload: Array [ "upload.wikimedia.org", "upload.beta.wmflabs.org" ]
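To show why a custom host is refused with this list, here is a minimal sketch of a per-protocol allow-list check. It is not the code from vendor.js: the function name `isHostAllowed` is hypothetical, and the matching rule (exact host, or subdomain of an allowed entry) is an assumption. Only the `domains` values are taken from the debugger output above.

```javascript
// Allow list as observed in the debugger.
var domains = {
  geoshape: ['maps.wikimedia.org'],
  http: ['wmflabs.org'],
  https: [
    'mediawiki.org', 'wikibooks.org', 'wikidata.org', 'wikimedia.org',
    'wikimediafoundation.org', 'wikinews.org', 'wikipedia.org',
    'wikiquote.org', 'wikisource.org', 'wikiversity.org',
    'wikivoyage.org', 'wiktionary.org'
  ],
  wikidatasparql: ['query.wikidata.org', 'wdqs-test.wmflabs.org'],
  wikirawupload: ['upload.wikimedia.org', 'upload.beta.wmflabs.org']
};

// Hypothetical check. ASSUMPTION: a host passes if it equals an allowed
// entry or is a subdomain of one; the bundled code may match differently.
function isHostAllowed(allowed, protocol, host) {
  if (!Object.prototype.hasOwnProperty.call(allowed, protocol)) {
    return false; // unknown protocol key: nothing is allowed
  }
  return allowed[protocol].some(function (entry) {
    return host === entry || host.endsWith('.' + entry);
  });
}
```

Under that assumption, `isHostAllowed(domains, 'wikidatasparql', 'furry.wiki.opencura.com')` is false, because no entry under either `wikidatasparql` or `https` covers the opencura.com host.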

I added the following at that point:

this['domains']['https'].push('furry.wiki.opencura.com');
this['domains']['wikidatasparql'].push('furry.wiki.opencura.com');

but I got instead

Error: wikidatasparql:: URL must either be relative (wikidatasparql:///...), or use one of the allowed hosts:

When I tried another way it transformed it to

https://furry.wiki.opencura.com/bigdata/namespace/wdq/sparql?query=

when I think it needed to be

https://furry.wiki.opencura.com/query/sparql?query=

because of a line somewhere that was setting this:

.pathname = '/bigdata/namespace/wdq/sparql',
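The difference between the two URLs comes down to the path used when the custom scheme is rewritten to HTTPS. A minimal sketch of such a rewrite with the path as a parameter follows; `rewriteSparqlUrl` and its arguments are hypothetical and are not taken from vendor.js, which appears to hardcode the path.

```javascript
// Hypothetical sketch: NOT the vendor.js implementation.
// Rewrites wikidatasparql://host/?query=... to an HTTPS SPARQL endpoint URL.
function rewriteSparqlUrl(url, defaultHost, sparqlPath) {
  // Captures: [1] host (empty for the relative form), [2] query string.
  var match = /^wikidatasparql:\/\/([^\/?#]*)\/?(\?[^#]*)?$/.exec(url);
  if (!match) {
    throw new Error('Not a wikidatasparql URL: ' + url);
  }
  var host = match[1] || defaultHost; // relative form falls back to default
  return 'https://' + host + sparqlPath + (match[2] || '');
}

var source = 'wikidatasparql://furry.wiki.opencura.com/?query=';
// Path as currently set in the bundle:
var current = rewriteSparqlUrl(source, 'query.wikidata.org', '/bigdata/namespace/wdq/sparql');
// Path the WBStack query service seems to expect:
var wanted = rewriteSparqlUrl(source, 'query.wikidata.org', '/query/sparql');
```

Here `current` reproduces the first URL quoted above and `wanted` the second, so making the path configurable would be one way to cover both deployments.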

I stopped at that bit, having gone too far down the rabbit hole for a feature that isn't exactly core - also, my netbook can't handle the stress of stepping through a giant vendor.js - but I thought I should mention it, as it seems to not be working as intended.

GreenReaper:
On-wiki Graph extension SPARQL data use is also impacted, as shown in the console for this query using relative addressing:

[Vega Err] PARSE DATA FAILED: values Error: URL hostname is not whitelisted: wikidatasparql:///?query=PREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX+lwb%3A+%3Chttp%3A%2F%2Fdata.lexbib.org%2Fentity%2F%3E%0APREFIX+ldp%3A+%3Chttp%3A%2F%2Fdata.lexbib.org%2Fprop%2Fdirect%2F%3E%0APREFIX+lp%3A+%3Chttp%3A%2F%2Fdata.lexbib.org%2Fprop%2F%3E%0APREFIX+lps%3A+%3Chttp%3A%2F%2Fdata.lexbib.org%2Fprop%2Fstatement%2F%3E%0APREFIX+lpq%3A+%3Chttp%3A%2F%2Fdata.lexbib.org%2Fprop%2Fqualifier%2F%3E%0A%0Aselect+%3Fs+%3FsLabel+%0A%3Frgb%0A%23%3FedgeLabel+%0A%3Fo+%3FoLabel+where+%7B%0A++%3Fs+ldp%3AP5+lwb%3AQ7.%0A++%3Fs+ldp%3AP72*+lwb%3AQ21886+.+%23+root+node+of+the+representation%2C+Q21886+for+%22Lexicography%22%0A++%3Fs+%3Fp+%3Fo+.%0A++%3Fedge+wikibase%3AdirectClaim+%3Fp+.%0A++%3Fs+rdfs%3Alabel+%3FsLabel+.%0A++%3Fedge+rdfs%3Alabel+%3FedgeLabel+.%0A++FILTER+%28strstarts%28str%28%3FedgeLabel%29%2C%22skos%3Abroader%22%29+%7C%7C+strstarts%28str%28%3FedgeLabel%29%2C%22skos%3Arelated%22%29+%7C%7C+strstarts%28str%28%3FedgeLabel%29%2C%22skos%3AcloseMatch%22%29%29%0A++%3Fo+rdfs%3Alabel+%3FoLabel+.%0A++FILTER+%28lang%28%3FoLabel%29%3D%22en%22%29%0A++%7B+select+%3Fs+%28count+%28%3Fbroader%29+as+%3Fdistance%29+where+%7B%0A++%3Fs+ldp%3AP5+lwb%3AQ7.%0A++++OPTIONAL+%7B%0A++%3Fs+ldp%3AP72%2B+%3Fbroader+.+%7D%7D+GROUP+BY+%3Fs+%3Fdistance%0A++%7D%0A++%0A++BIND+%28%0A++COALESCE%28%0A++++IF%28%3Fs+%3D+lwb%3AQ21886+%2C+%220000CC%22%2C+1%2F0%29%2C%0A++++IF%28str%28%3Fdistance%29%3D%221%22+%2C+%22FF9999%22%2C+1%2F0%29%2C%0A++++IF%28str%28%3Fdistance%29%3D%222%22+%2C+%22FFB266%22%2C+1%2F0%29%2C%0A++++IF%28str%28%3Fdistance%29%3D%223%22+%2C+%22FFFF99%22%2C+1%2F0%29%2C%0A++++IF%28str%28%3Fdistance%29%3D%224%22+%2C+%22CCFF99%22%2C+1%2F0%29%2C%0A++++IF%28str%28%3Fdistance%29%3D%225%22+%2C+%22CCFFE5%22%2C+1%2F0%29%2C%0A++++%22FFFFFF%22%0A++%29+AS+%3Frgb%0A%29%0A+++%0A++%7D+GROUP+BY+%0A++++%3Fs+%3FsLabel+%3Fdistance+%3Frgb+%0A++++%23%3FedgeLabel+%0A++++%3Fo+%3FoLabel

The user in question was unable to use the native embed.html on parent domain LexBib.org because the query server sends the X-Frame-Options: SAMEORIGIN header, and my idea was for them to put a graph on the MediaWiki side (which they could embed).

In the end they used a static image of the graph and linked to the query.

Event Timeline

Deniz_WMDE renamed this task from Graph Builder result rendering option queries Wikidata instead of WBStack to QueryService UI: Graph Builder result rendering option queries Wikidata instead of WBStack. Feb 9 2024, 10:42 AM

Hey @GreenReaper, is this a problem that still persists? If yes, could you provide an example query that demonstrates it? The one linked in the ticket doesn't seem to give any results. Thanks!

Tarrow subscribed.

Hi, we're closing this due to inactivity. Please feel free to reopen if this is still an issue.

Sorry, I didn't see this - here is an updated query. The base URL had changed, and Wikibase.cloud also uses HTTPS rather than HTTP for its concept URIs.

Tarrow changed the subtype of this task from "Task" to "Bug Report". Jun 26 2024, 11:42 AM
Tarrow renamed this task from QueryService UI: Graph Builder result rendering option queries Wikidata instead of WBStack to [timebox 8hrs] QueryService UI: Graph Builder result rendering option queries Wikidata instead of WBStack. Oct 24 2024, 12:19 PM

So I looked into this issue for over 8 hours and could not find a quick fix.
I created a wiki with the same properties and items as the user's, just so I could run the same query on it.
I could clearly see that clicking on the query builder does link to wikidata.org, but changing the address there links nowhere and gives this error: Error: URL hostname is not whitelisted.
I could not try the $wgGraphAllowedDomains setting, as this requires the Graph extension and we recently had it removed.

I don't quite understand this bit (quoted below) - isn't it just a variable that you can set regardless of whether the Graph extension is installed (i.e. they just reused the variable for something else)?
"I could not try the $wgGraphAllowedDomains setting as this requires the Graph extension and we recently had it removed"

One past setting of it was:

$wgGraphAllowedDomains = array(
    'https' => array(
        'mediawiki.org',
        'wikibooks.org',
        'wikidata.org',
        'wikimedia.org',
        'wikimediafoundation.org',
        'wikinews.org',
        'wikipedia.org',
        'wikiquote.org',
        'wikisource.org',
        'wikiversity.org',
        'wikivoyage.org',
        'wiktionary.org',
    ),
    'wikirawupload' => array(
        'upload.wikimedia.org',
    ),
    'wikidatasparql' => array(
        'query.wikidata.org',
    )
);

This matches the array found in an earlier comment.

From the small amount of looking I did with Rosalie, I believe that worrying about the settings on MediaWiki might be rather a red herring (i.e. a misleading direction to be investigating in). I think it's likely that this array is set to these values because both polestar and Extension:Graph use Vega under the hood. I think these settings were probably deliberately mirrored in the past, but the setting here doesn't seem to be derived from MediaWiki; instead it was selected to be the same by the original developer.

It appears to me that this allow list is hardcoded in the polestar JavaScript. You can see it in https://github.com/wbstack/queryservice-ui/blob/main/polestar/scripts/vendor.js#L7 in this minified JS. I believe changing this would only be sensible if we could upstream the change and make it configurable, rather than just changing more hardcoded values ourselves.
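For illustration only, making the list configurable could be as simple as merging deployment-specific hosts into the built-in defaults. This is a sketch under assumptions: `mergeAllowedDomains` and the shape of the extra configuration are hypothetical, and nothing like it exists in the bundled polestar code today.

```javascript
// Hypothetical sketch: merge extra hosts from deployment config into the
// built-in allow list without mutating either input.
function mergeAllowedDomains(defaults, extra) {
  var merged = {};
  [defaults, extra || {}].forEach(function (source) {
    Object.keys(source).forEach(function (protocol) {
      if (!Object.prototype.hasOwnProperty.call(merged, protocol)) {
        merged[protocol] = [];
      }
      source[protocol].forEach(function (host) {
        // Skip duplicates so repeated config entries are harmless.
        if (merged[protocol].indexOf(host) === -1) {
          merged[protocol].push(host);
        }
      });
    });
  });
  return merged;
}

// Example: a Wikibase Cloud instance adding its own query host.
var builtIn = { wikidatasparql: ['query.wikidata.org'] };
var allowed = mergeAllowedDomains(builtIn, {
  wikidatasparql: ['furry.wiki.opencura.com']
});
```

After the merge, `allowed.wikidatasparql` holds both hosts while `builtIn` is left unchanged, so Wikidata's defaults would keep working unmodified.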

From what I can see, this polestar feature hasn't been changed much (at all?) since the commit where it was first built (https://gerrit.wikimedia.org/r/c/wikidata/query/gui/+/304413), so I can't see the expected process for modifying it, and it's not at all clear to me where the un-minified source code is.

It's not clear to me how in demand this polestar feature is but I think fixing this would require some proper investment in refactoring / improving the queryservice UI. We should therefore try to prioritise this appropriately alongside other work. I'd suggest for now we put this on the back burner and let @Anton.Kokh determine if it's important enough to devote some serious time to doing properly.

Understandable. Some of the minified code in question seems to come directly from Vega, based on the presence of copyright strings and declarations in a similar order.

It may be, however, that it is drawing on vega-lite, given the introduction to Polestar (which itself states that it is out of date - as the linked commit suggests, work on the code seems to have been around August 2016).

Might be worth asking at the federated queries workshop whether anyone actively uses the feature when querying [via] Wikidata. Obviously we can't use it right now on Cloud so that doesn't help identify demand. But I can't say it's something people clamour for, just a feature that doesn't work here.

Yeah, the underlying UI tool seems to be deprecated and unfortunately its successor is also deprecated (T291928). I wonder if anyone is using it at all on Wikidata; maybe we can find out some numbers somehow.

Confirmed in T380018 that feature is still in use by Wikidata.

I suggest we disable this functionality on Wikibase Cloud for now, and use the other ticket to find a solution that will also work for Wikidata post-query-graph-split (and any Wikibase instance, for that matter)

Anton.Kokh claimed this task.