Figure out a way for WDQS example parsing not rely on parsoid
Closed, Resolved, Public

Assigned To

Authored By

	Yurik
	Aug 27 2017, 9:39 PM

Description

Summary:
Originally, the query service UI loaded the examples via Parsoid, which meant that examples could not be loaded from wikis without a VisualEditor/Parsoid setup. This severely limited the usefulness of the query service on third-party installations.

This was ultimately resolved in I46420935e5. The following approaches were explored:

The original approach, to ask Parsoid for the HTML of the page (which, unlike the legacy Parser’s output, includes annotations for each template invocation and argument), and extract the query arguments of all SPARQL and SPARQL2 transclusions. Doesn’t work for most third-party installs.
Parse wikitext, looking for {{SPARQL}} and {{SPARQL2}} transclusions and their preceding headings. Fragile.
Model query examples as structured data (e.g. in Wikidata statements). Didn’t go anywhere.
Use the parse tree of the wikitext. Looked promising, ultimately wasn’t implemented.
Use the parsed HTML from the legacy parser, extracting the contents of any <syntaxhighlight> block and finding the preceding headings similarly to the Parsoid version. Implemented and deployed.
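
To illustrate the last approach, here is a minimal Python sketch. It is not the code from the merged change (the GUI itself is written in JavaScript). It assumes that each query is rendered as a <pre> inside a <div> carrying the class mw-highlight, and that example titles are ordinary <h2> to <h6> headings. The class name ExampleExtractor is hypothetical.

```python
from html.parser import HTMLParser


class ExampleExtractor(HTMLParser):
    """Collect (heading, query) pairs from rendered wiki HTML.

    Assumption: queries sit in <pre> inside <div class="mw-highlight ...">,
    and titles are plain <h2>..<h6> elements preceding them.
    """

    HEADINGS = {"h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.examples = []          # list of (heading, query) tuples
        self._heading = ""          # most recent heading text seen
        self._in_heading = False
        self._highlight_depth = 0   # > 0 while inside a mw-highlight div
        self._in_pre = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag in self.HEADINGS:
            self._in_heading, self._buf = True, []
        elif tag == "div":
            # Track nesting so the matching </div> closes the highlight block.
            if self._highlight_depth or "mw-highlight" in classes:
                self._highlight_depth += 1
        elif tag == "pre" and self._highlight_depth:
            self._in_pre, self._buf = True, []

    def handle_endtag(self, tag):
        if tag in self.HEADINGS and self._in_heading:
            self._heading = "".join(self._buf).strip()
            self._in_heading = False
        elif tag == "pre" and self._in_pre:
            self.examples.append((self._heading, "".join(self._buf).strip()))
            self._in_pre = False
        elif tag == "div" and self._highlight_depth:
            self._highlight_depth -= 1

    def handle_data(self, data):
        # Text inside nested <span> tokens is collected as well.
        if self._in_heading or self._in_pre:
            self._buf.append(data)
```

Feeding the HTML of the examples page to an instance (feed(), then close()) leaves the pairs in its examples attribute. Because only rendered HTML is needed, this works with any wiki whose parser output has this shape, with or without Parsoid.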

Details

	Subject	Repo	Branch	Lines +/-
	Load example queries from parsed wikitext	wikidata/query/gui	master	+43 -35
	WIP: optionally load query examples from action API	wikidata/query/gui	master	+55 -20


Related Objects

Status	Assigned	Task
Open	None	T192907 Default query examples for Wikibase-specific WDQS
Open	None	T179262 wikibase docker container wdqs-frontend The example button leads to the examples from Wikidata
Resolved	Lucas_Werkmeister_WMDE	T174298 Figure out a way for WDQS example parsing not rely on parsoid

Event Timeline

Yurik created this task. Aug 27 2017, 9:39 PM

Restricted Application added projects: Wikidata, Discovery-ARCHIVED. Aug 27 2017, 9:39 PM

Restricted Application added a subscriber: Aklapper.

Smalyshev moved this task from Incoming to GUI on the Wikidata-Query-Service board. Aug 28 2017, 11:57 PM

Lucas_Werkmeister_WMDE added a parent task: T179262: wikibase docker container wdqs-frontend The example button leads to the examples from Wikidata. Oct 30 2017, 2:15 PM

Addshore awarded a token. Oct 30 2017, 2:16 PM

Addshore subscribed.

I have already implemented this as part of my own override:
https://github.com/nyurik/wikidata-query-gui/blob/master/wikibase/queryService/api/QuerySamples.js#L171

It looks like you’re using the visualeditor API action, so doesn’t this still require Parsoid to be set up?

And it requires VisualEditor, which wouldn't really help with T179262.

@Lucas_Werkmeister_WMDE I don't think it needs Parsoid - the OSM wiki doesn't have it as far as I can see, and this approach works there. @Addshore correct, this approach does require the VisualEditor extension. I wonder if it would be possible to use action=parse instead.

action=parse looks pretty good, though I suppose we still want to use the REST API if available for improved caching behavior.
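
As a rough illustration of what an action=parse request looks like, the following Python sketch builds the request URL. The function name build_parse_url, the endpoint URL and the page title in the usage note are placeholders; action, page, prop, format and formatversion are standard Action API parameters.

```python
from urllib.parse import urlencode


def build_parse_url(api_endpoint, page, prop="text"):
    """Return a GET URL asking the Action API to parse `page`.

    `api_endpoint` is the wiki's api.php URL (placeholder in this sketch).
    `prop` selects what is returned, e.g. "text" for rendered HTML.
    """
    params = {
        "action": "parse",
        "page": page,
        "prop": prop,
        "format": "json",
        "formatversion": "2",
    }
    return api_endpoint + "?" + urlencode(params)
```

For example, build_parse_url("https://wiki.example/w/api.php", "Examples") yields a URL that any default MediaWiki installation should be able to answer, since action=parse does not depend on VisualEditor or Parsoid.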

In T174298#3729013, @Lucas_Werkmeister_WMDE wrote:

action=parse looks pretty good, though I suppose we still want to use the REST API if available for improved caching behavior.

Sounds good!

Hm, but it looks like the data-mw attributes we use to extract the query from the template are Parsoid-specific again :(

Change 388035 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[wikidata/query/gui@master] WIP: optionally load query examples from action API

https://gerrit.wikimedia.org/r/388035

gerritbot added a project: Patch-For-Review. Nov 2 2017, 11:51 AM

Lucas_Werkmeister_WMDE mentioned this in rWDQG6e05dcf77f6f: WIP: optionally load query examples from action API. Nov 2 2017, 11:53 AM

I think we might have to fall back to parsing wikitext – look for {{SPARQL}} and {{SPARQL2}} templates and the preceding === headings.
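
A minimal Python sketch of what such a wikitext fallback could look like, mainly to make its limits concrete. The pattern and the function name examples_from_wikitext are hypothetical, and the sketch assumes the query is passed as a named query= parameter directly after the template name.

```python
import re

# One alternative matches a heading line (== .. == up to ====== .. ======),
# the other a {{SPARQL|query=...}} or {{SPARQL2|query=...}} transclusion.
TOKEN = re.compile(
    r"^(?P<eq>={2,6})[ \t]*(?P<heading>[^\n]+?)[ \t]*(?P=eq)[ \t]*$"
    r"|\{\{\s*SPARQL2?\s*\|\s*query\s*=(?P<query>.*?)\}\}",
    re.MULTILINE | re.DOTALL,
)


def examples_from_wikitext(wikitext):
    """Return (heading, query) pairs in document order."""
    examples, heading = [], ""
    for match in TOKEN.finditer(wikitext):
        if match.group("heading") is not None:
            heading = match.group("heading")
        else:
            examples.append((heading, match.group("query").strip()))
    return examples
```

The non-greedy match stops at the first }} it encounters, so a query whose own text contains two adjacent closing braces is cut short, and other template parameters, nowiki sections and nested templates are not handled at all.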

@Lucas_Werkmeister_WMDE manually doing regex-style parsing of Wiki markup in JavaScript is a guaranteed path to hell. Trust me on this one :) CCing @Anomie - is there an easy api way to get resolved template parameters on a wiki page via a GET request?
UPDATE: @Anomie, the goal is to parse this page to get the headers and the query= parameter for each SPARQL query.

manually doing regex-style parsing of Wiki markup in JavaScript is a guaranteed path to hell

True in general, but given that we are talking about one (OK, maybe two) templates with known content, on a page over which we can exercise a measure of control, maybe it's still possible?
Theoretically, we could go as far as to require special markup (like translations do) for this particular page if it's too hard to find templates, but I don't think it's really that hard, is it?

@Smalyshev I suspect it will be relatively easy to do with the standard API - and if so, why not reuse the existing functionality? POST is a very small price to pay for this (think how often this feature is used - not worth creating a special parser just to avoid a few CPU cycles)

manually doing regex-style parsing of Wiki markup in JavaScript is a guaranteed path to hell. Trust me on this one :)

Yes, I guessed as much :) but we would only be doing that for custom installs anyways (I’d definitely stick to Parsoid for wikidata.org).

Is this the point where my workaround becomes a feature?
Maybe this is a chance to get rid of the Wikitext parsing!

We have a property on Wikidata, but unfortunately we also have a very small size limit.
I think it would be really cool to use SPARQL for querying and federating examples.

Maybe we can share some ideas about different approaches.
@Yurik you can explain more about your use case and constraints.

@daniel pointed out that we can use action=parse&prop=parsetree. This returns an XML tree like this:

<!-- ... -->
<h level="3" i="3">=== <translate><comment><!--T:11--></comment> Cats</translate> ===</h>
\n
<template lineStart="1">
<title>SPARQL2</title>
<part>
<name>query</name>
<equals>=</equals>
<value>SELECT ?item ?itemLabel \nWHERE \n{\n  ?item wdt:P31 wd:Q146.\n  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }\n}\n</value>
</part>
</template>
\n\n

This already contains all the templates with parameters, which is mostly the same as what Parsoid gives us. We would still need to do a bit of parsing, but nothing worse than what we already do to extract the adjacent title of a query example.
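
A minimal Python sketch of that remaining parsing step, using only the standard library. It assumes the parse tree is available as a well-formed XML string shaped like the excerpt above; the function names are hypothetical, and cleaning the heading with a regular expression is a simplification.

```python
import re
import xml.etree.ElementTree as ET

TEMPLATES = {"SPARQL", "SPARQL2"}


def heading_text(h):
    """Plain title of an <h> node, without '=' markers or translate markup."""
    raw = "".join(h.itertext())
    raw = re.sub(r"<!--.*?-->|</?translate>", "", raw)
    return raw.strip().strip("=").strip()


def examples_from_parsetree(xml_text):
    """Return (heading, query) pairs in document order."""
    root = ET.fromstring(xml_text)
    examples, heading = [], ""
    for node in root.iter():  # iter() walks the tree in document order
        if node.tag == "h":
            heading = heading_text(node)
        elif node.tag == "template":
            title = (node.findtext("title") or "").strip()
            if title not in TEMPLATES:
                continue
            for part in node.findall("part"):
                if (part.findtext("name") or "").strip() == "query":
                    value = part.find("value")
                    examples.append((heading, "".join(value.itertext()).strip()))
    return examples
```

Note that a nested template inside the value, such as the {{!}} workaround, would contribute only its title text here and would need separate handling.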

In T174298#3734602, @Yurik wrote:

@Anomie - is there an easy api way to get resolved template parameters on a wiki page via a GET request?

No. That would involve some particularly deep diving into the parser internals.

You can use action=parse&prop=parsetree to get the wikitext annotated with XML-style tags setting off the templates and their unresolved parameters' wikitext. I suppose you might then pass that wikitext back into action=parse or action=expandtemplates to "resolve" it, if necessary.

You might also be able to use TemplateSandbox to replace the actual template with something that just prints the parameter out in a machine-readable format. I note the specific template you're looking at there seems to already do that, embedding the parameter into a link.

Your best bet for general parsing is to use something like mwparserfromhell; I don't know if there's a JavaScript version of something like that.

TemplateSandbox doesn’t seem to be part of a default MediaWiki installation, so it wouldn’t help in @Addshore’s case. I think the parse tree is our best bet for now – there shouldn’t be any nested templates (other than the {{!}} workaround for | in queries), so I hope another action=parse or something won’t be necessary.

Smalyshev mentioned this in rWDQG15638080e9c0: WIP: optionally load query examples from action API. Feb 22 2018, 4:43 AM

Smalyshev edited projects, added Wikidata Query UI; removed Wikidata-Query-Service. Apr 26 2018, 5:44 PM

Addshore triaged this task as Medium priority. Jun 26 2018, 3:57 PM

Lucas_Werkmeister_WMDE mentioned this in T223586: configure Factgrid Query Service UI to use local example queries. Jun 25 2019, 9:37 AM

Change 388035 abandoned by Lucas Werkmeister (WMDE):
WIP: optionally load query examples from action API

Reason:
– this now has merge conflicts in all of the files it touches, and can’t work without a lot more changes, as detailed in the linked tasks.

https://gerrit.wikimedia.org/r/388035

Change 548490 had a related patch set uploaded (by Lucas Werkmeister (WMDE); owner: Lucas Werkmeister (WMDE)):
[wikidata/query/gui@master] Load example queries from parsed wikitext

https://gerrit.wikimedia.org/r/548490

Change 548490 merged by jenkins-bot:
[wikidata/query/gui@master] Load example queries from parsed wikitext

https://gerrit.wikimedia.org/r/548490

Announcement made on the example queries talk page, on project chat, and with a different and more 3rd-parties-oriented version on wikidata-tech ML and Wikibase ML.

Lucas_Werkmeister_WMDE removed a project: Patch-For-Review. Dec 4 2019, 3:44 PM

Lucas_Werkmeister_WMDE updated the task description.

Restricted Application added a subscriber: Liuxinyu970226. Dec 4 2019, 3:44 PM

Lucas_Werkmeister_WMDE updated the task description. Dec 4 2019, 3:44 PM

Lucas_Werkmeister_WMDE updated the task description.

@Lucas_Werkmeister_WMDE In my opinion we can probably close this task now?
I for one am already using this code / feature in the wild.
Thoughts?

I think so, yeah.

Aklapper removed subscribers: Jonas, Anomie. Oct 16 2020, 5:42 PM

Aklapper removed a subscriber: Wikidata-Query-Service. May 24 2023, 1:22 PM
