Allow federated queries with the Lingua Libre SPARQL endpoint
Closed, Resolved · Public · 1 Estimated Story Points · Security

Description

I would like to be able to query Lingua Libre's SPARQL endpoint from the Commons query service. Lingua Libre's recordings are uploaded to Commons and I would like to be able to compare the data Lingua Libre has about the files with the structured data on Commons.

The URL for the endpoint is https://lingualibre.org/sparql (it redirects internally to https://lingualibre.org/bigdata/namespace/wdq/sparql).
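As a rough illustration of querying this endpoint directly, the sketch below builds a SPARQL-protocol GET URL with Python's standard library. The helper name `build_query_url` and the sample query are assumptions made for illustration; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Endpoint URL as given in the task description.
ENDPOINT = "https://lingualibre.org/sparql"


def build_query_url(sparql: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL carrying the query in the standard `query` parameter.

    Hypothetical helper: it only assembles the URL and does not contact the server.
    """
    return endpoint + "?" + urlencode({"query": sparql})


# Generic sample query; it makes no assumption about Lingua Libre's data model.
url = build_query_url("SELECT * WHERE { ?s ?p ?o } LIMIT 1")
print(url)
```

The resulting URL can then be fetched with any HTTP client, typically with an `Accept` header requesting a SPARQL results format.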

Event Timeline

Lingua Libre is not currently a protected service. We have contacted the service owner about this issue, and are waiting to hear back before moving forward with this task.

@dcausse actually did the contacting and would know more

@WikiLucas00 I contacted @VIGNERON, with whom I had been in contact in the past about lingualibre, and I just heard back from him.

dcausse set Security to Software security bug. Edited Jun 11 2021, 7:08 AM
dcausse added projects: Security, Security-Team.
dcausse changed the visibility from "Public (No Login Required)" to "Custom Policy".
dcausse changed the subtype of this task from "Task" to "Security Issue".

Securing this ticket while this gets sorted out, as the issue could cause serious harm to the lingualibre SPARQL endpoint.

Fix for the security issue: I added proxy_set_header X-BIGDATA-READ-ONLY "yes"; to the nginx configuration, double-checked that Blazegraph listens on 127.0.0.1 (it did), documented this for future reinstallations, and deleted David's test (I don't think it is worth rebuilding the Blazegraph database to clean out potentially undesired data, but I can if decided otherwise).

@Seb35 thanks! I'll proceed with this task.
I think this task can be made public again (but I don't think I have the rights to do so).

Indeed, this task can become public. @Aklapper: could you remove the protection of this task?

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)". Jun 11 2021, 2:57 PM
sbassett subscribed.

Done.

A more stable endpoint URL could be added, for instance https://lingualibre.org/sparql, to mimic the Wikidata configuration. Should I add this URL in addition to the current one?

@Seb35 this would make a lot of sense, please update the task description if you do so.

@VIGNERON @WikiLucas00 @Nikki: I am adding https://lingualibre.org/sparql as an additional URL for the SPARQL endpoint to make it easier to use (similar to Wikidata).

Added. The timeout is 60 seconds; if needed it can be increased independently of the timeout of the Blazegraph interface.

Change 699746 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/deploy@master] Add lingualibre sparql endpoint to allowed federated endpoint

https://gerrit.wikimedia.org/r/699746

@Seb35 I don't think https://lingualibre.org/sparql is reachable, at least for me.

Seems to work for me: P16666

Change 699746 merged by DCausse:

[wikidata/query/deploy@master] Add lingualibre to the allowed list federated sparql endpoints

https://gerrit.wikimedia.org/r/699746

Deployed and available on https://wcqs-beta.wmflabs.org/ via SERVICE <https://lingualibre.org/sparql>. It will be available on wdqs after the next deploy (probably next Monday).
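For readers unfamiliar with federation, the sketch below shows roughly how a query would wrap a graph pattern in the SERVICE clause mentioned above. The helper name `federated_query` and the generic `?s ?p ?o` pattern are assumptions for illustration; they do not reflect Lingua Libre's actual data model, and the script only builds the query text.

```python
# Federated endpoint URL as quoted in the comment above.
LINGUALIBRE_ENDPOINT = "https://lingualibre.org/sparql"


def federated_query(remote_pattern: str, limit: int = 10) -> str:
    """Wrap a graph pattern in a SERVICE clause (hypothetical helper).

    The pattern inside SERVICE is evaluated by the remote endpoint; the
    surrounding query runs on the query service where it is submitted.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    return (
        "SELECT * WHERE {\n"
        f"  SERVICE <{LINGUALIBRE_ENDPOINT}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Placeholder pattern; replace it with triples that match the remote data.
print(federated_query("?s ?p ?o ."))
```

The printed text can be pasted into the query service UI. Patterns placed outside the SERVICE block would be matched against the local data, which is how the two datasets can be compared.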

Hello @dcausse, is it available on wikidata query service now?
@Lepticed7 and @VIGNERON crafted a request to have speakers of Lingua Libre displayed on a map. The request works on WCQS but only with a limit of 100 (out of 322 locations). I wondered if that came from the fact that WCQS is in Beta for the moment, and if the result would be different if the query was made from WDQS.
All the best

Hello, the request exposed by WikiLucas00 has been optimised and has no problems anymore.

@WikiLucas00 my apologies, I completely missed your ping. Yes, lingualibre can be queried directly from query.wikidata.org. Regarding your second question, sadly it is unlikely that the performance of a single query will be better once wcqs is running in the production environment.