Allow federated queries with the Lingua Libre SPARQL endpoint
Open, Needs Triage | Public | 1 Estimated Story Points | Security

Description

I would like to be able to query Lingua Libre's SPARQL endpoint from the Commons query service. Lingua Libre's recordings are uploaded to Commons and I would like to be able to compare the data Lingua Libre has about the files with the structured data on Commons.

The URL for the endpoint is https://lingualibre.org/sparql (it redirects internally to https://lingualibre.org/bigdata/namespace/wdq/sparql).

Event Timeline

Lingua Libre is not currently a protected service. We have contacted the service owner about this issue, and are waiting to hear back before moving forward with this task.

@dcausse actually did the contacting and would know more.

@WikiLucas00 I contacted @VIGNERON, with whom I had been in contact in the past about Lingua Libre, and I just heard back from him.

dcausse set Security to Software security bug. Edited Fri, Jun 11, 7:08 AM
dcausse added projects: Security, Security-Team.
dcausse changed the visibility from "Public (No Login Required)" to "Custom Policy".
dcausse changed the subtype of this task from "Task" to "Security Issue".

Securing this ticket while this gets sorted out, as this could cause serious harm to the Lingua Libre SPARQL endpoint.

Fix for the security issue: I added proxy_set_header X-BIGDATA-READ-ONLY "yes"; to the nginx configuration, double-checked that Blazegraph listens on 127.0.0.1 (it did), documented this for future reinstallations, and deleted David's test. (I don't think it is worth rebuilding the Blazegraph database to clean out potentially undesired data, but I can if decided otherwise.)
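
For anyone wanting to confirm a fix of this kind, a minimal sketch of a write probe follows. It assumes the endpoint accepts SPARQL 1.1 Protocol updates as a form-encoded POST and that a read-only endpoint answers with an HTTP error status; the exact status code is not stated in this task, so any error status is treated as a rejection. The probe triple and the function names are hypothetical. Note that if the endpoint is not read-only the probe really inserts a triple, so it should only be run by the service owner.

```python
import urllib.error
import urllib.parse
import urllib.request

ENDPOINT = "https://lingualibre.org/sparql"

# Hypothetical probe triple; the URNs are placeholders, not real Lingua Libre data.
PROBE_UPDATE = 'INSERT DATA { <urn:example:probe> <urn:example:p> "probe" }'


def build_update_probe(endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol update request."""
    body = urllib.parse.urlencode({"update": PROBE_UPDATE}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def writes_are_rejected(endpoint: str = ENDPOINT, timeout: float = 10.0) -> bool:
    """Send the probe; return True if the server answers with an HTTP error status.

    Assumption: a read-only endpoint refuses updates with some 4xx/5xx status.
    Network failures (urllib.error.URLError) are deliberately not caught.
    """
    try:
        with urllib.request.urlopen(build_update_probe(endpoint), timeout=timeout):
            return False  # a success status suggests the update was accepted
    except urllib.error.HTTPError:
        return True
```

Only build_update_probe is free of side effects; writes_are_rejected performs a real request.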

@Seb35 thanks! I'll proceed with this task.
I think this task can be made public again (but I don't think I have the rights to do so).

Indeed, this task can become public. @Aklapper: could you remove the protection of this task?

sbassett changed the visibility from "Custom Policy" to "Public (No Login Required)". Fri, Jun 11, 2:57 PM
sbassett added a subscriber: sbassett.

"Indeed, this task can become public. @Aklapper: could you remove the protection of this task?"

Done.

A more stable endpoint URL could be added, for instance https://lingualibre.org/sparql, to mimic the Wikidata configuration. Should I add such a URL in addition to the current one?

@Seb35 this would make a lot of sense, please update the task description if you do so.

@VIGNERON @WikiLucas00 @Nikki: I am adding https://lingualibre.org/sparql as a possible URL for the SPARQL endpoint to ease its use (similar to Wikidata).

Added. The timeout is 60 seconds; if needed it can be increased independently of the timeout of the Blazegraph interface.
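
As an illustration of querying the new URL with the 60-second limit in mind, here is a minimal client sketch. It assumes the endpoint accepts SPARQL 1.1 Protocol queries via GET and can return JSON results; the 5-second client margin and the function names are hypothetical choices, not part of the server configuration.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://lingualibre.org/sparql"
SERVER_TIMEOUT_SECONDS = 60  # value stated in the comment above
CLIENT_TIMEOUT_SECONDS = SERVER_TIMEOUT_SECONDS + 5  # hypothetical margin for the client


def build_query_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a GET request carrying the query in the 'query' URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(
    query: str,
    endpoint: str = ENDPOINT,
    timeout: float = CLIENT_TIMEOUT_SECONDS,
) -> dict:
    """Send the query and decode the JSON result document."""
    request = build_query_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Waiting slightly longer than the server limit means a slow query would normally end with the server's own timeout response rather than a client-side abort.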

Change 699746 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/deploy@master] Add lingualibre sparql endpoint to allowed federated endpoint

https://gerrit.wikimedia.org/r/699746

@Seb35 I don't think https://lingualibre.org/sparql is reachable, at least for me.

Seems to work for me: P16666

Change 699746 merged by DCausse:

[wikidata/query/deploy@master] Add lingualibre to the allowed list federated sparql endpoints

https://gerrit.wikimedia.org/r/699746

Deployed and available on https://wcqs-beta.wmflabs.org/ via SERVICE <https://lingualibre.org/sparql>; it will be available on WDQS after the next deploy (probably next Monday).
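
To show what the SERVICE clause looks like in practice, here is a small sketch that assembles a federated query as a string. The helper name and the generic triple pattern are hypothetical; a real query would use Lingua Libre's own properties and would be pasted into the query service UI.

```python
LINGUA_LIBRE_ENDPOINT = "https://lingualibre.org/sparql"


def build_federated_query(
    remote_pattern: str,
    endpoint: str = LINGUA_LIBRE_ENDPOINT,
    limit: int = 10,
) -> str:
    """Wrap a graph pattern in a SERVICE clause so it is evaluated remotely."""
    return (
        "SELECT * WHERE {\n"
        f"  SERVICE <{endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Generic placeholder pattern, not a real Lingua Libre data model.
print(build_federated_query("?record ?property ?value ."))
```

Patterns placed outside the SERVICE block would be matched against the local store, which is how the Lingua Libre data could be compared with the structured data on Commons.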