Unexpected behavior in federated queries with LinguaLibre in WDQS
Open, Medium | Public | 3 Estimated Story Points | BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

  1. Execute the query shown below in WDQS (link to the query in WDQS)
PREFIX linguap: <https://lingualibre.org/prop/direct/>
PREFIX linguae: <https://lingualibre.org/entity/>

SELECT * {
  SERVICE <https://lingualibre.org/sparql> {
    SELECT ?item {
      ?item linguap:P2 linguae:Q5.
    }
  }
}

What happens?:

It results in a server error.

What should have happened instead?:

The query should execute successfully, as it does in Sophox (link to the same query in Sophox).

Software version (if not a Wikimedia wiki), browser information, screenshots, other information, etc.:

WDQS at the time of this writing.

Additional information

I was looking into this so I wrote some more examples. I'll paste them here as it'll probably help people that look into this.

The following query (A) results in a server error in WDQS.

PREFIX prop: <https://lingualibre.org/prop/direct/>
PREFIX entity: <https://lingualibre.org/entity/>

SELECT * {
  SERVICE <https://lingualibre.org/sparql> {
    SELECT ?item ?itemLabel {
      ?item prop:P2 entity:Q5.
      SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
    }
  }
}

The following query (B) executes successfully in WDQS. Note that the only things that changed are the names of the prefixes.

PREFIX linguap: <https://lingualibre.org/prop/direct/>
PREFIX linguae: <https://lingualibre.org/entity/>

SELECT * {
  SERVICE <https://lingualibre.org/sparql> {
    SELECT ?item ?itemLabel {
      ?item linguap:P2 linguae:Q5.
      SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
    }
  }
}

I first thought that the root cause of the problem in (A) was the names of the prefixes, but this is not the case, as queries (C) and (D) below show.

The following is another query (C). It uses prop and entity as prefixes and results in a server error in WDQS.

PREFIX prop: <https://lingualibre.org/prop/direct/>
PREFIX entity: <https://lingualibre.org/entity/>

SELECT * {
  SERVICE <https://lingualibre.org/sparql> {
    SELECT ?item {
      ?item prop:P2 entity:Q5.
    }
  }
}

The following query (D) is the same as query (C) but with different prefixes. It uses linguap and linguae as prefixes and also results in a server error in WDQS.

PREFIX linguap: <https://lingualibre.org/prop/direct/>
PREFIX linguae: <https://lingualibre.org/entity/>

SELECT * {
  SERVICE <https://lingualibre.org/sparql> {
    SELECT ?item {
      ?item linguap:P2 linguae:Q5.
    }
  }
}

All four queries (A), (B), (C) and (D) execute successfully in Sophox, whereas in WDQS, as noted above, only (B) does.

Event Timeline

WDQS receives Status Code=502, Status Line=Bad Gateway, Response=<html> from the LinguaLibre servers. I don't fully understand why it fails, in particular why Sophox generates a query that is accepted there and why the query sometimes succeeds from WDQS when it is varied.

A few simpler examples that bug me:

SELECT * { 
  SERVICE <https://lingualibre.org/sparql> {
    select ?item {
      ?item <https://lingualibre.org/prop/direct/P2> <https://lingualibre.org/entity/Q5> .
    }
  }
}

is NOT OK (OK via Sophox)

SELECT * { 
  SERVICE <https://lingualibre.org/sparql> {
      ?item <https://lingualibre.org/prop/direct/P2> <https://lingualibre.org/entity/Q5> .
  }
}

is OK

Rdrg109 renamed this task from Unexpected behavior in federated queries in WDQS to Unexpected behavior in federated queries with LinguaLibre in WDQS. Jan 20 2022, 6:20 AM

Tried to debug this a bit and I believe the problem is on the lingualibre side. I suspect a weird bug happening because of the query length.
Query that passes: https://people.wikimedia.org/~dcausse/T299290-ok.sparql
Query that fails: https://people.wikimedia.org/~dcausse/T299290-bad.sparql
The only difference between the two is a single space character.
Command to run this manually is: curl -X POST -H "Accept: application/sparql-results+xml" --data-urlencode query@T299290-bad.sparql --data-urlencode "queryId=de249fca-8362-11ec-a8a3-0242ac120002" https://lingualibre.org/sparql

Happy to help if LinguaLibre maintainers can obtain some server logs.
Untagging WDQS as there's not much we can do on our side.
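
For anyone who wants to test the query-length suspicion more systematically, the manual curl check above could be scripted roughly as follows. This is only a sketch under assumptions: STAND_IN_QUERY is a hypothetical stand-in and not the content of the T299290-*.sparql files, the helper names are made up for illustration, the queryId parameter is left out, and the form encoding only approximates what curl --data-urlencode sends. It pads the query with trailing spaces and records the HTTP status for each length.

```python
import urllib.error
import urllib.parse
import urllib.request

# Endpoint and Accept header are taken from the curl command in this task.
ENDPOINT = "https://lingualibre.org/sparql"
ACCEPT = "application/sparql-results+xml"

# Assumption: a stand-in query, NOT the content of the T299290-*.sparql files.
STAND_IN_QUERY = (
    "SELECT ?item { ?item <https://lingualibre.org/prop/direct/P2> "
    "<https://lingualibre.org/entity/Q5> . }"
)


def padded_variants(query, max_extra_spaces):
    """Return the query followed by 0..max_extra_spaces trailing spaces."""
    return [query + " " * n for n in range(max_extra_spaces + 1)]


def encode_body(query):
    """Form-encode the query, percent-encoding spaces as %20."""
    encoded = urllib.parse.urlencode(
        {"query": query}, quote_via=urllib.parse.quote
    )
    return encoded.encode("utf-8")


def probe(query, timeout=30):
    """POST one query to ENDPOINT and return the HTTP status code."""
    request = urllib.request.Request(
        ENDPOINT,
        data=encode_body(query),
        headers={"Accept": ACCEPT},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # Error responses such as 502 Bad Gateway arrive as HTTPError.
        return error.code


def sweep(query, max_extra_spaces, send=probe):
    """Yield (query length, encoded body length, status) per padded variant."""
    for variant in padded_variants(query, max_extra_spaces):
        yield len(variant), len(encode_body(variant)), send(variant)
```

Iterating over sweep(STAND_IN_QUERY, 8) and printing each tuple would show whether the status changes at particular lengths. The send parameter exists so the loop can be exercised with a stub instead of hitting the LinguaLibre servers.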

Yes, I noticed a similar bug: a query succeeds or fails depending on the presence of a space character. Really weird.
Now that I know there is an issue, I will collect such SPARQL queries here. Thanks for this investigation.

Yug triaged this task as Medium priority. Jul 6 2022, 10:53 AM
Yug moved this task from Backlog to Bots and data management on the Lingua-Libre-Legacy board.