
Fix inconsistent behaviour of curly brackets, comments and whitespace in WDQS
Closed, Duplicate · Public

Description

Sometime around the holiday period, a change to the WDQS seems to have altered the behaviour of some key elements of queries, notably curly brackets and comments. Since this affects many example and maintenance queries, I assume a bug has already been filed for it, but I could not find one, so here is a brief description of the problem:

One of my maintenance queries checks for publications that have the word "Zika" in the title but have not been tagged with P921 (main subject) Q202864 (Zika virus). Here it is:

SELECT DISTINCT ?item ?title
WHERE {
  hint:Query hint:optimizer "None".
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:api "Search";
                    wikibase:endpoint "www.wikidata.org";
                    mwapi:srsearch "zika haswbstatement:P31=Q13442814".
      ?page_title wikibase:apiOutput mwapi:title.
  }
  BIND(IRI(CONCAT(STR(wd:), ?page_title)) AS ?item)
  FILTER NOT EXISTS { ?item wdt:P921 wd:Q202864. }
  FILTER ( ?item != wd:Q22330904 ) # actually about Spondweni virus, as per Q28733424 
  FILTER ( ?item != wd:Q22330879 ) # Zika b. Marković, M.D. (19 September 1889--3 September 1970)
  # Also check for items falsely tagged with the family name Zika (Q56245778) 
  
  ?item wdt:P31 wd:Q13442814;
        wdt:P1476 ?title.
  FILTER CONTAINS(LCASE(?title), "zika").

}
# LIMIT 10000

It is expected to return 0 results when things are up to date. However, when I comment out the FILTER NOT EXISTS line, I now still get 0 results. Playing around a bit more, I noticed that things go back to normal (i.e. currently 4237 results) when I wrap the SERVICE block in curly brackets. This breaks again (i.e. returns 0 results) when I then comment out the ?title variable. In that case, things can be brought back to normal by adding an empty line before the curly brackets around the SERVICE block.
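For illustration, the first variant described above (FILTER NOT EXISTS commented out, SERVICE block wrapped in an extra pair of curly brackets) might look roughly like this. It is a sketch derived from the query above, not a verified reproduction: nothing new is introduced apart from the extra braces and the shortened comments, and the result count depends on the state of the data and of the query service.

```sparql
SELECT DISTINCT ?item ?title
WHERE {
  hint:Query hint:optimizer "None".
  { # extra group around the SERVICE block (the workaround)
    SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:api "Search";
                      wikibase:endpoint "www.wikidata.org";
                      mwapi:srsearch "zika haswbstatement:P31=Q13442814".
      ?page_title wikibase:apiOutput mwapi:title.
    }
  }
  BIND(IRI(CONCAT(STR(wd:), ?page_title)) AS ?item)
  # FILTER NOT EXISTS { ?item wdt:P921 wd:Q202864. }  # commented out for the test
  FILTER ( ?item != wd:Q22330904 ) # about Spondweni virus
  FILTER ( ?item != wd:Q22330879 ) # a person named Zika

  ?item wdt:P31 wd:Q13442814;
        wdt:P1476 ?title.
  FILTER CONTAINS(LCASE(?title), "zika").
}
```

Per the SPARQL standard, the extra braces only introduce a nested group and should not by themselves change which solutions match, which is why the difference in results looks like a bug.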

Of note, I had to add the ?title part to the query in response to a similar change in the WDQS some months back which resulted in the mwapi:srsearch command not just returning pages where the string "zika" was in the title but also pages where it occurred anywhere on the page, e.g. in the name of an author or in the title of a cited publication.

I would welcome some feature in the WDQS that would announce, highlight and explain such changes from somewhere near the UI.

Event Timeline

Restricted Application added a project: Wikidata. Jan 2 2019, 1:40 PM
Restricted Application added a subscriber: Aklapper.

There were no changes in query parsing recently (and whitespace etc. shouldn't matter for query results anyway; that would be against the standard). The difference in your results may be caused by T197598: MWAPI query with LIMIT ignores MINUS, where, depending on which branch the query optimizer takes, filtering the result of an MWAPI query may give a correct or a wrong result. I'll check in more detail to see whether this is indeed what happens with your query.

Of note, I had to add the ?title part to the query in response to a similar change in the WDQS some months back which resulted in the mwapi:srsearch command not just returning pages where the string "zika" was in the title but also pages where it occurred anywhere on the page

Wikidata search is supposed to return all matches, including matches in the text of the label, the description and some statements. The improvements in Wikidata search have been announced. A Wikidata item cannot have the string "zika" in its title, since titles are all Q-ids, but if you need to search only in label fields (ignoring descriptions and other statements), this syntax can be added. Please file a task for this. Regular search applies to the whole item, since this is what most users would expect when using the search API.

Lydia_Pintscher closed this task as Invalid. Jan 3 2019, 11:14 AM
Lydia_Pintscher added a subscriber: Lydia_Pintscher.

I'm closing this based on Stas' comment. Please reopen if there is anything left.

I think it's a duplicate of T197598. When that is fixed, I'll re-verify this one too.