
Wikidata autocomplete (wbsearchentities) results with score <= 0
Open, NormalPublic

Description

When running Wikidata autocomplete (wbsearchentities) queries, a result can end up with a score of 0 because of the way the query is constructed. This is not currently a critical problem, but negative rescore boosts can push such a score below zero, and future versions of Elasticsearch disallow negative scores.

https://www.wikidata.org//w/api.php?action=wbsearchentities&format=json&search=abstract+art&language=en&cirrusDumpQuery

{
    "bool": {
      "should": [
        {
          "bool": {
            "filter": [ { "match": { "labels_all.prefix": "albert" } } ],
            "should": [
              {
                "dis_max": {
                  "tie_breaker": 0,
                  "queries": [
                    { "constant_score": { "filter": { "match": { "labels.en.near_match": "albert" } }, "boost": 2 } },
                    { "constant_score": { "filter": { "match": { "labels.en.near_match_folded": "albert" } }, "boost": 1.6 } },
                    { "constant_score": { "filter": { "match": { "labels.en.prefix": "albert" } }, "boost": 1.1 } },
                    { "constant_score": { "filter": { "match": { "labels_all.near_match_folded": "albert" } }, "boost": 0.001 } }
                  ]
                }
              }
            ]
          }
        },
        { "term": { "title.keyword": "albert" } }
      ],
      "minimum_should_match": 1,
      "filter": [ { "term": { "content_model": "wikibase-item" } } ]
    }
}

https://www.wikidata.org//w/api.php?action=wbsearchentities&format=json&search=abstract+art&language=en&cirrusDumpResult
The 6th result has a score of 0:

{
    _index: "wikidatawiki_content_1537536135",
    _type: "page",
    _id: "55400981",
    _score: 0,
    _source: {
        namespace: 0,
        title: "Q55370741",
        descriptions: {
            en: "exhibition"
        }
    },
    highlight: {
        labels.nl.prefix: [
            "0:0-12:40|Abstract art, Befreiung, Stil und Ironie"
        ]
    }
}

In this particular case there are only 7 results, so a different score wouldn't change anything, but this likely occurs elsewhere. Items can match the bool filter but nothing else, resulting in a score of 0. Negative rescore boosts take the 0 and turn it negative. Converting that filter into a must with a tiny boost should ensure we always have some sort of score to apply basic ordering:

{
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
                { "constant_score": { "filter": { "match": { "labels_all.prefix": "albert" }}, "boost": 0.001} }
            ],
            "should": [
              {
                "dis_max": {
                  "tie_breaker": 0,
                  "queries": [
                    { "constant_score": { "filter": { "match": { "labels.en.near_match": "albert" } }, "boost": 2 } },
                    { "constant_score": { "filter": { "match": { "labels.en.near_match_folded": "albert" } }, "boost": 1.6 } },
                    { "constant_score": { "filter": { "match": { "labels.en.prefix": "albert" } }, "boost": 1.1 } },
                    { "constant_score": { "filter": { "match": { "labels_all.near_match_folded": "albert" } }, "boost": 0.001 } }
                  ]
                }
              }
            ]
          }
        },
        { "term": { "title.keyword": "albert" } }
      ],
      "minimum_should_match": 1,
      "filter": [ { "term": { "content_model": "wikibase-item" } } ]
    }
}
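The difference between the two query shapes can be sketched with a toy scoring model. This is only an approximation of how constant_score, dis_max and bool combine scores, not the actual Lucene implementation; the helper names are made up for illustration, and the boosts are copied from the query above.

```python
# Toy scoring model for the label clause of the queries above.
# It mimics constant_score, dis_max and bool semantics; it is not Lucene.

# Boosts copied from the dis_max clause in the query dump.
DIS_MAX_BOOSTS = {
    "labels.en.near_match": 2.0,
    "labels.en.near_match_folded": 1.6,
    "labels.en.prefix": 1.1,
    "labels_all.near_match_folded": 0.001,
}
PREFIX_FIELD = "labels_all.prefix"
PREFIX_MUST_BOOST = 0.001  # boost proposed for the "must" variant


def dis_max(scores, tie_breaker=0.0):
    """Best matching score plus tie_breaker times the other matching scores."""
    matched = [s for s in scores if s > 0]
    if not matched:
        return 0.0
    best = max(matched)
    return best + tie_breaker * (sum(matched) - best)


def label_clause_score(matched_fields, prefix_as_must):
    """Score of the inner bool clause, or None if the document is excluded."""
    if PREFIX_FIELD not in matched_fields:
        return None  # fails the filter/must clause, so the clause does not match
    # A filter clause contributes nothing; a must clause contributes its boost.
    base = PREFIX_MUST_BOOST if prefix_as_must else 0.0
    should = dis_max(
        boost for field, boost in DIS_MAX_BOOSTS.items() if field in matched_fields
    )
    return base + should


# A document like Q55370741, matched only through a non-English prefix.
only_prefix = {"labels_all.prefix"}
print(label_clause_score(only_prefix, prefix_as_must=False))  # 0.0
print(label_clause_score(only_prefix, prefix_as_must=True))   # 0.001
```

With the filter variant, a document that matches nothing but the prefix has no scoring clause at all; with the must variant it always carries the small constant, so any later adjustment starts from a positive value.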

For the negative boosts, perhaps we can come up with a way to switch them from sums to products. A product with a value < 1 will de-boost things without going negative.
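A small sketch of the arithmetic behind that idea. The penalty of 0.1 and the base scores are hypothetical values chosen for illustration, not the weights used by the real rescore profiles.

```python
# Hypothetical numbers: compare an additive deboost with a multiplicative one.

def deboost_sum(score, penalty):
    """Additive deboost: subtracts a fixed penalty, can drop below zero."""
    return score - penalty


def deboost_product(score, factor):
    """Multiplicative deboost: scales by a factor in (0, 1), never negative
    as long as the incoming score is non-negative."""
    return score * factor


for base in (0.0, 0.001, 1.1):
    print(base, deboost_sum(base, 0.1), deboost_product(base, 0.1))
```

Note that a product leaves a score of 0 at 0, so it only produces a useful ordering once every match is guaranteed a small positive score, as in the must variant above.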

Event Timeline

Restricted Application added a subscriber: Aklapper. · Nov 19 2018, 4:28 PM
EBernhardson updated the task description. · Nov 19 2018, 4:28 PM
Addshore moved this task from incoming to monitoring on the Wikidata board. · Nov 19 2018, 5:10 PM

I suggest converting the negative boosts to positive boosts and flipping the filter condition to MUST_NOT. I think we can do this automatically within Cirrus.
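A minimal sketch of what such an automatic rewrite could look like, assuming each boost is represented as a dict with a "filter" and a "weight" (the shape of a function_score function entry). The function name and the example filter are hypothetical and not taken from the Cirrus code.

```python
# Sketch only: flip a negative-weight boost into a positive one on the
# inverted filter. The dict layout is an assumption for illustration.

def flip_negative_boost(fn):
    """Return an equivalent-ordering boost with a non-negative weight."""
    weight = fn["weight"]
    if weight >= 0:
        return fn  # already non-negative, nothing to do
    return {
        # Documents that do NOT match the original filter now get the boost.
        "filter": {"bool": {"must_not": [fn["filter"]]}},
        "weight": -weight,
    }


# Hypothetical deboost: documents matching this filter lose 0.5.
deboost = {"filter": {"term": {"some_field": "some_value"}}, "weight": -0.5}
print(flip_negative_boost(deboost))
```

When the boosts are summed, this shifts every score up by the same constant (0.5 here) compared with the negative version, so the relative ordering stays the same while no score goes below its starting value.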

Smalyshev triaged this task as Normal priority. · Jan 29 2019, 6:48 PM
Smalyshev added a comment. · Edited · Jun 4 2019, 6:53 AM

I suspect the fix for this will be T215615: Stop using negative scores for deboosting statements? Can we merge them?

Restricted Application added a subscriber: Liuxinyu970226. · Jun 4 2019, 6:53 AM