
Problems with MLR and small rescore windows
Closed, Resolved · Public

Description

When running an sltr query with the following configuration:

{
    "window_size": 20,
    "query": {
        "query_weight": 0,
        "rescore_query_weight": 1,
        "score_mode": "total",
        "rescore_query": {
            "sltr": {
                "model": "enwiki_32153k_500t",
                "params": {
                    "query_string": "jfk"
                }
            }
        }
    }
}

Two problems:

  • Only the top 20 documents get new scores (as expected). However, even though we only ask for 20 documents, Elasticsearch appears to sort more than 20 when producing the final result. Because the ML model can return negative values, documents outside the window, which have a score of 0, sort ahead of rescored documents with negative scores, when the rescored documents should stay on the front page. This can probably be fixed by adding a large constant value to all the model scores.
  • All documents after the initial rescore window have a score of 0. This is a problem for pagination, because documents after the rescore window can come back in any order. This is a little harder to fix. Perhaps combining the large constant above with adjusted weights, something like query_weight: 1, rescore_query_weight: 100, could do the trick, but there is no guarantee.
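Both problems can be reproduced with a small toy model in Python. This is only a sketch of the behaviour described above, not Elasticsearch code: the function name `rescore`, the `constant` parameter, the document ids and all scores are made up for illustration, and it assumes the constant is added to the model score inside the rescore query.

```python
def rescore(hits, window_size, model_scores,
            query_weight=0.0, rescore_query_weight=1.0, constant=0.0):
    """Toy model of a rescore with score_mode "total".

    hits: list of (doc_id, first_pass_score), already sorted best first.
    model_scores: dict doc_id -> model score (hypothetical values).
    constant: hypothetical boost added to every model score.
    """
    out = []
    for rank, (doc_id, score) in enumerate(hits):
        combined = query_weight * score
        if rank < window_size:
            # Only documents inside the window get a model score.
            combined += rescore_query_weight * (model_scores[doc_id] + constant)
        out.append((doc_id, combined))
    # Re-sort everything, including documents outside the window.
    out.sort(key=lambda hit: -hit[1])
    return out


hits = [("a", 9.0), ("b", 8.0), ("c", 7.0), ("d", 6.0)]
model_scores = {"a": -1.5, "b": 0.7}  # the model may return negative values

# query_weight 0, no constant: "a" drops below the unscored documents.
plain = rescore(hits, 2, model_scores)
plain_order = [doc_id for doc_id, _ in plain]

# Large constant: the rescored documents stay in front.
boosted = rescore(hits, 2, model_scores, constant=1000.0)
boosted_order = [doc_id for doc_id, _ in boosted]

# Constant plus query_weight 1 / rescore_query_weight 100: the tail keeps
# its first-pass score instead of collapsing to 0.
weighted = rescore(hits, 2, model_scores, query_weight=1.0,
                   rescore_query_weight=100.0, constant=1000.0)
weighted_tail = [score for _, score in weighted[2:]]
```

Python's sort is stable, so tied documents keep their first-pass order in this toy. That hides the pagination problem, where tied documents can come back in any order. The third variant only works while the boosted window scores stay above every first-pass score, which is why it is no guarantee.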

Event Timeline

Restricted Application added a subscriber: Aklapper.

Change 370120 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@master] Disable the rescore window of size 20 for ltr test

https://gerrit.wikimedia.org/r/370120

Change 370120 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@master] Disable the rescore window of size 20 for ltr test

https://gerrit.wikimedia.org/r/370120

Change 370121 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/WikimediaEvents@wmf/1.30.0-wmf.12] Disable the rescore window of size 20 for ltr test

https://gerrit.wikimedia.org/r/370121

Change 370121 merged by jenkins-bot:
[mediawiki/extensions/WikimediaEvents@wmf/1.30.0-wmf.12] Disable the rescore window of size 20 for ltr test

https://gerrit.wikimedia.org/r/370121

Change 370125 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@master] Add a large constant boost to LTR queries

https://gerrit.wikimedia.org/r/370125

The problem happens when a rescore window is smaller than an earlier one. Here we have:

  • 512: phrase
  • 8196: inclinks
  • 1024/20: ltr

According to QueryRescorer (https://github.com/elastic/elasticsearch/blob/master/core/src/main/java/org/elasticsearch/search/rescore/QueryRescorer.java#L142), we will set a score of 0 for 8196 docs (the max rescore size) and then sort.

With this code in Elasticsearch, we can't suppress the first-pass score (query_weight: 0) if a rescore with a larger window size runs before the ltr one.
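The interaction between the windows can be sketched as a chain in Python. This is a simplified model of the behaviour described in this comment, not a port of QueryRescorer: it assumes the number of documents carried through the chain equals the largest window size, and every name, window size and score in it is hypothetical.

```python
def run_rescore_chain(ranked_docs, rescorers):
    """Apply rescorers in order over the top `depth` documents.

    ranked_docs: list of (doc_id, score), sorted best first.
    rescorers: list of dicts with window_size, query_weight,
               rescore_query_weight and score_fn (doc_id -> float).
    """
    # Assumption: the largest window decides how many docs are carried along.
    depth = max(r["window_size"] for r in rescorers)
    hits = list(ranked_docs[:depth])
    for r in rescorers:
        rescored = []
        for rank, (doc_id, score) in enumerate(hits):
            combined = r["query_weight"] * score
            if rank < r["window_size"]:
                combined += r["rescore_query_weight"] * r["score_fn"](doc_id)
            rescored.append((doc_id, combined))
        # Every carried document is re-sorted, not just the window.
        rescored.sort(key=lambda hit: -hit[1])
        hits = rescored
    return hits


ranked_docs = [(f"d{i}", 10.0 - i) for i in range(10)]
ltr_scores = {"d0": -2.0, "d1": 1.0}  # made-up model output

rescorers = [
    # Stand-ins for phrase, inclinks and ltr, with tiny windows.
    {"window_size": 4, "query_weight": 1.0, "rescore_query_weight": 1.0,
     "score_fn": lambda doc_id: 0.5},
    {"window_size": 8, "query_weight": 1.0, "rescore_query_weight": 1.0,
     "score_fn": lambda doc_id: 0.1},
    {"window_size": 2, "query_weight": 0.0, "rescore_query_weight": 1.0,
     "score_fn": lambda doc_id: ltr_scores.get(doc_id, 0.0)},
]

final = run_rescore_chain(ranked_docs, rescorers)
zeroed = [doc_id for doc_id, score in final if score == 0.0]
```

In this run the ltr window is 2, but 8 documents reach the final sort, and 6 of them have had their score multiplied by query_weight: 0. The negatively scored `d0` ends up behind all of them.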

Change 370125 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@master] Add a large constant boost to LTR queries

https://gerrit.wikimedia.org/r/370125

Change 370500 had a related patch set uploaded (by EBernhardson; owner: EBernhardson):
[mediawiki/extensions/CirrusSearch@wmf/1.30.0-wmf.12] Add a large constant boost to LTR queries

https://gerrit.wikimedia.org/r/370500

Change 370500 merged by jenkins-bot:
[mediawiki/extensions/CirrusSearch@wmf/1.30.0-wmf.12] Add a large constant boost to LTR queries

https://gerrit.wikimedia.org/r/370500

Stashbot subscribed.

Mentioned in SAL (#wikimedia-operations) [2017-08-07T18:23:27Z] <ebernhardson@tin> Synchronized php-1.30.0-wmf.12/extensions/CirrusSearch/: T169498 limit phrase token count, T172464 constant boost ltr queries (duration: 00m 58s)

debt claimed this task.
debt moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.