Prefix Search: Would be nice if search engine could highlight the result rather than js
Open, Low, Public

Description

Sometimes we see weird results in the prefix search because Cirrus uses different matching rules than the jquery.suggestions library. In English, for example, Cirrus flattens high ASCII: searching for "resume" will return "résumé". Cirrus is quite capable of highlighting the result properly, but it has no way to tell the front end what the result should look like.

I don't believe it would be practical to replicate Cirrus's logic on the front end because it can change and it is different for different wikis.
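To illustrate the mismatch, here is a minimal sketch of literal prefix highlighting. It is a simplified stand-in written for this example (the function name `naive_highlight` is hypothetical), not the actual jquery.suggestions code. It shows that a result matched by accent folding on the server gets no highlight at all when the client only compares literal characters.

```python
def naive_highlight(query: str, title: str) -> str:
    """Bold the prefix only when the title literally starts with the query
    (case-insensitive). No accent folding is attempted."""
    if query and title.lower().startswith(query.lower()):
        n = len(query)
        return f"<b>{title[:n]}</b>{title[n:]}"
    return title  # no literal prefix match, so nothing is highlighted


# A literal prefix is highlighted as expected.
print(naive_highlight("resum", "Resumen de acompañar"))  # <b>Resum</b>en de acompañar

# The server may return "Résumé" for "resume", but the client sees no literal match.
print(naive_highlight("resume", "Résumé"))  # Résumé
```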

Details

Reference
bz60976

Event Timeline

bzimport raised the priority of this task from to Normal. Nov 22 2014, 3:03 AM
bzimport added a project: MediaWiki-JavaScript.
bzimport set Reference to bz60976.
bzimport added a subscriber: Unknown Object (MLST).

I don't care how you do this, but please do. I hate the core search suggestions module.

Core could totally also output match indices from the opensearch API (that shouldn't be incompatible with anything, but I haven't checked), naively by default (we could just implement the same logic as the JS module has now), with a hook override for better search extensions. Then we could apply bolding in the UI trivially based on these indices.
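A rough sketch of that idea, in Python for brevity rather than in MediaWiki's PHP: a default matcher that mimics literal prefix matching, a pluggable `matcher` argument standing in for the hook override, and a renderer that bolds by index. All names here (`naive_match`, `suggestions_with_matches`, `render`) are hypothetical and not part of any existing API.

```python
from html import escape
from typing import Callable, List, Optional, Tuple

Match = Optional[Tuple[int, int]]  # (start, end) offsets into the title, or None


def naive_match(query: str, title: str) -> Match:
    """Default behaviour: literal, case-insensitive prefix match."""
    if query and title.lower().startswith(query.lower()):
        return (0, len(query))
    return None


def suggestions_with_matches(
    query: str,
    titles: List[str],
    matcher: Callable[[str, str], Match] = naive_match,
) -> List[Tuple[str, Match]]:
    """Pair each title with its match range. A search extension would pass
    its own matcher, which plays the role of the hook override."""
    return [(title, matcher(query, title)) for title in titles]


def render(title: str, match: Match) -> str:
    """Apply bolding purely from the indices; no matching logic on the client."""
    if match is None:
        return escape(title)
    start, end = match
    return f"{escape(title[:start])}<b>{escape(title[start:end])}</b>{escape(title[end:])}"


for title, match in suggestions_with_matches("resum", ["Resumé", "Resumen de acompañar"]):
    print(render(title, match))
```

With this split, the UI never needs to know how the match was computed; it only consumes the offsets it is given.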

I'm glad it has bothered someone else too.

matmarex updated the task description. Dec 21 2014, 6:10 PM
matmarex set Security to None.
matmarex removed a subscriber: Unknown Object (MLST).

So, can we make this happen? When the necessary information is somehow exposed via the action=opensearch API, I'll be happy to implement the JavaScript part of this.

Krinkle updated the task description. Jan 8 2015, 9:16 PM

If I understand correctly, the OpenSearch API follows a standard response format we shouldn't change. We can add it to a prefixsearch or search API module, however. Probably using offsets or substrings to indicate what to highlight.

Current format:

{
"query": "resum",
"results": [
  "Resumé",
  "Resumé (magazine)",
  "RESUMECHAR (CONFIG.SYS directive)",
  "Resumen de acompañar"
]
}

Current (incomplete) highlighting behaviour:

Proposed formats:

{
"query": "resum",
"results": [
  [ 5, "Resumé" ],
  [ 5, "RESUMECHAR (CONFIG.SYS directive)" ],
  [ 5, "Resumé (magazine)" ],
  [ 5, "Resumen de acompañar" ]
]
}

{
"query": "resum",
"results": [
  [ "Resum", "Resumé" ],
  [ "RESUM", "RESUMECHAR (CONFIG.SYS directive)" ],
  ...
]
}

Actually... Unless there are cases where the interpretation of unicode code points is different for one of the flattened characters, wouldn't it always simply be the length of the input string?

Except for namespace prefixes, as we allow normalisation/localisation of those.

No, the processing can cause the number of separate characters to change, for example æ↔ae, ß↔ss. (I was also under the impression that Cirrus ignored/downplayed non-word characters like '(' when displaying search suggestions, but it doesn't seem to now.)
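A small sketch of why the highlight length can differ from the query length. The folding rules below (lowercasing, stripping combining marks, and a two-entry table for æ and ß) are assumptions made for illustration; they are not Cirrus's actual analysis chain. The function folds the title one character at a time and reports how many title characters were consumed to cover the folded query.

```python
import unicodedata
from typing import Optional

# Illustrative expansions only; a real analyzer has many more rules.
SPECIAL = {"æ": "ae", "ß": "ss"}


def fold_char(ch: str) -> str:
    """Fold a single character: lowercase, expand specials, drop accents."""
    ch = ch.lower()
    if ch in SPECIAL:
        return SPECIAL[ch]
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def fold(text: str) -> str:
    return "".join(fold_char(c) for c in text)


def matched_prefix_length(query: str, title: str) -> Optional[int]:
    """Number of title characters covered by the folded query, or None."""
    target = fold(query)
    if not target:
        return 0
    folded = ""
    for consumed, ch in enumerate(title, start=1):
        folded += fold_char(ch)
        if folded.startswith(target):
            return consumed
        if not target.startswith(folded):
            return None  # diverged before the query was fully covered
    return None


print(len("strasse"), matched_prefix_length("strasse", "Straße"))  # 7 6
print(len("aether"), matched_prefix_length("aether", "Æther"))     # 6 5
```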

It does that in full text search, but prefix search includes them. It's supposed to be just the right kind of sloppy matching....

But, yeah, the most flexibility possible would be best. We want the ability to properly handle whatever off the wall request comes in and if the highlighting code makes any assumptions then it'll break it. The best would be to accept offset pairs to highlight or the string marked up with <em> tags or something. The <em> tags might be simplest because you could transform them on the client side to whatever you like but they'd still be simple to read right in the string. Simpler than offset pairs, at least.
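As a sketch of how little client-side work the `<em>` approach would need, the snippet below converts a marked-up title into plain text plus offset pairs, so the two representations are interchangeable. The function name `parse_marked_up` is hypothetical, and the sketch assumes the server escapes any literal `<em>` that appears in a title so the markers are unambiguous.

```python
import re
from typing import List, Tuple

EM = re.compile(r"<em>(.*?)</em>")


def parse_marked_up(marked: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Return (plain_title, [(start, end), ...]) from a title with <em> markers.
    Offsets index into the plain title, with the end offset exclusive."""
    parts: List[str] = []
    ranges: List[Tuple[int, int]] = []
    pos = 0      # read position in the marked-up string
    length = 0   # length of the plain text emitted so far
    for m in EM.finditer(marked):
        before = marked[pos:m.start()]
        parts.append(before)
        length += len(before)
        inner = m.group(1)
        ranges.append((length, length + len(inner)))
        parts.append(inner)
        length += len(inner)
        pos = m.end()
    parts.append(marked[pos:])
    return "".join(parts), ranges


print(parse_marked_up("<em>Résum</em>é (magazine)"))
```

The client can then wrap each range in whatever element it prefers, which is the transformation step mentioned above.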

If I understand correctly, the OpenSearch API follows a standard response format we shouldn't change.

Can we not extend it? Like add another key, say 'matches', that would contain indexes of matched substrings in each suggestion result?

OpenSearch format is an array with an array inside. No string keys.

https://www.mediawiki.org/w/api.php?action=opensearch&search=ap&limit=4

[
    "ap",
    [
        "Apache configuration",
        "Apps/Commons",
        "Apps",
        "API/maintenance"
    ]
]

It seems we already extended it by adding a second and third array at the end for text extracts and URLs:

[
    "ap",
    [
        "Apache configuration",
        "Apps/Commons",
        "Apps",
        "API/maintenance"
    ],
    [
        "Apache is probably the webserver used most with MediaWiki.",
        "",
        "",
        "This page is to document activity related to the MediaWiki API. This is an ongoing activity, led by Sam Reed."
    ],
    [
        "https://www.mediawiki.org/wiki/Apache_configuration",
        "https://www.mediawiki.org/wiki/Apps/Commons",
        "https://www.mediawiki.org/wiki/Apps",
        "https://www.mediawiki.org/wiki/API/maintenance"
    ]
]

That doesn't scale well though.
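To make the scaling concern concrete, here is a minimal sketch of what a consumer has to do with the positional format: every extra piece of per-result data is another parallel array that must be found by position and zipped back together. The function name `unpack_opensearch` and the dictionary keys are hypothetical; the sample data is taken from the response above.

```python
from typing import Any, Dict, List, Tuple


def unpack_opensearch(response: List[Any]) -> Tuple[str, List[Dict[str, str]]]:
    """Turn the positional arrays into one record per suggestion.
    Positions 2 and 3 are treated as optional parallel arrays."""
    query, titles = response[0], response[1]
    descriptions = response[2] if len(response) > 2 else [""] * len(titles)
    urls = response[3] if len(response) > 3 else [""] * len(titles)
    records = [
        {"title": t, "description": d, "url": u}
        for t, d, u in zip(titles, descriptions, urls)
    ]
    return query, records


sample = [
    "ap",
    ["Apache configuration", "Apps"],
    ["Apache is probably the webserver used most with MediaWiki.", ""],
    [
        "https://www.mediawiki.org/wiki/Apache_configuration",
        "https://www.mediawiki.org/wiki/Apps",
    ],
]
query, records = unpack_opensearch(sample)
print(query, records[1]["url"])
```

Adding match information this way would mean a fifth positional array, which every consumer would again have to know by index.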

On second thought: from a design and user experience point of view, do we even need the highlighting? I've rarely seen this kind of highlighting done in other search interfaces or autocompleted form fields. They just show the results.

I've played with it a bit locally and am liking it a lot. It feels a little wrong because we're so used to it. I'd like to consider ditching that logic altogether and just displaying the results as normal (linked) text.


Thoughts?

Restricted Application added a project: Discovery-Search. Sep 24 2018, 7:22 AM
EBjune lowered the priority of this task from Normal to Low. Sep 27 2018, 5:17 PM
EBernhardson renamed this task from "Prefix Search: Would be nice if php could highlight the result rather than js" to "Prefix Search: Would be nice if search engine could highlight the result rather than js". Sep 27 2018, 5:17 PM