CirrusSearch does not find all JavaScript and CSS pages when using insource and intitle syntax
Open, Lowest, Public

Description

Where did all the JavaScript pages go?

See Also:

Details

Reference: bz62733
bzimport raised the priority of this task to Normal.
bzimport set Reference to bz62733.
bzimport added a subscriber: Unknown Object (MLST).
demon added a comment. Mar 21 2014, 5:20 PM

Wonder if something went wrong with gerrit 115214.

I believe this is caused by us not word breaking foo.bar into foo and bar. The solution to this, as I see it, is to use the word_break token filter _but_ to do that I have to rebuild each analyzer with that filter. That isn't easy: right now, when I want the German analyzer, I can ask for
{"analyzer":{"text":{"type":"german"}}}
but to rebuild it I have to do this:
{"analyzer":{"text":{
    "filter": [
        "standard",
        "lowercase",
        "german_stop",
        "german_normalization",
        "light_german_stemmer"
    ],
    "tokenizer": "standard",
    "type": "custom"
}},"filter":{
    "german_stop": {
        "stopwords": [
            "denn",
            ...
            "eures",
            "dies",
            "bist",
            "kein"
        ],
        "type": "stop"
    }
}}

Except even that doesn't work because german_normalization isn't properly exposed! The pull request I've opened upstream exposes all the stuff I'd need and it creates an endpoint on Elasticsearch designed to spit this back out for easy customization.
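The rebuild described above can be sketched in Python as a transformation of the settings body. This is a minimal sketch, not CirrusSearch code, and it rests on several assumptions: Elasticsearch's `word_delimiter` token filter stands in for what the comment calls the word_break filter; `german_normalization` is treated as an available named filter (the comment notes it was not exposed at the time); and the helper `add_word_split` and the filter name `split_on_punctuation` are hypothetical. The stopword list is cut down to two of the words shown above.

```python
import copy
import json

# Hypothetical filter name; any unused name in the "filter" section would do.
WORD_SPLIT_FILTER = "split_on_punctuation"


def add_word_split(settings: dict, analyzer: str = "text") -> dict:
    """Return a copy of `settings` with a word_delimiter filter added to `analyzer`.

    Assumption: word_delimiter stands in for the "word_break" filter mentioned
    in the comment; it splits tokens such as "common.js" on the dot.
    """
    patched = copy.deepcopy(settings)  # leave the caller's settings untouched
    filters = patched.setdefault("filter", {})
    filters[WORD_SPLIT_FILTER] = {
        "type": "word_delimiter",
        # Keep the unsplit token too, so exact matches on "common.js" still work.
        "preserve_original": True,
    }
    chain = patched["analyzer"][analyzer]["filter"]
    # Run before lowercase so case changes can still act as split points.
    position = chain.index("lowercase") if "lowercase" in chain else 0
    chain.insert(position, WORD_SPLIT_FILTER)
    return patched


# Rebuilt German analyzer from the comment above (stopwords shortened).
german = {
    "analyzer": {"text": {
        "type": "custom",
        "tokenizer": "standard",
        "filter": ["standard", "lowercase", "german_stop",
                   "german_normalization", "light_german_stemmer"],
    }},
    "filter": {"german_stop": {"type": "stop", "stopwords": ["denn", "kein"]}},
}

patched = add_word_split(german)
print(json.dumps(patched["analyzer"]["text"]["filter"]))
# ["standard", "split_on_punctuation", "lowercase", "german_stop", "german_normalization", "light_german_stemmer"]
```

Placing the filter before `lowercase` is a judgment call: `word_delimiter` can use case changes as split points, which lowercasing would erase. The same helper would have to be applied to every rebuilt language analyzer, which is the tedium the comment describes.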

demon added a comment. Apr 30 2014, 6:15 PM

Interesting. Wonder if we're running into bug 40612 in a different form then.

I have little doubt.

demon removed a subscriber: demon. Aug 19 2015, 4:07 PM
Restricted Application added a project: Discovery. Aug 19 2015, 4:07 PM
Restricted Application added a subscriber: Aklapper.
Deskana lowered the priority of this task from Normal to Lowest.
Deskana renamed this task from CirrusSearch: Where did all the JS pages go? to CirrusSearch does not find all JS pages.
Deskana set Security to None.
Deskana renamed this task from CirrusSearch does not find all JS pages to CirrusSearch does not find all JS pages when it should.
Deskana moved this task from Needs triage to Search on the Discovery board.
Deskana added subscribers: Discovery, Nemo_bis.
Nemo_bis updated the task description. Dec 5 2015, 8:29 AM
He7d3r updated the task description. Dec 23 2015, 11:29 AM
He7d3r renamed this task from CirrusSearch does not find all JS pages when it should to CirrusSearch does not find all JavaScript and CSS pages when it should. Dec 23 2015, 1:28 PM
He7d3r updated the task description.
He7d3r updated the task description. Dec 24 2015, 10:05 AM

I looked briefly into this. The issue is almost certainly related to the analyzers used for particular languages, as mentioned above. intitle searches for css and js work on the Italian, Russian, English, Chinese and German wikis, but not on the Portuguese or Spanish wikis, and probably not on others.
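To illustrate why a dotted title can go missing under one analyzer and not another, here is a toy model in Python comparing a tokenizer that keeps `letter.letter` runs together with one that splits on the dot. It is a sketch under stated assumptions, not Lucene or CirrusSearch code: the regexes only roughly approximate the "not word breaking foo.bar" behaviour described earlier, `matches` is a hypothetical stand-in for an intitle lookup against indexed tokens, and which real analyzers fall on which side is not established here.

```python
import re


def tokenize_keep_dots(text: str) -> list[str]:
    """Rough stand-in for a tokenizer that keeps "Common.js" as one token."""
    return [t.lower() for t in re.findall(r"\w+(?:\.\w+)*", text)]


def tokenize_split_dots(text: str) -> list[str]:
    """Rough stand-in for a tokenizer (or word_delimiter-style filter) that splits on "."."""
    return [t.lower() for t in re.findall(r"\w+", text)]


def matches(title: str, query: str, tokenize) -> bool:
    """True if every query token appears among the title's tokens (hypothetical)."""
    title_tokens = set(tokenize(title))
    return all(q in title_tokens for q in tokenize(query))


title = "Common.js"
print(tokenize_keep_dots(title))   # ['common.js']
print(tokenize_split_dots(title))  # ['common', 'js']
print(matches(title, "js", tokenize_keep_dots))   # False
print(matches(title, "js", tokenize_split_dots))  # True
```

In this model, a search for `js` only finds the page when the index side splits on the dot, which is consistent with the per-language difference reported above.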

Deskana renamed this task from CirrusSearch does not find all JavaScript and CSS pages when it should to CirrusSearch does not find all JavaScript and CSS pages when using insource syntax. Dec 31 2015, 3:49 AM
He7d3r added a comment. Jan 1 2016, 6:08 PM

Note that this is not just about insource (see the examples in the description).

Deskana renamed this task from CirrusSearch does not find all JavaScript and CSS pages when using insource syntax to CirrusSearch does not find all JavaScript and CSS pages when using insource and intitle syntax. Jan 1 2016, 7:01 PM
Restricted Application added a project: Discovery-Search. Jul 27 2016, 12:10 AM
debt moved this task from Needs triage to Later on the Discovery-Search board. Oct 5 2016, 7:43 PM
Snaevar removed a subscriber: Snaevar. Mar 10 2017, 6:14 PM