Search snippets are not stripping hatnotes
Closed, Declined (Public)

Description

Anything in a div with class hatnote should be stripped from our search snippets. Otherwise you have no idea what the article is actually about:

Screen Shot 2017-05-08 at 12.12.56 PM.png (314×596 px, 56 KB)

Event Timeline

Should the hatnote still be part of the searchable content? If so, should it still be given the same weight as normal content?

It probably should not be part of the searchable content, since typically the purpose of hatnotes is to direct users who are at the wrong article to the article they are actually looking for. So it's almost a layer of search meta-data, but not part of the article content as such.

I took a poke around and it seems hatnote is an enwiki-ism, so I'm not sure putting this selector into core (where what should be searchable for wikitext content is decided) is quite right. I'm not completely opposed, but the cleaner solution would be to update the pages that generate class="hatnote" to also use the class "navigation-not-searchable".

@EBernhardson: Thanks for the info. I'll follow up on the navigation-not-searchable suggestion on-wiki.

@EBernhardson: Could you create some documentation about "navigation-not-searchable" that I could point at? I wasn't able to find anything on wikitech or mediawiki.org.

In case someone is looking for it: "navigation-not-searchable" was implemented in https://gerrit.wikimedia.org/r/#/c/348855/2/includes/content/WikiTextStructure.php for T162905.

dcausse added some documentation to the Help:CirrusSearch page on mediawiki.org.

@EBernhardson: navigation-not-searchable has been added to all hatnotes on English Wikipedia. I've confirmed that the search snippets are now hatnote free. Thanks for your help!

This does have a side-effect of making such pages not show up if you do a query for insource:hatnote since the data has been removed from the search index. This behaviour would not be expected.

You can always use Special:WhatLinksHere to get a list of transclusions of any particular template, so I don't know whether this actually causes any problems for users.

Noted. Personally, I've always used Special:WhatLinksHere to look for template transclusions, so hopefully this won't be an issue for users.

insource:hatnote should still work? The data is still part of the source_text field. It is only removed from the 'text' field, which we build by taking the parser output, loading it into a DOM, deleting elements that match various selectors (including navigation-not-searchable), and then emitting only the text of the document. Additionally, the more direct way to look for template transclusions in Cirrus would be hastemplate:Module:Hatnote.
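
For readers who want to see the shape of that 'text' field pipeline, here is a minimal Python sketch. It is not the actual implementation (which lives in WikiTextStructure.php); the names extract_searchable_text, EXCLUDED_CLASSES, BLOCK_TAGS and VOID_TAGS are made up for illustration, the exclusion list contains only the one class named in this task, and it assumes the parser output is well-formed HTML.

```python
from html.parser import HTMLParser

# Hypothetical exclusion list: only "navigation-not-searchable" is named in this task.
EXCLUDED_CLASSES = {"navigation-not-searchable"}

# Void elements have no closing tag, so they must not take part in depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# After these elements close, emit a space so adjacent blocks do not fuse together.
BLOCK_TAGS = {"p", "div", "li", "td", "th", "h1", "h2", "h3", "h4", "h5", "h6"}


class SearchableTextExtractor(HTMLParser):
    """Collects document text, skipping any element carrying an excluded class."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # > 0 while inside an excluded element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            # Nested element inside an excluded subtree: track depth only.
            self._skip_depth += 1
            return
        classes = set((dict(attrs).get("class") or "").split())
        if classes & EXCLUDED_CLASSES:
            self._skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._skip_depth:
            self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def extract_searchable_text(html):
    parser = SearchableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Feeding it a hatnote div with class="hatnote navigation-not-searchable" followed by a normal paragraph returns only the paragraph text. The wikitext source is never touched by this step, which is why a query against source_text is unaffected.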

You're right. I was searching for christopher columbus insource:hatnote, but the hatnote is actually put in there from another template and not directly from the source of the page. My bad. :-)
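
To make that concrete, here is a small Python sketch of why the query found nothing. Everything in it is hypothetical: the wikitext, the rendered HTML, and the insource_matches helper, which approximates insource: as a case-insensitive substring test on the page source (the real keyword works on the indexed source_text field).

```python
# Hypothetical page source: it only calls a template and never spells out "hatnote".
source_text = "{{About|the explorer}}\n'''Christopher Columbus''' was an explorer."

# Hypothetical rendered output: the class name appears only after template expansion.
rendered_html = (
    '<div class="hatnote navigation-not-searchable">'
    "This article is about the explorer.</div>"
    "<p><b>Christopher Columbus</b> was an explorer.</p>"
)


def insource_matches(term, wikitext):
    """Rough stand-in for insource: a case-insensitive substring test on the source."""
    return term.lower() in wikitext.lower()


in_source = insource_matches("hatnote", source_text)   # False: supplied by the template
in_render = "hatnote" in rendered_html                  # True: present in parser output
template_in_source = insource_matches("about", source_text)  # True: the call itself
```

The word shows up in the rendered HTML but not in the page's own wikitext, so the empty result has nothing to do with navigation-not-searchable.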