srprop "sectiontitle" has no effect in list=search API
Closed, Duplicate · Public · 1 Estimated Story Points · BUG REPORT

Description

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
Identical response: {"batchcomplete":true,"query":{"searchinfo":{"totalhits":1},"search":[{"ns":3,"title":"User talk:Alexis Jazz","pageid":4967517,"size":115452}]}}

What should have happened instead?:
Second response should include "A barnstar for you!" somewhere, the title of the section.

Event Timeline

Restricted Application added a subscriber: Aklapper.

To get a section link, the name of the section has to have at least a one-word overlap with the search query. Additionally, a section link is only identified when neither the article title nor the category list matches the search terms. Critically, it also doesn't matter which section title matches: whether a section title is close to or far away from the other search terms is not taken into account.

Adding barnstar to the query, matching an expected section title, gives a section link as expected: https://commons.wikimedia.org/w/api.php?action=query&format=json&formatversion=2&list=search&srsearch=barnstar%20%22I%20found%20this%20rare%20and%20important%20surviving%20copy%20of%20the%20Ten%20Hour%20Day1835%22&srprop=size|sectiontitle&srnamespace=*&srlimit=10
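For anyone reproducing this, here is a minimal sketch of how the two queries could be built. The helper name build_search_url is made up for illustration; the parameters are copied from the URL above, and nothing is actually sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API = "https://commons.wikimedia.org/w/api.php"

def build_search_url(query, props=("size", "sectiontitle"), limit=10):
    """Build a list=search URL (hypothetical helper, parameters taken from the ticket)."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": 2,
        "list": "search",
        "srsearch": query,
        "srprop": "|".join(props),  # multiple props are pipe-separated
        "srnamespace": "*",
        "srlimit": limit,
    }
    return API + "?" + urlencode(params)

phrase = '"I found this rare and important surviving copy of the Ten Hour Day1835"'

# Phrase only: no word overlaps with a section heading, so no section title is expected.
url_without = build_search_url(phrase)
# Adding "barnstar" gives the query a word that also occurs in a section heading.
url_with = build_search_url("barnstar " + phrase)

# Decode the query string again to check what would be sent.
print(parse_qs(urlsplit(url_with).query)["srsearch"][0])
```

urlencode escapes spaces as + and the pipe as %7C, which the API should treat the same as the %20 and literal | in the URL above.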

https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bsearch says "sectiontitle: Adds the title of the matching section." I'd assume "matching section" would refer to the section in which the search terms were found, not just section titles in which the search terms were found. I'd have never thought of what you just explained based on the description in action=help. I doubt anyone could have guessed that.

I'd assume "matching section" would refer to the section in which the search terms were found, not just section titles in which the search terms were found.

Matching here literally means the text of the string matches the text of the piece of content (the section headings). The content is separated into independent conceptual chunks, such as the headings, the categories, content, etc. Throughout search (not just this one feature) there is no concept of textual content being a part of a section. The sections are simply a separate list of strings to match against.
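To make that model concrete, here is a rough sketch of "headings are just a separate list of strings". It is an illustration only, not CirrusSearch code: the function names are invented, the tokenization is a naive lowercase word split (the real analysis chain is more involved), and the additional title/category conditions are not modeled.

```python
import re

def tokens(text):
    """Naive stand-in for search analysis: lowercase word tokens."""
    return set(re.findall(r"\w+", text.lower()))

def first_matching_heading(query, headings):
    """Return the first heading sharing at least one word with the query.

    The page body is deliberately not an input: in this model the headings
    are matched on their own, with no link to the text below them.
    """
    query_words = tokens(query)
    for heading in headings:
        if query_words & tokens(heading):
            return heading
    return None

headings = ["Beeld & Geluid", "A barnstar for you!"]
phrase = "I found this rare and important surviving copy of the Ten Hour Day1835"

print(first_matching_heading(phrase, headings))                # None
print(first_matching_heading("barnstar " + phrase, headings))  # A barnstar for you!
```

The phrase sits under the barnstar heading on the page, but since no word in the phrase occurs in any heading, nothing is returned until "barnstar" is added to the query.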

The description for "sectiontitle" is "Adds the title of the matching section." while the description for "sectionsnippet" is "Adds a parsed snippet of the matching section title." Wouldn't "Adds the title/a parsed snippet of the first section title that matches a search term" be more accurate?

With the knowledge shared by @EBernhardson I did another experiment, and strangely this is indeed how it works, also in Special:Search:

As I don't seem to have mentioned it yet: what I wanted to do was search for some text that I knew would probably (no guarantee) be in some section, without knowing which page it was on, because it wasn't found on the page where it was expected (most commonly because the section was moved). I was hoping to get the desired page title and section title all at once, cheaply, with srprop sectiontitle, but it didn't work that way. Searching for the actual section title is not a realistic option for me: there are too many sections with generic titles like "Question" (well over 10000 of those on enwiki), "Help needed", "Where is my article?", "Proposal", etc. Besides, the title may have been changed when the section was moved.

I ended up doing the search as normal and downloading the full wikitext of each result so that it can be searched. This is now happening on an ongoing basis, as there is no other way at the moment.
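A minimal sketch of the second half of that workaround, assuming the wikitext of a result has already been downloaded (fetching is left out). The function name and the heading regex are my own simplification: it only recognizes plain == Heading == lines and ignores things like headings inside comments or nowiki tags.

```python
import re

# Matches wikitext headings such as "== Title ==" or "=== Title ===".
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

def section_containing(wikitext, needle):
    """Return the heading of the section in which needle first occurs.

    Returns None if needle is not in the text at all, and "" if it occurs
    in the lead section, before any heading.
    """
    pos = wikitext.find(needle)
    if pos == -1:
        return None
    title = ""
    for match in HEADING.finditer(wikitext):
        if match.start() > pos:
            break  # this heading starts after the hit
        title = match.group(2)
    return title

sample = (
    "Intro text\n"
    "== Question ==\n"
    "How do I?\n"
    "== A barnstar for you! ==\n"
    "I found this rare copy\n"
)
print(section_containing(sample, "rare copy"))  # A barnstar for you!
```

This returns the nearest preceding heading of any level, so a hit inside a subsection reports the subsection title rather than its parent.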

Thanks for the examples. As you may have noticed, and as Erik pointed out above, the behavior you are noticing is section title highlighting working as intended, even if it is not the best experience. We do not and cannot currently index sections as searchable documents. When section titles are highlighted in search results, this is due to part of the search query matching the specific text in that section title itself, not because of any kind of relevancy of the section's content.

This ticket is currently scoped to update our documentation to reflect the current functionality. We recognize that the highlighting is a bit confusing, though, and further work to improve the highlighting experience in search results will come as future work in separate tickets. Hopefully you are able to find a workaround until then.

Thanks for the examples. As you may have noticed, and as Erik pointed out above, the behavior you are noticing is section title highlighting working as intended, even if it is not the best experience. We do not and cannot currently index sections as searchable documents. When section titles are highlighted in search results, this is due to part of the search query matching the specific text in that section title itself, not because of any kind of relevancy of the section's content.

For Special:Search:

  • If the search terms match only a section title on a page, return User talk:Alexis Jazz (section Beeld & Geluid) without a snippet. Currently a snippet of the page as a whole is included. It shouldn't be. Ideally, a snippet of the section in question would be included, but I understand this is not trivial.
  • If the search terms match only content that isn't a section title, a snippet is (and should be) shown. Ideally, the snippet would be accompanied by a link to the section it came from. I understand this is not trivial, though.
  • If the search terms match both a section title and content, show the snippet for the content but omit the section link as there's no guarantee it's the section for the snippet. It can only lead to confusion.
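The three cases above boil down to a small decision rule. This is only a sketch of the behavior proposed here, not of what Special:Search currently does, and the function and key names are made up.

```python
def proposed_display(section_title_matched, content_matched):
    """Which parts of a search result to show under the proposal above."""
    return {
        # A snippet is shown only when the content itself matched.
        "show_snippet": content_matched,
        # A section link is shown only when it cannot be mistaken for
        # the location of the snippet, i.e. when there is no snippet.
        "show_section_link": section_title_matched and not content_matched,
    }

print(proposed_display(True, False))   # section link, no snippet
print(proposed_display(False, True))   # snippet, no section link
print(proposed_display(True, True))    # snippet, section link omitted
```

The "ideally" parts (a snippet taken from the matched section, or a link to the section the snippet came from) are not modeled, since they would need section-aware indexing.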

This ticket is currently scoped to update our documentation to reflect the current functionality. We recognize that the highlighting is a bit confusing, though, and further work to improve the highlighting experience in search results will come as future work in separate tickets. Hopefully you are able to find a workaround until then.

For Special:Search the only workaround is the knowledge that it doesn't work the way one may expect it to. A blunt $('.mw-search-result-heading .searchalttitle').remove() is actually an improvement in that department.

For the API, I already have my workaround, but if I could get the information I need more cheaply in the future that would be nice. I won't hold my breath though, I know that would be a multi-year plan, if it happens at all.

Bringing some conversation about documentation over from IRC:

Problems

  • “parsed snippet” is not helpful because it is so unclear, so either it needs to be defined, or explained inline.
  • The snippet hierarchy needs to be documented—or, better, removed from the API and put in Special:Search where it belongs.

In no particular order....


Documentation Option 73 (explain ‘parsed’ inline, explain snippet hierarchy):

snippet: Adds a snippet of the page, with query term highlighting markup.
titlesnippet: Adds the page title, with query term highlighting markup.
redirecttitle: Adds the title of the matching redirect, if available.
redirectsnippet: Adds the title of the matching redirect, if available, with query term highlighting markup.
sectiontitle: Adds the title of the matching section, if available.
sectionsnippet: Adds the title of the matching section, if available, with query term highlighting markup.
categorysnippet: Adds the matching category name, if available, with query term highlighting markup.

Note that there is a hierarchy of titles and snippets, and the existence of a title snippet blocks the return of redirect, category, or section titles or snippets. The existence of a redirect title or snippet blocks the return of category, or section titles or snippets. The existence of a category snippet blocks the return of section titles or snippets. Note that this is true even if you did not request the blocking property.

  • Pros: each property is (more) self-contained
  • Cons: wordy as heck; the snippet hierarchy is dumb.
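The hierarchy in the note above can be read as a simple precedence order. The sketch below is an illustration of that reading only; the names are invented and it is not how the API code is actually structured.

```python
# Highest precedence first: each entry blocks everything after it.
PRECEDENCE = ("title", "redirect", "category", "section")

def surviving_field(matched):
    """Given the set of fields that matched the query, return the one
    whose title/snippet is returned; all lower-ranked ones are blocked."""
    for field in PRECEDENCE:
        if field in matched:
            return field
    return None

print(surviving_field({"section"}))              # section
print(surviving_field({"category", "section"}))  # category
print(surviving_field({"title", "section"}))     # title
```

Note that what was requested in srprop is not an input: per the note, a matching title blocks sectiontitle even if titlesnippet was never asked for.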

Documentation Option B.4 (explain ‘parsed’ after, explain snippet hierarchy):

snippet: Adds a parsed snippet of the page.
titlesnippet: Adds a parsed snippet of the page title.
redirecttitle: Adds the title of the matching redirect, if available.
redirectsnippet: Adds a parsed snippet of the redirect title, if available.
sectiontitle: Adds the title of the matching section, if available.
sectionsnippet: Adds a parsed snippet of the matching section title, if available.
categorysnippet: Adds a parsed snippet of the matching category, if available.

Parsed snippets include query term highlighting markup.

Note that there is a hierarchy of titles and snippets, and the existence of a title snippet blocks the return of redirect, category, or section titles or snippets. The existence of a redirect title or snippet blocks the return of category, or section titles or snippets. The existence of a category snippet blocks the return of section titles or snippets. Note that this is true even if you did not request the blocking property.

  • Pros: pars(ed)imonious descriptions, but need to read further to understand what “parsed snippet” means.
  • Cons: opaque as heck; the snippet hierarchy is dumb.

Proposal III-Ꙫ (remove snippet hierarchy and make the Special:Search UI sort out its own problems):

  • Prereq: Update Special:Search if needed to handle getting multiple title/redirect/category/section values returned.
  • Change API code so that all requested titles and snippets are returned. Add plainsnippet and title (specific names negotiable), and categorytitle.

plainsnippet: Adds a snippet of the page.
snippet: Adds a snippet of the page, with query term highlighting markup.
title: Adds the page title.
titlesnippet: Adds the page title, with query term highlighting markup.
redirecttitle: Adds the title of the matching redirect.
redirectsnippet: Adds the title of the matching redirect, with query term highlighting markup.
sectiontitle: Adds the title of the matching section.
sectionsnippet: Adds the title of the matching section, with query term highlighting markup.
categorytitle: Adds the matching category name.
categorysnippet: Adds the matching category name, with query term highlighting markup.

  • Pros: All options available and the user gets what they want, not what Special:Search wants or needs. All obvious variations are available.
  • Cons: Semantics of “title” and “snippet” are a little wonky; “plainsnippet” and “titlesnippet” are not the greatest names and don’t fit the title/snippet pattern, but naming is hard.
  • Options:
    • Obviously could use “parsed snippet” language as above with the extra note, instead of “highlighting markup” language used here.
    • Adding plainsnippet, title, and categorytitle could be saved for later (or never).
    • Could add more sensible names (e.g., snippet/snippethighlight, title/titlehighlight, redirect/redirecthighlight, section/sectionhighlight, and category/categoryhighlight) and deprecate the current names (or keep them for backward compatibility).

My personal preference is Proposal III-Ꙫ, unless it is harder than it seems to update Special:Search to behave; in that case I prefer Documentation Option 73, followed by Proposal III-Ꙫ at a later time if possible. But I'm not currently doing the work, so just updating the docs (Option 73 or Option B.4) makes a lot of sense and will decrease confusion.

Change 802180 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/extensions/CirrusSearch@master] Add ability to disable skip_if_last_matched

https://gerrit.wikimedia.org/r/802180

Change 802642 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/core@master] search: Hint the SearchEngine about the set of snippets to provide

https://gerrit.wikimedia.org/r/802642

Change 802180 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Add ability to disable skip_if_last_matched

https://gerrit.wikimedia.org/r/802180

Change 802642 merged by jenkins-bot:

[mediawiki/core@master] search: Hint the SearchEngine about the set of snippets to provide

https://gerrit.wikimedia.org/r/802642

Change 803325 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/core@master] api-query-search: Update prop parameter documentation

https://gerrit.wikimedia.org/r/803325

Change 803325 merged by jenkins-bot:

[mediawiki/core@master] api-query-search: Update prop parameter documentation

https://gerrit.wikimedia.org/r/803325

Change 817311 had a related patch set uploaded (by Ebernhardson; author: Ebernhardson):

[mediawiki/core@master] api-query-search: Request snippets when title variants are requested

https://gerrit.wikimedia.org/r/817311

Change 817311 merged by jenkins-bot:

[mediawiki/core@master] api-query-search: Request snippets when title variants are requested

https://gerrit.wikimedia.org/r/817311

dcausse subscribed.

I think the patches above improved the documentation and overall behavior of the search API, making it a bit more consistent with Special:Search.
They do not solve the problem of keeping the section title coherent with the content snippet. I think that problem is covered in T131950, so I'm tentatively closing this as a duplicate of it (@AlexisJazz please feel free to re-open if I misinterpreted what you are asking).