
[L] Quick view - Show the expanded search result snippet
Closed, ResolvedPublic

Description

This ticket is a follow-up to T316396.

Note: This is desktop only. For mobile implementation, see T327543.

The purpose of this work is to provide more context to the user in the quick view.
In the initial work on the snippet, quick view shows the same snippet that appears in the search results. Repeating what the user has already seen in the search results is not very useful. This ticket therefore expands the snippet so that it gives the user more information without making them look for it in the article. The user may still want to read the full article, but the expanded snippet provides a little more context and helps users assess the relevance of the result.

Ideal solution

  • Expand the snippet to complete sentences for however many sentences there are in the snippet. (Break into its own ticket.)

When we are not able to identify the sentence boundaries, fall back to the conservative solution.

Conservative solution

  • Expand the snippet so that a little bit of text from before and after the snippet shows up in the quick view.
  • How much more? Expand in each direction such that the actual snippet is no more than X# characters on desktop. However, this needs to be flexible so we can try different lengths.
  • Show an ellipsis before/after the snippet when there is more content available in those directions.
  • If there are sentences in the snippet from different parts of the article, add an ellipsis between them. (Break into its own ticket.)
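
The conservative solution can be sketched roughly as follows. This is only an illustration in Python, not the SearchVue implementation: the function name `expand_snippet`, the `max_chars` parameter (standing in for the unspecified X#), and the default of 300 are all assumptions, and the sketch assumes the snippet is plain text (no highlight markup) that occurs verbatim in the field it was taken from.

```python
ELLIPSIS = "..."


def expand_snippet(field_text, snippet, max_chars=300):
    """Pad `snippet` with neighbouring text from `field_text`.

    `max_chars` is a hypothetical, tunable limit for the expanded text
    (ellipses not counted); 300 is an arbitrary placeholder.
    """
    start = field_text.find(snippet)
    if start == -1 or len(snippet) >= max_chars:
        # Snippet not found in the field, or already at the limit:
        # show it unchanged.
        return snippet

    end = start + len(snippet)
    budget = max_chars - len(snippet)
    pad_before = budget // 2            # half of the extra text goes before
    pad_after = budget - pad_before     # the rest goes after

    new_start = max(0, start - pad_before)
    new_end = min(len(field_text), end + pad_after)

    # Only show an ellipsis where more content exists in that direction.
    prefix = ELLIPSIS if new_start > 0 else ""
    suffix = ELLIPSIS if new_end < len(field_text) else ""
    return prefix + field_text[new_start:new_end] + suffix


text = "Alpha beta gamma delta epsilon zeta eta theta."
print(expand_snippet(text, "delta epsilon", max_chars=23))
# prints "...amma delta epsilon zeta..."
```

Note that the sketch cuts in the middle of words ("amma"); trimming to word boundaries would be a natural refinement.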

Implementation note: as of T319291, cirrusdoc now allows filtering based on fields. We will need to expose which field the snippet is part of (it can be text, auxiliary_text, file_text, opening_text, ...) so we know which to fetch. See quick-and-dirty POC here: https://gerrit.wikimedia.org/r/838165

For detailed specs, go to Figma.

Special_Search (23).png (1×1 px, 275 KB)

Event Timeline

CBogen added a subscriber: CBogen.

Moving to Milestone 4 due to technical complexity. We will still have the snippet ellipses and highlighting in Milestone 2 (T316396), just not the expanded snippet in this ticket.

While looking at old threads I realized that we also have cases where the snippet is taken from two different sections of the article and merged into one, as in this example that Trey had shared.

In such cases expansion can become more complicated. @TJones do you have any thoughts on this based on the acceptance criteria outlined?

Short version: Use an ellipsis (or relevant icon) in the middle of the expanded snippet to skip over things that don't snip well.

(This runs on a little long because I was working out my ideas while writing my response.)

I'm not sure how feasible this is, depending on what kind of info you get about the snippet and how hard it is to parse the original wikitext or whatever to find the source of the snippet, but an obvious solution (because I use it on my own website for article previews) is to replace unsnippable things with an ellipsis. (For me, that's usually images and tables.) If you wanted to get fancy, I guess you could replace common unsnippables—infoboxes, tables, images—with icons to represent what was there, but I don't know if that is good UX or not.

The example above searches for "infinity saga the films from" (with quotes), and the snippet is:

Phase One through Phase Three are collectively known as "The Infinity Saga". The films from Phase Four through Phase Six are collectively known as "the...

An expanded snippet that added the rest of the sentences before and after the highlight and acknowledged the giant table that's there might look like this:

The films from Phase One through Phase Three are collectively known as "The Infinity Saga". ... The films from Phase Four through Phase Six are collectively known as "the Multiverse Saga". They also include multiple series and two specials streaming on Disney+.

However, it's still skipping over the section headings, though those could be put back, like this, with an ellipsis or an icon (I just threw in a chart icon since that emoji is available):

The Infinity Saga
The films from Phase One through Phase Three are collectively known as "The Infinity Saga".
... or 📊
The Multiverse Saga
The films from Phase Four through Phase Six are collectively known as "the Multiverse Saga". They also include multiple series and two specials streaming on Disney+.

Of course, that can get a little ridiculous, especially in an article like this with so many headings and tables with very little text between them. Adding one more sentence before and after the sentences that cover the original snippet gives this:

Films
Marvel Studios releases its films in groups called "phases".

The Infinity Saga
The films from Phase One through Phase Three are collectively known as "The Infinity Saga"

📊

The Multiverse Saga
The films from Phase Four through Phase Six are collectively known as "the Multiverse Saga". They also include multiple series and two specials streaming on Disney+.

Future
📊
At any given time, Marvel Studios has future films planned five to six years out from what they have announced.

A reasonable heuristic, I think—which would alleviate a lot of the potential vertical space problem—is to not cross section boundaries when creating an expanded snippet. The text at the end of one section and the text at the beginning of another section aren't necessarily closely related, so don't cross section boundaries! (However, the original snippet itself crossed a section boundary, so you would include that one.)

In a textier article, if you hit a section boundary in one direction, you could optionally expand the snippet in the other direction. For example, if you are aiming for the sentence(s) in the snippet, plus one more before and after, but there aren't any more before (without crossing a section boundary), then take two sentences after.
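
The "borrow from the other side" idea can be sketched like this. It is a minimal illustration, assuming the section has already been split into a list of sentences and that the indices of the sentences covering the original snippet are known; the name `pick_sentence_range` and the `extra` parameter are hypothetical.

```python
def pick_sentence_range(sentence_count, first, last, extra=1):
    """Choose which sentences of ONE section to show.

    `first`/`last` are the indices of the sentences covering the original
    snippet; `extra` is how many additional sentences we would like on
    each side. Returns inclusive (start, end) indices and never leaves
    the section.
    """
    have_before = first
    have_after = sentence_count - 1 - last

    take_before = min(extra, have_before)
    take_after = min(extra, have_after)

    # Whatever one side could not use is offered to the other side.
    spare_before = extra - take_before
    spare_after = extra - take_after
    take_after = min(have_after, take_after + spare_before)
    take_before = min(have_before, take_before + spare_after)

    return first - take_before, last + take_after


# Snippet is in the first sentence of a five-sentence section:
# nothing is available before it, so two sentences are taken after it.
print(pick_sentence_range(5, first=0, last=0))  # prints (0, 2)
```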

Not crossing section boundaries in our example takes us back to this:

The Infinity Saga
The films from Phase One through Phase Three are collectively known as "The Infinity Saga".
... or 📊
The Multiverse Saga
The films from Phase Four through Phase Six are collectively known as "the Multiverse Saga". They also include multiple series and two specials streaming on Disney+.

Someone better at design than me could come up with a better implementation, but it seems like there might also be a way to do all of this inline, to save on vertical space:

[The Infinity Saga] The films from Phase One through Phase Three are collectively known as "The Infinity Saga". ... 📊 ... [The Multiverse Saga] The films from Phase Four through Phase Six are collectively known as "the Multiverse Saga". They also include multiple series and two specials streaming on Disney+.

Of course, "... 📊 ..." could just be "...".

All that said... the simplest first step would just be putting "..." where things (table, images, section headings) that aren't in the snippet are.

Ahh.. one other comment: finding sentence boundaries can be non-trivial, depending on the language. And even if you go for a bare-bones approach, you still need to account for punctuation used in other languages/scripts—«»„‟ "',.!?;「」。… ·¿¡؟، etc.
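
To make the bare-bones approach concrete, here is a rough Python sketch of a punctuation-based splitter. The character sets are illustrative and far from complete, the name `split_sentences` is made up, and it knows nothing about abbreviations, so it is an example of the naive technique rather than a recommendation.

```python
import re

# Illustrative, incomplete character sets (assumptions, not a standard list).
SPACED_END = ".!?؟…"        # terminators normally followed by whitespace
UNSPACED_END = "。！？"       # CJK terminators, usually with no space after
CLOSERS = "\"'»”’)」』"       # closing quotes/brackets that may trail a terminator

_BOUNDARY = re.compile(
    "(?:[{s}]+[{c}]*(?=\\s|$)|[{u}]+[{c}]*)".format(
        s=re.escape(SPACED_END),
        u=re.escape(UNSPACED_END),
        c=re.escape(CLOSERS),
    )
)


def split_sentences(text):
    """Split `text` after each run of terminators plus trailing closers."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        sentence = text[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("これは文です。次の文です。"))
# prints ['これは文です。', '次の文です。']
print(split_sentences('Dr. Smith arrived. "Really?" Yes!'))
# prints ['Dr.', 'Smith arrived.', '"Really?"', 'Yes!']
```

The second call shows the typical failure: "Dr." is treated as a complete sentence because the splitter has no abbreviation list.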

Hmm, I see the discussion in T311161 now, so maybe all of this is for naught.

Thanks @TJones. All of this info is super useful.

  1. I like the idea of simply adding an ellipsis between expanded sentences (if they are from different parts of the article). It is something I could add to the description as a requirement if it is feasible.
  2. Showing some indication of the content (whether it is a table, list, image, etc.) through icons or text was difficult to parse, according to the discussion on this ticket: T311161#8106363
  3. Showing the section the snippet comes from would be a great next step, but it was not part of the scope of this ticket. I remember there were some challenges with it too.
  4. Knowing where the sentence ends seems tricky based on the article you shared. But it says the current strategy gets about 95% of sentences correct, which is a pretty good number for trying out this approach.

I'm okay with it if you are okay with it, but it's more intuitive to think about accuracy numbers like that in terms of error rates. 95% correct is 1 error in 20. Expanding a snippet to include the rest of the sentence the snippet is in, plus one sentence before and after, is 4 sentence boundaries. So on average roughly one in 5 expanded snippets would have an error. Not all would be obvious, because you might just have a shorter or longer snippet rather than one that starts or stops in the middle of a sentence.
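
The arithmetic can be checked with a few lines of Python. This is a back-of-the-envelope sketch that assumes each boundary is detected independently with the same accuracy, which real text will not strictly satisfy; the function name is made up for the example.

```python
def chance_of_boundary_error(accuracy, boundaries):
    """Probability that at least one of `boundaries` detections is wrong,
    assuming independent detections with the same per-boundary accuracy."""
    return 1 - accuracy ** boundaries


# Four boundaries at 95% accuracy: about 0.185, i.e. close to one in five.
print(round(chance_of_boundary_error(0.95, 4), 3))  # prints 0.185
```

With only two boundaries (just completing the sentences already in the snippet) the same formula gives about 0.0975, or roughly one in ten.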

Also, the strategy involves a "hand-compiled list of abbreviations", and so is not going to work as well for other languages. An English list probably won't have French "Mme.", for example, so performance will be worse on French. I don't even know where to find a list of Igbo or Belarusian abbreviations—though I have some ideas on how to mine for such a list! Doing that for every language would be a lot of work—though it would be a fun project!

Expanding a snippet to include the rest of the sentence the snippet is in, plus one sentence before and after, is 4 sentence boundaries.

Hi Trey, the intention is not to add new sentences but only to show complete sentences as best as we can. For cases where we cannot, we can fall back to the more conservative solution in the description.

CBogen renamed this task from Quick view - Show the expanded search result snippet to [XL] Quick view - Show the expanded search result snippet. Jan 25 2023, 5:51 PM
CBogen updated the task description. (Show Details)

Moving back to Ready for Estimation to re-estimate just the conservative solution. We may consider breaking the "ideal" solution into its own, separate SPIKE ticket depending on the outcome.

CBogen renamed this task from [XL] Quick view - Show the expanded search result snippet to [L] Quick view - Show the expanded search result snippet. Feb 8 2023, 5:26 PM

Change 894223 had a related patch set uploaded (by Simone Cuomo; author: Simone Cuomo):

[mediawiki/extensions/SearchVue@master] Quick view - Show the expanded search result snippet

https://gerrit.wikimedia.org/r/894223

Change 894223 merged by jenkins-bot:

[mediawiki/extensions/SearchVue@master] Quick view - Show the expanded search result snippet

https://gerrit.wikimedia.org/r/894223

@SimoneThisDot - it could be the beta cluster environment limitation (I was checking on enwiki in beta), but all quick view snippets were identical to article snippets.

Change 902667 had a related patch set uploaded (by Simone Cuomo; author: Simone Cuomo):

[mediawiki/extensions/SearchVue@master] Expanded snippets

https://gerrit.wikimedia.org/r/902667

Change 902667 merged by jenkins-bot:

[mediawiki/extensions/SearchVue@master] Expanded snippets

https://gerrit.wikimedia.org/r/902667

Change 902714 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/core@master] Support exposing fieldname a search snippet is sourced from

https://gerrit.wikimedia.org/r/902714

Change 838165 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/extensions/CirrusSearch@master] Expose fieldname a search snippet is sourced from

https://gerrit.wikimedia.org/r/838165

Checked in betalabs:

Screen Shot 2023-03-24 at 10.44.15 AM.png (1×2 px, 1 MB)
Screen Shot 2023-03-24 at 10.45.49 AM.png (1×2 px, 550 KB)

Notes for testing in Production:

  • often an article snippet (and, consequently, a quick view snippet) comes from the References section of an article; check that it is displayed exactly as it is displayed in the article. Betalabs sometimes mangles the quick view snippet from the References section.
  • check how formulas (music notation, other notations) are displayed in a quick view snippet

Change 903180 had a related patch set uploaded (by Matthias Mullie; author: Matthias Mullie):

[mediawiki/extensions/SearchVue@master] Trim .plain off of snippet fieldname for API call

https://gerrit.wikimedia.org/r/903180

Change 902714 merged by jenkins-bot:

[mediawiki/core@master] Support exposing fieldname a search snippet is sourced from

https://gerrit.wikimedia.org/r/902714

Change 838165 merged by jenkins-bot:

[mediawiki/extensions/CirrusSearch@master] Expose fieldname a search snippet is sourced from

https://gerrit.wikimedia.org/r/838165

Change 903180 merged by jenkins-bot:

[mediawiki/extensions/SearchVue@master] Trim .plain off of snippet fieldname for API call

https://gerrit.wikimedia.org/r/903180

matthiasmullie added a subscriber: matthiasmullie.

Moving back to QA after having merged a couple more patches.

The code that will hit production on Thu should deal with most regular search snippets.
All patches merged today (in prod next week) are about dealing with snippets coming from other fields; the easiest way to test those is to do an insource query, which will trigger snippets from the source_text field (which contains the raw wikitext)
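
For testers, a search URL with an insource query can be put together as in the sketch below. The base URL is just an example wiki and the helper name is invented; the sketch also assumes the phrase itself contains no double quotes.

```python
from urllib.parse import urlencode


def insource_search_url(phrase, base="https://en.wikipedia.org/w/index.php"):
    """Build a Special:Search URL that uses the insource: keyword.

    `base` is an example wiki endpoint; swap in the wiki under test.
    """
    query = 'insource:"{}"'.format(phrase)
    return base + "?" + urlencode({"title": "Special:Search", "search": query})


print(insource_search_url("infobox film"))
# prints https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=insource%3A%22infobox+film%22
```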


Thank you, @matthiasmullie! Re-checked in betalabs, all seems to be fine. Since betalabs has some extreme cases of page content (i.e. some pages were deliberately created to confuse parsing), there were two cases that triggered the following Console error, e.g. search=state+state+autocollapse&quickView=TemplateUsageArticle678 and search=galaxy&quickView=Cabbit:

TypeError: Cannot read properties of undefined (reading 'length')
 at generateExpandedSnippet

Screen Shot 2023-03-28 at 4.36.25 PM.png (778×2 px, 248 KB)

I think it's unlikely that we would see such an error in production, but such extreme cases should probably be handled? Of course, I will check production for such errors.

Change 904008 had a related patch set uploaded (by Simone Cuomo; author: Simone Cuomo):

[mediawiki/extensions/SearchVue@master] SearchPreview: Ensure extension does not break if snippets cannot be found

https://gerrit.wikimedia.org/r/904008

Hei @Etonkovidova thank you for finding this!!

I have created a patch to fix this (so that it does not break in production), but I am going to investigate further, as for some reason the snippet is not there in the code and we cannot generate the expanded text for the example shown.
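
The kind of defensive handling described here can be illustrated with the sketch below. It is written in Python purely for illustration (the extension itself is JavaScript) and does not reproduce the merged patches. The helper names are hypothetical, and picking the first list element that contains the snippet is an assumption about how a field delivered as an array could be handled.

```python
def pick_source_text(field_value, snippet):
    """Return the text the snippet came from, or None if unavailable.

    `field_value` may be missing (None), a single string, or a list of
    strings; for a list, the first element containing the snippet is used.
    """
    if field_value is None:
        return None
    if isinstance(field_value, str):
        return field_value
    for part in field_value:
        if snippet in part:
            return part
    return None


def safe_expanded_snippet(snippet, field_value, expand):
    """Call `expand(text, snippet)` only when there is something to expand;
    otherwise fall back to the unexpanded snippet instead of raising."""
    if not snippet:
        return ""
    text = pick_source_text(field_value, snippet)
    if not text or snippet not in text:
        return snippet
    return expand(text, snippet)


# Hypothetical expander used only to show the control flow.
def show_all(text, snippet):
    return "[" + text + "]"


print(safe_expanded_snippet("cat", None, show_all))                    # prints cat
print(safe_expanded_snippet("cat", ["a dog", "a cat sat"], show_all))  # prints [a cat sat]
```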

Change 904008 merged by jenkins-bot:

[mediawiki/extensions/SearchVue@master] SearchPreview: Ensure extension does not break if snippets cannot be found

https://gerrit.wikimedia.org/r/904008

Change 904046 had a related patch set uploaded (by Simone Cuomo; author: Simone Cuomo):

[mediawiki/extensions/SearchVue@master] SearchPreview: Fix logic for snippets when cirrusDoc field is in array form

https://gerrit.wikimedia.org/r/904046

Change 904046 merged by jenkins-bot:

[mediawiki/extensions/SearchVue@master] SearchPreview: Fix logic for snippets when cirrusDoc field is in array form

https://gerrit.wikimedia.org/r/904046


Thank you for fixing it! It works - the image is displayed; the Console error is not present.

Checked the pilot wikis - all look as expected. Checked the display of references, special notations, and other language scripts - there was no discrepancy between how they are displayed in article snippets and in quick view.