
Extracts for Wikimedia List articles display partial previews
Closed, Declined (Public)

Description

Text extracts and the page_summary service should not return a summary extract or extract_html on list pages like https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland

Steps to reproduce

  1. Go to: https://en.wikipedia.beta.wmflabs.org/wiki/Marko_Saaresto
  2. Hover over “Finnish”

Observed: preview ends with the first paragraph

Screen Shot 2017-06-20 at 12.01.49 PM.png (243×404 px, 51 KB)

Expected: generic preview appears

NOTE: generic previews should appear for all extracts ending in a ":"

Possible solutions

  • Do not show previews for any page in a certain category
  • Do not show previews for any page in a certain wikidata type
  • Introduce way for editors to say this page does not have a page summary
  • Wrap first paragraphs in the noexcerpt class
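
A minimal sketch of how the first and third options might combine into a single decision. Everything here is hypothetical: the `Page` type, the `no_summary` flag, the `GENERIC_PREVIEW_CATEGORIES` set and the `preview_type` function are illustrative names, not part of Popups, TextExtracts or the page_summary service.

```python
from dataclasses import dataclass, field

# Hypothetical set of categories whose pages should get the generic preview.
GENERIC_PREVIEW_CATEGORIES = {"Disambiguation pages"}


@dataclass
class Page:
    title: str
    categories: set = field(default_factory=set)
    no_summary: bool = False  # hypothetical editor-controlled opt-out


def preview_type(page: Page) -> str:
    """Return "generic" or "extract" for a page, under the assumptions above."""
    if page.no_summary:
        return "generic"
    if page.categories & GENERIC_PREVIEW_CATEGORIES:
        return "generic"
    return "extract"


print(preview_type(Page("1st parallel", {"Disambiguation pages"})))
print(preview_type(Page("List of bands from Finland", no_summary=True)))
print(preview_type(Page("Marko Saaresto")))
```

An explicit flag keeps the decision with editors, while the category check is a blanket rule that would also need a way to opt individual pages back in.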

Related Objects

Status     Assigned
Resolved   None
Resolved   None
Resolved   Jhernandez
Resolved   Mholloway
Resolved   Dereckson
Resolved   Jdlrobson
Resolved   ovasileva
Resolved   ovasileva
Resolved   Jdlrobson
Duplicate  None
Duplicate  None
Resolved   ovasileva
Declined   Jdlrobson
Resolved   Jdlrobson
Resolved   ovasileva
Resolved   Fjalapeno
Resolved   phuedx
Declined   pmiazga

Event Timeline

This is a trivial change, so I decided to fix it ASAP.

Change 360368 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/Popups@master] If extract ends with ':' treat it as generic extract

https://gerrit.wikimedia.org/r/360368

Jdlrobson renamed this task from Extracts for lists display partial previews to Extracts for Wikimedia List articles display partial previews. Jun 20 2017, 4:30 PM
Jdlrobson updated the task description.
Jdlrobson added a subscriber: Jdlrobson.

We talked about this in prioritisation. There are quite a few possible ways to approach this, but we agreed that we should not scan for ":" (because of i18n) and that we should do this in a generic way.
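
To illustrate the i18n concern, here is a minimal sketch of a naive colon check. It is not the abandoned Popups patch; the function name `ends_with_ascii_colon` and the Chinese sample sentence are made up for illustration.

```python
def ends_with_ascii_colon(extract: str) -> bool:
    """Naive check: does the extract end with an ASCII colon?"""
    return extract.rstrip().endswith(":")


english = "The following is a list of bands from Finland:"
# Hypothetical Chinese lead sentence ending in a fullwidth colon (U+FF1A).
chinese = "以下是芬兰乐队列表\uff1a"

print(ends_with_ascii_colon(english))  # True
print(ends_with_ascii_colon(chinese))  # False
```

The second extract is just as "partial" as the first, but the check misses it because the punctuation differs by language. Covering every such variant is the kind of complexity the comment above argues against.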

Change 360368 abandoned by Pmiazga:
If extract ends with ':' treat it as generic extract

Reason:
We want to do it in the backend, check task comments.

https://gerrit.wikimedia.org/r/360368

@phuedx - is there an example of how to exclude previews from summaries?

> @phuedx - is there an example of how to exclude previews from summaries?

Err… I don't understand what you're asking. Do you mean excluding content from extracts? If so, then here's a bunch of templates on enwiki that use the noexcerpt class, which TextExtracts will strip: https://en.wikipedia.org/w/index.php?search=insource%3Anoexcerpt&title=Special:Search&profile=advanced&fulltext=1&ns10=1&ns11=1
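
As a rough illustration of what stripping by class means, here is a sketch that drops any element carrying the noexcerpt class before collecting text. It is not TextExtracts' actual implementation (TextExtracts is a PHP extension); the class and function names are hypothetical, and the sketch assumes well-formed HTML with explicit closing tags.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoExcerptStripper(HTMLParser):
    """Collects text, skipping anything inside an element with class 'noexcerpt'."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0  # greater than 0 while inside a noexcerpt element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.skip_depth or "noexcerpt" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_noexcerpt(html: str) -> str:
    """Return the text of `html` with noexcerpt elements removed."""
    parser = NoExcerptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


sample = ('<p><span class="noexcerpt">The following is a list:</span>'
          'Finnish bands include A.</p>')
print(strip_noexcerpt(sample))
```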

@phuedx - I'm sorry, that comment made no sense at all. I was trying to ask if it's possible to build a template or a tag similar to noexcerpt that would instead display the generic preview.

Similarly, I was wondering if we could do the opposite. For example, if we fix this by making all pages in the category "Disambiguation pages" get the generic preview, would it be possible to create a template that manually adds the regular preview? As in, a user hovers over https://en.wikipedia.org/wiki/1st_parallel, notices that only two items are showing on the disambiguation page, decides that a preview would be useful, and edits the article to include it.

This list page is a counter-example: https://en.m.wikipedia.org/wiki/List_of_Doctor_Who_serials
There's an argument to be made that the summary of https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland is bad. "The following is a list of bands from Finland:" might become "This article lists bands that come from Finland." It makes a lot of assumptions about how the content is going to be consumed.

@Jdlrobson - this is why I preferred the ":" solution over displaying the generic preview per category. Although, I'm beginning to think we should just make a way for editors to fix this when these edge cases arise.

> Although, I'm beginning to think we should just make a way for editors to fix this when these edge cases arise.

HURRAH!
I do think this is the right and most scalable approach here. Searching for ":" makes me cry a little inside. It's going to create more unexpected bugs. I would put money on it.

Jdlrobson changed the task status from Open to Stalled. Jun 22 2017, 5:55 PM

The more I look into this, the more I see the List of bands from Finland as an edge case.

Look at https://en.wikipedia.org/wiki/List_of_animals_with_fraudulent_diplomas and https://en.wikipedia.org/wiki/List_of_highest-grossing_Indian_films - both are list pages, but both have well-written summaries.

With my dev hat on, I think it would be much more valuable to engage community liaisons to make editors aware of this problem on wiki and get it fixed there (e.g. by clicking edit on https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland and improving the lead).

Stripping content after ":" is guaranteed to create unexpected bugs elsewhere and increase the complexities of our service.

Can I suggest we set aside some time to talk about this when things are a little quieter (e.g. once the HTML endpoint is better defined)?

This is not the job of TextExtracts. It is doing exactly what it should - taking the text in the lead and spitting something out.
The previews can be improved by editing https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland