Extracts for Wikimedia List articles display partial previews
Closed, Declined · Public

Description

TextExtracts and the page_summary service should not return a summary extract or extract_html for list pages such as https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland

Steps to reproduce

  1. Go to: https://en.wikipedia.beta.wmflabs.org/wiki/Marko_Saaresto
  2. Hover over “Finnish”

Observed: preview ends with the first paragraph


Expected: generic preview appears

NOTE: generic previews should appear for all extracts ending in a ":"
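
A minimal sketch of the heuristic in this note, written in Python purely for illustration. The function name `preview_type` and the return values "generic" and "extract" are hypothetical and are not taken from the Popups codebase.

```python
def preview_type(extract: str) -> str:
    """Pick a preview type from the extract text alone.

    Hypothetical helper: the name and the return values are illustrative.
    """
    # Ignore trailing whitespace so "...Finland: " is treated like "...Finland:".
    if extract.rstrip().endswith(":"):
        return "generic"
    return "extract"
```

With this sketch, an extract such as "The following is a list of bands from Finland:" would get the generic preview, while an extract that ends in a full sentence would be shown as-is.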

Possible solutions

  • Do not show previews for any page in a certain category
  • Do not show previews for any page of a certain Wikidata type
  • Introduce a way for editors to say that a page does not have a page summary
  • Wrap first paragraphs in the noexcerpt class

Event Timeline

Restricted Application added a subscriber: Aklapper. Jun 19 2017, 5:51 PM
ovasileva added a subscriber: phuedx.
ovasileva moved this task from Backlog to Next Up on the Page-Previews board. Jun 20 2017, 1:03 PM

This is a trivial change, so I decided to fix it ASAP.

Change 360368 had a related patch set uploaded (by Pmiazga; owner: Pmiazga):
[mediawiki/extensions/Popups@master] If extract ends with ':' treat it as generic extract

https://gerrit.wikimedia.org/r/360368

pmiazga claimed this task. Jun 20 2017, 3:54 PM
pmiazga moved this task from Next Up to In Development on the Page-Previews board.
Jhernandez updated the task description. Jun 20 2017, 4:25 PM
Jdlrobson renamed this task from Extracts for lists display partial previews to Extracts for Wikimedia List articles display partial previews. Jun 20 2017, 4:30 PM
Jdlrobson updated the task description.
Jdlrobson added a subscriber: Jdlrobson.

We talked about this in prioritisation. There are quite a few possible ways to approach this, but we agreed we should not scan for ":" (because of i18n) and that we should do this in a generic way.
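
To illustrate the i18n concern: a check for the ASCII colon misses other punctuation that plays the same role, for example the fullwidth colon (U+FF1A) used in Chinese and Japanese text. The sketch below is Python for illustration only; the helper name `ends_with_ascii_colon` is hypothetical and the Japanese sample sentence is made up for this example.

```python
def ends_with_ascii_colon(extract: str) -> bool:
    """Hypothetical check: does the extract end with an ASCII ':'?"""
    return extract.rstrip().endswith(":")


english = "The following is a list of bands from Finland:"
# Made-up sample meaning roughly "The following is a list of bands from Finland",
# ending with a fullwidth colon (U+FF1A) instead of an ASCII colon.
fullwidth = "以下はフィンランドのバンドの一覧である\uff1a"

print(ends_with_ascii_colon(english))    # True
print(ends_with_ascii_colon(fullwidth))  # False
```

The second extract introduces a list in exactly the same way, yet the check does not catch it, so a colon-based rule would behave differently from wiki to wiki.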

Change 360368 abandoned by Pmiazga:
If extract ends with ':' treat it as generic extract

Reason:
We want to do it in the backend, check task comments.

https://gerrit.wikimedia.org/r/360368

Jdlrobson updated the task description. Jun 20 2017, 7:02 PM

@phuedx - is there an example of how to exclude previews from summaries?

Err… I don't understand what you're asking. Do you mean excluding content from extracts? If so, then here's a bunch of templates on enwiki that use the noexcerpt class, which TextExtracts will strip: https://en.wikipedia.org/w/index.php?search=insource%3Anoexcerpt&title=Special:Search&profile=advanced&fulltext=1&ns10=1&ns11=1
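
For readers unfamiliar with the noexcerpt class, here is a rough sketch of the idea of stripping marked elements before building an extract. It is not TextExtracts' implementation (TextExtracts is a PHP extension); it is a Python illustration that assumes well-formed HTML, and the names `NoExcerptStripper` and `strip_noexcerpt` as well as the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NoExcerptStripper(HTMLParser):
    """Collect text, skipping everything inside class="noexcerpt" elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a noexcerpt element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Start skipping at a noexcerpt element; track nesting while skipping.
        if self.skip_depth or "noexcerpt" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def strip_noexcerpt(html: str) -> str:
    parser = NoExcerptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

For example, `strip_noexcerpt('<p><span class="noexcerpt">(listen)</span>Example text.</p>')` keeps only the text outside the marked span.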

@phuedx - I'm sorry, that comment made no sense at all. I was trying to ask if it's possible to build a template or a tag similar to noexcerpt that would instead display the generic preview.

Similarly, I was wondering if we could do the opposite. For example, if we fix this by making all pages belonging to the category "disambiguation pages" get the generic preview, would it be possible to create a template that manually adds the regular preview? As in, a user hovers over https://en.wikipedia.org/wiki/1st_parallel, notices that only two items are showing on the disambiguation page, decides that a preview would be useful, and edits the article to include it.
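
One way to picture this proposal: a category sets the default preview type and a page-level marker set by an editor overrides it. Everything in the sketch below is hypothetical. The `Page` fields, the `GENERIC_CATEGORIES` set and the `force_preview` flag are invented for illustration and do not correspond to an existing MediaWiki template, magic word or API.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical: categories whose pages get the generic preview by default.
GENERIC_CATEGORIES = {"Disambiguation pages"}


@dataclass
class Page:
    title: str
    categories: set = field(default_factory=set)
    # Hypothetical editor override: "extract", "generic", or None for no override.
    force_preview: Optional[str] = None


def choose_preview(page: Page) -> str:
    # An explicit editor choice always wins over the category default.
    if page.force_preview is not None:
        return page.force_preview
    if page.categories & GENERIC_CATEGORIES:
        return "generic"
    return "extract"
```

Under these assumptions, a disambiguation page gets the generic preview unless an editor has opted it back in, and the same flag could opt a poorly summarised list page out.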

This list page is a counter-example: https://en.m.wikipedia.org/wiki/List_of_Doctor_Who_serials
There's an argument to be made that the summary of https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland is bad. "The following is a list of bands from Finland:" might become "This article lists bands that come from Finland." It makes a lot of assumptions about how the content is going to be consumed.

@Jdlrobson - this is why I preferred the ":" solution over displaying the generic preview per category. Although, I'm beginning to think we should just make a way for editors to fix this when these edge cases arise.

HURRAH!
I do think this is the right and most scalable approach here. Searching for ":" makes me cry a little inside. It's going to create more unexpected bugs. I would put money on it.

Jdlrobson changed the task status from Open to Stalled. Jun 22 2017, 5:55 PM

The more I look into this, the more I see the List of bands from Finland as an edge case.

Look at https://en.wikipedia.org/wiki/List_of_animals_with_fraudulent_diplomas and https://en.wikipedia.org/wiki/List_of_highest-grossing_Indian_films - both are list pages but have well-written summaries.

With my dev hat on, I think it would be much more valuable to engage community liaisons to make editors aware of this problem on wiki and get it fixed (e.g. by clicking edit on https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland and improving the lead).

Stripping content after ":" is guaranteed to create unexpected bugs elsewhere and increase the complexities of our service.

Can I suggest we set aside some time to talk about this when things are a little quieter (e.g. once the HTML endpoint is better defined)?

Jdlrobson closed this task as Declined. Jul 13 2017, 6:40 PM

This is not the job of TextExtracts. It is doing exactly what it should - taking the text in the lead and spitting something out.
The previews can be improved by editing https://en.wikipedia.beta.wmflabs.org/wiki/List_of_bands_from_Finland