
Hovercard text extract is broken for academic titles before and after a person's name
Closed, Duplicate, Public

Description

Steps to reproduce

  1. Create a page with first sentence like this: [[Armádní generál|Arm. gen.]] [[Inženýr|Ing.]] '''Petr Pavel''', [[Master of Arts|M.A.]], (* [[1. listopad|1. listopadu]] [[1961]] [[Planá]]) je ... (or copy this page)
  2. Link to it from another article
  3. See its hovercard

Expected behavior
The whole first sentence should be shown in hovercard.

Current behavior
Only "Arm. gen. Ing." (the military rank and academic title preceding the name) is shown in the hovercard.

Event Timeline

Dvorapa created this task. Aug 19 2017, 10:36 PM
Restricted Application added a subscriber: Aklapper. Aug 19 2017, 10:36 PM
Jdlrobson added a subscriber: Jdlrobson.

Thanks for the bug report! We're working on a new API to improve our summaries (T113094), and I've added a test case showing that it fixes this:
https://gerrit.wikimedia.org/r/372875

Dvorapa reopened this task as Open. Aug 24 2017, 11:12 AM
Dvorapa updated the task description.

I doubt these two are duplicates. This one is about academic titles placed before or after a name, ordinal numbers that some European languages write with a trailing dot (.) instead of the English -st, -nd, -th, and other abbreviations that also end with a dot.
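To make the three categories above concrete, here is a minimal sketch that classifies every dot-terminated token in the rendered lead of the example page. The abbreviation list (KNOWN_ABBREVIATIONS) and the helper names are hypothetical and deliberately tiny; the point is that none of these dots is a real sentence end, yet a sentence splitter can only tell that from per-language knowledge it may not have.

```python
import re

# Hypothetical, deliberately tiny list of Czech abbreviations and titles.
# A real list would be per-language and never complete (e.g. "prof." is missing).
KNOWN_ABBREVIATIONS = {"arm.", "gen.", "ing.", "m.a."}


def classify_period_token(token: str) -> str:
    """Classify a token that ends with a period."""
    if re.fullmatch(r"\d+\.", token):
        return "ordinal number"  # e.g. "1." in "1. listopadu"
    if token.lower() in KNOWN_ABBREVIATIONS:
        return "abbreviation or academic title"
    return "possible sentence end"


def period_tokens(text: str) -> list:
    """Return (token, label) pairs for every whitespace-separated token ending in '.'."""
    results = []
    for raw in text.split():
        token = raw.rstrip(",;:")  # "M.A.," -> "M.A."
        if token.endswith("."):
            results.append((token, classify_period_token(token)))
    return results


# Rendered text of the first sentence from the steps to reproduce (up to "je").
SAMPLE = "Arm. gen. Ing. Petr Pavel, M.A., (* 1. listopadu 1961 Planá)"

for token, label in period_tokens(SAMPLE):
    print(f"{token!r}: {label}")
```

Every dot in SAMPLE turns out to be either an ordinal or an abbreviation, while an unlisted title such as "prof." would still be misread as a sentence end, which is why a list-based fix stays fragile.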

I see what you mean. However, they are somewhat related...

The current way we obtain extracts is fundamentally flawed in that it treats the "." character (or any other end-of-sentence character we know about) as meaning "end of sentence".
Code here: https://github.com/wikimedia/mediawiki-extensions-TextExtracts/blob/master/includes/ExtractFormatter.php#L81
We've seen numerous issues with this approach, and this example is just one of them. Basically, the conclusion is that the exsentences parameter has too many issues and should be considered broken.
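The failure can be reproduced with a simplified splitter that, as described above, treats any end-of-sentence character followed by whitespace as a sentence boundary. This is a sketch of the idea only, not the actual ExtractFormatter code, and first_sentences is a hypothetical helper. With a limit of three sentences (the request quoted in this thread uses exsentences=3), it yields exactly the text reported in the hovercard.

```python
import re

# Simplified approximation of "split on end-of-sentence characters";
# NOT the real TextExtracts regex.
SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def first_sentences(text: str, count: int) -> str:
    """Return the first `count` naive 'sentences' of text, rejoined with spaces."""
    parts = [p for p in SENTENCE_BOUNDARY.split(text) if p]
    return " ".join(parts[:count])


# Rendered first sentence from the steps to reproduce.
intro = "Arm. gen. Ing. Petr Pavel, M.A., (* 1. listopadu 1961 Planá) je ..."

print(first_sentences(intro, 3))  # prints "Arm. gen. Ing."
```

The splitter finds "sentences" at "Arm.", "gen.", "Ing." and "1.", so the three-sentence limit is used up before the name is even reached.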

As a result, we won't be using that in the new endpoint.

The API request currently being generated to use that text extract is:
https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:API_p%C3%ADskovi%C5%A1t%C4%9B#action=query&format=json&prop=extracts&titles=Petr+Pavel&exsentences=3

Also notice the broken HTML (that is T168329, which is why I say they are related).

https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:API_p%C3%ADskovi%C5%A1t%C4%9B#action=query&format=json&prop=extracts&titles=Petr+Pavel
(the same request without exsentences) doesn't have this problem, but it causes other issues elsewhere.
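The two API sandbox links above encode the same query with and without exsentences. The following sketch builds the equivalent direct request URLs. It assumes the standard Action API entry point https://cs.wikipedia.org/w/api.php, and extracts_url is a hypothetical helper; the parameters themselves are the ones from the links.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the standard MediaWiki Action API entry point for cs.wikipedia.org.
API_ENDPOINT = "https://cs.wikipedia.org/w/api.php"


def extracts_url(title: str, sentences: Optional[int] = None) -> str:
    """Build a prop=extracts query URL, optionally limited with exsentences."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
    }
    if sentences is not None:
        params["exsentences"] = sentences  # the sentence-splitting path
    return f"{API_ENDPOINT}?{urlencode(params)}"


print(extracts_url("Petr Pavel", 3))  # request used for the hovercard
print(extracts_url("Petr Pavel"))     # full extract, no sentence limit
```

Only the first URL goes through the sentence-splitting logic. The second returns the whole extract and leaves any trimming to the client.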

We plan to solve this by abandoning this API for page previews in favor of T168848.

I'm not sure whether to decline this task or merge it into T168848. What would you prefer I do, @Dvorapa?

Vachovec1 added a comment (edited). Aug 24 2017, 3:03 PM

@Jdlrobson: I am a little bit confused. Task T168848 is about the Mobile(Apps) Content Service. I thought the new API would serve all requests, not only requests from mobile devices?

@Vachovec1 Confusingly, we're using the Mobile(Apps) Content Service for page previews :) "Mobile" is in the name for historical reasons; it is just the name of the service where this endpoint lives. Now that page previews on desktop use it, its scope has clearly changed!

Dvorapa closed this task as Declined. Aug 25 2017, 6:23 AM

@Jdlrobson thank you for your detailed explanation

Restricted Application added a subscriber: jeblad. Aug 25 2017, 6:23 AM
Restricted Application added a subscriber: Zoranzoki21. Mar 2 2018, 3:41 PM
Dvorapa moved this task from Backlog to Done on the Page-Previews board.Mar 2 2018, 3:41 PM

Finally solved by T182321.