
Hovercards sometimes shows bracketed (parenthetical) content in excerpts, especially at ruwiki
Closed, Resolved (Public)

Description

Tpimh reports

In English wiki bulbs show the beginning of the article with the text in brackets stripped (e.g. dates of birth and death, alternative names), but in Russian wiki it is shown. Sometimes it is the only information that is shown if the text in brackets is long enough. Kind of not useful at all.

This seems to be a widespread problem at Ruwiki (screenshots are from [[Main page]] links to these articles: Луций Сергий Катилина, Гай Саллюстий Крисп, and Этрурия).

Screenshot_from_2015-03-18_14:16:46.png (1×1 px, 590 KB)

Screenshot_from_2015-03-18_14:16:59.png (1×1 px, 607 KB)

Screenshot_from_2015-03-18_14:16:39.png (1×1 px, 636 KB)

I've only been able to find one example at Enwiki (out of ~100 tests), linking to this article: https://en.wikipedia.org/w/index.php?title=Samuel_Allyne_Otis&oldid=523235627

Screenshot_from_2015-03-18_14:18:54.png (845×1 px, 220 KB)

and none at Frwiki.


Note: There is a plan to refine what content is excluded, in T91344: Review exclude all approach to parenthetical elements in summary endpoint, but for the moment no bracketed content is meant to be shown.

Event Timeline

Quiddity assigned this task to Prtksxna.
Quiddity raised the priority of this task from to Needs Triage.
Quiddity updated the task description.
Quiddity added a project: Page-Previews.
Quiddity subscribed.

In the case of Samuel Allyne Otis, TextExtracts returns the following excerpt —

Samuel Allyne Otis (son of James Otis, Sr., father of Harrison Gray Otis and brother of prominent revolutionary James Otis, Jr.

When Hovercards finds malformed brackets (in this case, just an opening bracket with no closing one), it doesn't do anything to the text.

The same seems to be the case with the links on Russian Wikipedia.

So, as long as there are malformed brackets in the extract, Hovercards will show them as is.
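
The behaviour described above can be sketched roughly as follows. This is not the actual Hovercards code: `stripParentheticals` is a hypothetical helper, written only to illustrate why an extract that is cut off inside a bracket passes through unchanged while a well-formed one gets its bracketed content removed.

```javascript
// Hypothetical helper, NOT taken from the Hovercards source.
// Removes parenthetical content only when the brackets are balanced;
// a malformed extract is returned untouched.
function stripParentheticals(extract) {
  let depth = 0;
  let out = '';
  for (const ch of extract) {
    if (ch === '(') {
      depth += 1;
    } else if (ch === ')') {
      if (depth === 0) {
        return extract; // stray closing bracket: malformed, leave as is
      }
      depth -= 1;
    } else if (depth === 0) {
      out += ch; // keep only characters outside any bracket
    }
  }
  if (depth !== 0) {
    return extract; // unclosed opening bracket: malformed, leave as is
  }
  // Tidy the whitespace left behind by the removed spans.
  return out
    .replace(/\s{2,}/g, ' ')
    .replace(/\s+([,.;:])/g, '$1')
    .trim();
}
```

Under this sketch, the truncated Samuel Allyne Otis extract comes back unchanged because its opening bracket is never closed, which matches what the cards display.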

In our API call, if we increase exsentences to 5 and get rid of exintro, we'll be able to get a better TextExtract. We are clipping the extra content on other cards anyway, so this won't cause a problem there.

For example, we could get —

Кантата (итал. cantata, от лат. саntare — петь) — вокально-инструментальное произведение, созданное для солистов и хора.

…instead of just —

Кантата (итал. cantata, от лат.
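
A minimal sketch of the proposed request, assuming a plain `action=query&prop=extracts` call. The function name `buildExtractUrl` and the exact set of parameters are illustrative assumptions; the real call in Hovercards may request additional props.

```javascript
// Illustrative only: builds a TextExtracts query URL with
// exsentences=5 and without exintro, as proposed above.
function buildExtractUrl(apiUrl, title) {
  const params = new URLSearchParams({
    action: 'query',
    format: 'json',
    prop: 'extracts',
    titles: title,
    exsentences: '5' // ask for more sentences; the card clips the excess
    // exintro is deliberately not set
  });
  return apiUrl + '?' + params.toString();
}

const url = buildExtractUrl('https://ru.wikipedia.org/w/api.php', 'Кантата');
console.log(url);
```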


@MaxSem Would this be alright for our use case?
@ori, would this have any performance implications?

@MaxSem Would this be alright for our use case?

Yes.

@ori, would this have any performance implications?

No.

Change 202001 had a related patch set uploaded (by Prtksxna):
renderer.article: Remove exintro and increase exsentences to 5 in the API call

https://gerrit.wikimedia.org/r/202001

@MaxSem, I noticed something strange with the use of exintro.

exintro | result | link
false | Кантата (итал. cantata, от лат. | API Call
true | Кантата (итал. cantata, от лат. | API Call
not set | Кантата (итал. cantata, от лат. саntare — петь) — вокально-инструментальное произведение, созданное для солистов и хора. | API Call

This is why I had removed exintro in resources/ext.popups.renderer.article.js.

@Prtksxna, it seems that exintro is considered true whenever it's set. Just leave it out to get the default value (false) to appear.

Right. That is what I am doing in the patch. @MaxSem points out that we might end up getting the first section heading in Hovercards in case the intro isn't long enough.
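
The exintro behaviour discussed here follows the MediaWiki API convention for boolean parameters: a parameter counts as true whenever it is present, whatever its value, so the only way to get false is to omit it. A small sketch of a parameter builder that respects this is shown below; `toApiParams` is a hypothetical helper, not part of the extension.

```javascript
// Hypothetical helper: converts an options object into API parameters.
// Boolean true becomes a bare flag, boolean false (or null/undefined)
// is omitted entirely, because sending "exintro=false" would still be
// treated as true by the API.
function toApiParams(options) {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(options)) {
    if (value === true) {
      params.set(name, ''); // presence alone means true
    } else if (value === false || value === null || value === undefined) {
      continue; // omit to get the default (false)
    } else {
      params.set(name, String(value));
    }
  }
  return params;
}

console.log(toApiParams({ exintro: false, exsentences: 5 }).toString());
```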

Change 202001 merged by jenkins-bot:
renderer.article: Increase exsentences to 5 in the API call

https://gerrit.wikimedia.org/r/202001