
Template "lang" badly processed
Closed, ResolvedPublic

Description

I often see that the text in the card stops at a lang-xx template. On ru-wiki, "{{lang-la|Cancer}}" generates the string "лат. Cancer" (лат. is the abbreviation for Latin, like "lat."). It looks like in some cases nothing after the "." is read.

For example: https://ru.wikipedia.org/wiki/%D0%97%D0%BE%D0%B4%D0%B8%D0%B0%D0%BA%D0%B0%D0%BB%D1%8C%D0%BD%D1%8B%D0%B5_%D1%81%D0%BE%D0%B7%D0%B2%D0%B5%D0%B7%D0%B4%D0%B8%D1%8F

'''Рак''' ({{lang-la|''Cancer''}}) — самое неприметное [[зодиакальное созвездие]], которое можно увидеть лишь в ясную ночь между созвездиями [[Лев (созвездие)|Льва]] и [[Близнецы (созвездие)|Близнецов]]. Самая яркая звезда ([[Бета Рака|β Рака]]) имеет [[видимая звёздная величина|видимую звёздную величину]] 3,53<sup>m</sup>.

'''Лев''' ({{lang-la|''Leo''}}) — [[зодиакальное созвездие]] северного полушария неба, лежащее между [[Рак (созвездие)|Раком]] и [[Дева (созвездие)|Девой]].

Event Timeline

Sunpriat created this task. Oct 18 2015, 7:33 PM
Sunpriat raised the priority of this task from to Needs Triage.
Sunpriat updated the task description.
Sunpriat added a project: Page-Previews.
Sunpriat added a subscriber: Sunpriat.
Restricted Application added a subscriber: Aklapper. Oct 18 2015, 7:33 PM
Prtksxna set Security to None.
Jdlrobson triaged this task as Low priority. Oct 26 2015, 5:15 PM
Jdlrobson added a subscriber: Jdlrobson.
putnik added a subscriber: putnik. Mar 28 2016, 5:21 PM

The problem is that the "exsentences" parameter doesn't work correctly when {{lang-xx}} templates are used. I will try to fix it this week.
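For anyone who wants to reproduce this outside the hovercard, here is a rough sketch of the kind of TextExtracts request that uses "exsentences". The helper name, the default API endpoint and the page title are assumptions for illustration; the block only builds the URL and does not call the API.

```python
from urllib.parse import urlencode


def build_extract_url(title, sentences=1, api="https://ru.wikipedia.org/w/api.php"):
    """Build a TextExtracts query URL asking for the first N sentences.

    build_extract_url is a hypothetical helper; the endpoint default is an
    assumption and can be pointed at any wiki that has TextExtracts enabled.
    """
    params = {
        "action": "query",
        "prop": "extracts",
        "exsentences": sentences,  # the parameter discussed in this task
        "exintro": 1,              # only the lead section
        "explaintext": 1,          # plain text instead of HTML
        "format": "json",
        "titles": title,
    }
    return api + "?" + urlencode(params)


# Example title taken from the wikitext in the task description.
url = build_extract_url("Рак (созвездие)")
print(url)
```

Opening the printed URL in a browser should show whether the extract stops right after the "лат." abbreviation.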

Change 280096 had a related patch set uploaded (by Putnik):
Fix separation of text into sentences.

https://gerrit.wikimedia.org/r/280096

I added the patch. Now sentences will only be separated by the common space character (« ») or the line end character (\n), but not by &nbsp;, &thinsp; or other whitespace characters, which are often used inside a sentence.
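A minimal sketch of the rule described above, not the code from the patch itself: a sentence boundary is a terminator followed by a plain space or a newline, so a non-breaking space after an abbreviation does not end the sentence. The regex, function name and sample strings are assumptions made for illustration.

```python
import re

NBSP = "\u00a0"  # non-breaking space, what &nbsp; decodes to

# Boundary: ".", "!" or "?" followed by plain spaces or newlines only.
# Other whitespace (NBSP, thin space, ...) is deliberately not matched.
BOUNDARY = re.compile(r"(?<=[.!?])[ \n]+")


def split_sentences(text):
    """Split text into sentences at plain-space or newline boundaries."""
    return [part for part in BOUNDARY.split(text) if part]


with_nbsp = "Рак (лат." + NBSP + "Cancer) — созвездие. Второе предложение."
with_space = "Рак (лат. Cancer) — созвездие. Второе предложение."

print(split_sentences(with_nbsp))   # abbreviation stays inside the first sentence
print(split_sentences(with_space))  # first sentence is cut right after "лат."
```

The second call shows the limitation: the rule only helps while the non-breaking space is still present in the text being split.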

Change 280096 merged by jenkins-bot:
Fix separation of text into sentences.

https://gerrit.wikimedia.org/r/280096

TheDJ closed this task as Resolved. Apr 8 2016, 4:21 PM
TheDJ assigned this task to putnik.
putnik reopened this task as Open. Apr 8 2016, 4:32 PM
putnik added a comment. Apr 8 2016, 4:41 PM

Sorry, but it turned out that in the plain text version non-breaking spaces are replaced by regular spaces, so the problem still remains.
This replacement is done on purpose. Tonight I'll check whether it is possible to remove it. If everything is OK, I'll commit the changes.

Adding this as a blocker to rolling Hovercards out on smaller wikis, because on Russian Wikipedia the community discussion was overwhelmingly positive, and the only bug folks mentioned was this one:

discussion

summary by @SSneg

There is still a problem: the first sentence ends after an abbreviation with a dot.
But Hovercards requests 5 sentences, so this task probably no longer blocks T132602 after patch https://gerrit.wikimedia.org/r/#/c/282640/

Restricted Application added a subscriber: TerraCodes. Apr 19 2016, 5:11 PM

@putnik Thanks a lot for your contributions! Note that this isn't going to be deployed this week because of the branch freeze for the Dallas switchover, so we can't verify it in production.

Can you add instructions to reproduce in beta cluster?

@dr0ptp4kt @JKatzWMF What would be the status of this bug? Is the previous patch enough to unblock the rollout? (fix the lang-xx template problem?).

I've been reading the task, but it's hard to tell what state it is in because of the language barrier and the lack of clear steps to reproduce it in the beta cluster.

I've created http://en.wikipedia.beta.wmflabs.org/wiki/TextExtracts-lang-template-bug-hoverit for testing it, seems to work fine as experienced here: i.imgur.com/DOABEHC.png.

Source for the linked test page:

Test test. Test asdf test {{lang-la|''Leo''}} test after lang template test.

Is my test correct? If so, can we resolve this task?

@Sunpriat Is that broken? What should it look like?

I'd appreciate clarification please :)

Sunpriat added a comment. Edited Apr 27 2016, 4:53 PM

In this case, it seems to be because of the ":" sign:

'''Аякс (Эант)''' ({{lang-grc|Αἴᾱς}}) — имя двух греческих героев, участвовавших в осаде [[Троя|Трои]]:
* [[Аякс Малый]], или Оилид — сын [[Оилей|Оилея]].
* [[Аякс Великий]], или Теламонид — сын [[Теламон (мифология)|Теламон]]а.

Ллойд, Гарольд | Беккет, Сэмюэл | Франклин, Арета

'''Гарольд Клейтон Ллойд''' ({{lang-en|Harold Clayton Lloyd}}; {{Дата рождения|20|4|1893}} — {{Дата смерти|8|3|1971}}) — американский актёр и кинорежиссёр, известен своими немыми комедиями.

Бениньи, Роберто

'''Робе́рто Бени́ньи''' ({{lang-it|Roberto Benigni}}, род. {{ДатаРождения|27|10|1952}}, [[Кастильон-Фьорентино]], [[Ареццо (провинция)|Ареццо]], [[Тоскана]]) — [[Италия|итальянский]] [[актёр]], [[режиссёр]], [[сценарист]] и [[продюсер]], лауреат премий «[[Оскар (кинопремия)|Оскар]]» и ''[[BAFTA]]''.

Ли, Брюс --> ";"

'''Брюс Ли''' ({{lang-en|Bruce Lee}}); [[Китайские имена|детское имя]] — '''Ли Сяолун''' ({{lang-zh|李小龙}}, {{lang-en|Li Xiao Long}}, {{lang-ru|Маленький Дракон}}), взрослое имя — '''Ли Чжэньфань''' ({{lang-zh|李振藩}}, {{lang-en|Lee Jun Fan}}); [[27 ноября]] [[1940 год|1940]], [[Сан-Франциско]] — [[20 июля]] [[1973 год|1973]], [[Гонконг]]) — популяризатор и реформатор в области китайских боевых искусств, [[гонконг]]ский и [[США|американский]] [[киноактёр]], [[режиссёр]], [[сценарист]], [[продюсер]], постановщик боевых сцен и [[философ]].

Фидий

'''Фидий''' ({{lang-el|Φειδίας}}, ок. [[490 до н. э.]] — ок. [[430 до н. э.]]) — [[Древняя Греция|древнегреческий]] [[Скульптура|скульптор]] и [[архитектор]], один из величайших художников периода высокой классики. Друг [[Перикл]]а.

Цюй Юань

'''Цюй Юань''' ({{Китайский||屈原|Qū Yuán}}, второе имя '''Цюй Пин''' кит. 屈平), ок. 340—278 до н. э. — первый известный лирический поэт в истории Китая [[Период Сражающихся царств|эпохи Воюющих Царств]]. Его образ стал одним из символов [[патриотизм]]а в [[Культура Китая|китайской культуре]].

Вийон, Франсуа

<small>Также см. [[Фийон, Франсуа|Франсуа Фийон (политик)]]</small>
'''Франсуа́ Вийо́н''' ({{lang-fr|François Villon}}), настоящая фамилия — де Монкорбье́ ({{lang-fr2|de Montcorbier}}), Монкорбье ({{lang-fr2|Montcorbier}}) или де Лож ({{lang-fr2|des Loges}}); родился между [[1 апреля]] [[1431 год|1431]] и [[19 апреля]] [[1432 год|1432]] в [[Париж]]е; год и место смерти неизвестны (после [[1463 год|1463]], но не позднее [[1491 год|1491]]). [[Поэт]] [[Франция|французского]] [[Средние века|Средневековья]]. Первый французский лирик позднего Средневековья<ref>Б.Байер, У. Бирштайн и др. История человечества 2002 ISBN 5-17-012785-5</ref>.

Уильямс, Уильям Карлос

'''Уильям Карлос Уильямс''' ({{lang-en|William Carlos Williams}}, [[17 сентября]] [[1883]], Резерфорд, [[Нью-Джерси]] – [[4 марта]] [[1963]], там же) – один из крупнейших поэтов США.

Сократ

'''Сокра́т''' ({{lang-grc|Σωκράτης}}; 470/[[469 г. до н. э.]], [[Древние Афины|Афины]] — [[399 г. до н. э.]], там же) — древнегреческий [[философ]], учение которого знаменует поворот в философии — от рассмотрения природы и мира к рассмотрению человека. Его деятельность — поворотный момент античной философии. Своим методом анализа понятий ([[майевтика]], [[диалектика]]) и отождествлением положительных качеств человека с его знаниями он направил внимание философов на важное значение человеческой личности. Сократа называют первым философом в собственном смысле этого слова. В лице Сократа философствующее мышление впервые обращается к себе самому, исследуя собственные принципы и приёмы.

Ролз, Джон

'''Джон Ролз''' ({{lang-en|John Bordley Rawls}}; [[21 февраля]] [[1921]], [[Балтимор]] — [[24 ноября]] [[2002]]) — [[США|американский]] философ, основоположник либерально-государственной концепции внутреннего и международного права, в значительной степени лежащей в основе {{прояснить|современной}} политики [[США]].

Sunpriat updated the task description. Apr 27 2016, 6:35 PM
Sunpriat updated the task description. Apr 28 2016, 4:46 AM

Seems like there are still problems with it.

@putnik do you want to keep working on it or do you want me to ping others to have a look at the bug?

In http://en.wikipedia.beta.wmflabs.org/wiki/TextExtracts-lang-template-bug-hoverit links TextExtracts-lang-template-bug2 and TextExtracts-lang-template-bug3 seem to be broken, stopping at the dot, and as @Sunpriat mentions there are other issues too that seem very related.

Jhernandez raised the priority of this task from Low to Needs Triage. May 11 2016, 4:12 PM

I'm putting this back in our triage queue to re-check its priority, since it is blocking the rollout to smaller wikis and the A/B test.

Jhernandez triaged this task as High priority. May 11 2016, 5:22 PM
Jhernandez moved this task from 2015-16 Q4 to 2016-17 Q2 on the Readers-Web-Backlog board.

We're going to add a spike to look for proper sentence detection libraries and to improve the hovercards implementation.

Tgr added a subscriber: Tgr. May 11 2016, 6:48 PM

Duplicate of T59669?

As noted there, a decent solution for this problem is unlikely to be possible in PHP or client-side JS.

phuedx added a comment. Edited May 12 2016, 8:33 AM

Duplicate of T59669?

Yes. I think we should merge this task into it.

Edit

Objections?

Another possible option could be changing the usage of text extracts to limit per char length, and show ellipsis at the end of the hovercard.

That would be a feature change though. Depending on the spike output we'll need to talk about this with product and design.

@Tgr T59669 is definitely related, but that one and this one are really examples of what would be Improve TextExtracts sentence detection or deprecate. Do you think we should create a new task, summarize all the examples and merge these two into that one, or edit one of these to become the generic bug and merge one into the other?

Tgr added a comment. May 12 2016, 10:34 PM

@Tgr T59669 is definitely related, but that one and this one are really examples of what would be Improve TextExtracts sentence detection or deprecate. Do you think we should create a new task, summarize all the examples and merge these two into that one, or edit one of these to become the generic bug and merge one into the other?

I would go with the second option but either way is fine.

Another possible option could be changing the usage of text extracts to limit per char length, and show ellipsis at the end of the hovercard.

@Jhernandez @Nirzar I think that from a user perspective, having an ellipsis is very helpful. Even if we end on a sentence, it is essential for letting the user know that there is more to read in the article. I don't see any problem with doing a character count if we can end on a space. Cutting off mid-word isn't so hot, but T67845 highlights the current issues with that.

As discussed in the spike T135020, we're going to proceed with T135824 shortly to fix this bug by not relying on sentence detection because of the edge cases, like the iOS app already does, but using a char limit instead.

Hopefully that will resolve the bugs and edge cases on this bug report. Going to add it as a blocker for this one.
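To illustrate the character-limit idea, here is a minimal sketch that cuts at the last space before the limit and appends an ellipsis. The function name, the limit values and the trailing-punctuation cleanup are assumptions, not the behaviour implemented in T135824.

```python
ELLIPSIS = "…"


def truncate_extract(text, limit):
    """Shorten text to at most `limit` characters plus an ellipsis.

    Hypothetical helper: it never cuts in the middle of a word when an
    earlier space is available, and it leaves short texts untouched.
    """
    text = text.strip()
    if len(text) <= limit:
        return text  # nothing to cut, so no ellipsis either

    cut = text[:limit]
    # If the character right after the cut is not whitespace, the cut landed
    # inside a word, so back up to the previous space.
    if not text[limit].isspace():
        last_space = cut.rfind(" ")
        if last_space > 0:
            cut = cut[:last_space]

    # Drop dangling spaces and opening punctuation before the ellipsis.
    return cut.rstrip(" ,;:(") + ELLIPSIS


sample = "Лев (лат. Leo) — зодиакальное созвездие северного полушария неба."
print(truncate_extract(sample, 30))
```

With this approach the dot in "лат." no longer matters, because no sentence detection is involved.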

Jdlrobson closed this task as Resolved. Jun 1 2016, 9:42 PM

We believe this should now be resolved thanks to fixing the blocking tasks. This change will be rolled out to all wikis late Thursday, so if you see this problem from Friday 3rd June onwards please do reopen.

I've verified the use cases from the description on ru.wiki and they seem to work fine as far as I can tell. If there are any other issues, please reopen this task or create a new one!

Thanks everybody.