Page MenuHomePhabricator

Summary endpoint cannot handle brackets with whitespace inside
Closed, ResolvedPublic

Description

Seen on https://en.wikipedia.org/wiki/Baton_Rouge,_Louisiana when hovering over Louisiana

Screen Shot 2018-06-26 at 2.27.26 PM.png (481×442 px, 131 KB)

API request: https://en.wikipedia.org/api/rest_v1/page/summary/Louisiana

<p><b>Louisiana</b> (, ) is a state in

Another example - hover over Sythian at bottom of https://en.wikipedia.org/wiki/Cannabis

Screen Shot 2018-06-26 at 2.31.03 PM.png (460×393 px, 224 KB)

API request: https://en.wikipedia.org/api/rest_v1/page/summary/Scythian_languages

Note we currently do not scrub single words without whitespace e.g. (IPA)
MY guess is we are not considering the whitespace inside (, ) and ( or ). We also shouldn't consider ',' a character for this purpose.

Developer notes

Given pages can change (and have in the above examples), this can be replicated locally using http://0.0.0.0:6927/en.wikipedia.org/v1/page/summary/Louisiana/847421868 and http://0.0.0.0:6927/en.wikipedia.org/v1/page/summary/Scythian_languages/842536896

Failing test cases (without fix) here:
https://gerrit.wikimedia.org/r/#/c/mediawiki/services/mobileapps/+/451892

Event Timeline

Jhernandez claimed this task.
Jhernandez triaged this task as High priority.

I've checked those urls and they seem to behave well now. 👍

bearND added a subscriber: bearND.

We didn't change any summary code. The changes were made on-wiki, wrapping the extra strings with <span class="noexcerpt">:

Not sure if that really counts as resolved.

Jhernandez lowered the priority of this task from High to Medium.Aug 10 2018, 3:28 PM

Cool, thanks for checking!

I'm leaving it as normal for now, we'll get to it.

Change 451874 had a related patch set uploaded (by BearND; owner: BearND):
[mediawiki/services/mobileapps@master] Dev: add more summary test pages

https://gerrit.wikimedia.org/r/451874

Added these pages to our list of summary test pages.

Change 451874 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Dev: add more summary test pages

https://gerrit.wikimedia.org/r/451874

Jdlrobson updated the task description. (Show Details)

Change 451892 had a related patch set uploaded (by Jdlrobson; owner: Jdlrobson):
[mediawiki/services/mobileapps@master] Add some new test cases for parenthetical stripping

https://gerrit.wikimedia.org/r/451892

TBH the test pages above won't help much and seem redundant tome. Pages are constantly changing and this was unrelated to templates. If you want a test case I'd recommend using the beta cluster and using the text it was at the time of the problem:

The '''Scythian languages''' ({{IPAc-en|ˈ|s|ɪ|θ|i|ə|n}} or {{IPAc-en|ˈ|s|ɪ|ð|i|ə|n}}) are a group of [[Eastern Iranian languages]] of the [[classical antiquity|classical]] and [[late antiquity|late]] antiquity ([[Middle Iranian languages|Middle Iranian]]) [[periodization|period]], spoken in a vast region of [[Eurasia]] named [[Scythia]]. Except for modern [[Ossetian language|Ossetian]], which descends from the [[Alanian]] variety, these languages are all considered to be extinct. Modern Eastern Iranian languages such as [[Wakhi language|Wakhi]], however, are related to the eastern [[Saka language|Scytho-Khotanese]] dialects attested from the kingdoms of [[Kingdom of Khotan|Khotan]] and [[Tumxuk|Tumshuq]] in the ancient [[Tarim Basin]], in present-day southern [[Xinjiang]], [[China]].

I can still replicate this issue in the MCS, so it's probably still happening (as we're dealing with user created content here), albeit it's hard to know where. I've added a developer note to the ticket to show how this can be replicated locally and added some tests to show where the problem is.

TBH the test pages above won't help much and seem redundant tome. Pages are constantly changing

That's why I also specified the revision for these pages where the issue still occurs. This is just a list we're compiling for problematic pages for the summary, so we can quickly see with the compareExtracts.js script when there are regressions when changing the summary code. Of course having a unit test is better. So, thank you for adding that.

Got it. My bad, I misunderstood what those tests were doing. Thanks for the clarification.

I keep hitting this one in the wild, a constant reminder this bug is still open. Here's a recent example I saw today:

Screen Shot 2018-09-19 at 4.19.15 PM.png (546×468 px, 102 KB)

It would be great to address this sooner rather than later.

Change 461617 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Summary: Tweak paren-stripping regex to handle a couple more edge cases

https://gerrit.wikimedia.org/r/461617

Change 461617 merged by Mholloway:
[mediawiki/services/mobileapps@master] Summary: Tweak paren-stripping regex to handle a couple more edge cases

https://gerrit.wikimedia.org/r/461617

Change 451892 abandoned by Jdlrobson:
Add some new test cases for parenthetical stripping

Reason:
im no longer working on the summary endpoint.

https://gerrit.wikimedia.org/r/451892