Page MenuHomePhabricator

Hovercards: Space before removed parentheses should also be removed in the extract
Closed, ResolvedPublic

Description

This commit removed parentheses from extracts:
https://gerrit.wikimedia.org/r/#/c/132663/

However, the space before the parentheses is not removed, which results in an incorrect space, often directly before punctuation.

Example: http://en.wikifur.com/wiki/Category:NordicFuzzCon_staff

For the article "Vappu", which starts:
"Vappu (born August 7), also known as Lexy, Alexis and Alexistar..."
The extract is:
"Vappu , also known as Lexy, Alexis and Alexistar..."
There should not be a space after "Vappu". I think you could do something like "(\s?)" before the rest of the regex to remove this.


Version: master
Severity: normal

Details

Reference
bz67225
Related Gerrit Patches:

Event Timeline

bzimport raised the priority of this task from to Needs Triage.Nov 22 2014, 3:31 AM
bzimport added a project: Page-Previews.
bzimport set Reference to bz67225.
bzimport added a subscriber: Unknown Object (MLST).

The following commit addressed this issue - https://gerrit.wikimedia.org/r/#/c/135759/

Have you upgraded and still facing this problem?

Yes, we are running on the current HEAD.

That commit removed parentheses and the material within them. It left the space immediately before the parentheses when the parentheses was followed by punctuation, such as a comma or full stop. (This was not handled by the regexp that was replaced, either.)

The line:
extract = extract.replace(/\s+/g, ' '); // Remove extra white spaces
will only replace two or more spaces with one. It does not remove the single space between "Vappu" and "," in the example above.

Understood. I'll add failing test cases for the example you mentioned and look into this. Thank you!

Cool! If you do make changes, it might also be worth revisiting the name of the function and the accompanying comment, since what is being removed is not what I'd call brackets - [] or {} - but parentheses - () - and also their content.

I know (...now, per https://en.wikipedia.org/wiki/Bracket ) that parentheses are technically a type of bracket, but in that case I would expect other brackets to be affected as well. Perhaps something like removeParenthesizedContent()?

(Another possibility is that other types of brackets should be removed, but that'd be an entirely separate bug.)

I know (...now, per https://en.wikipedia.org/wiki/Bracket ) that parentheses are technically a type of bracket, but in that case I would expect other brackets to be affected as well. Perhaps something like removeParenthesizedContent()?

This patch corrected the name of the function and some variable names too.

This patch corrected the name of the function and some variable names too.

Sorry! Forgot to add the link - https://gerrit.wikimedia.org/r/#/c/141661/

Jaredzimmerman-WMF triaged this task as Medium priority.Dec 3 2014, 12:32 AM
Jaredzimmerman-WMF set Security to None.
Prtksxna claimed this task.Jan 2 2015, 12:26 PM

Change 182450 had a related patch set uploaded (by Prtksxna):
renderer.article: Remove leading spaces before brackets

https://gerrit.wikimedia.org/r/182450

Patch-For-Review

Prtksxna closed this task as Resolved.Feb 28 2015, 5:04 AM
Prtksxna reopened this task as Open.Feb 28 2015, 5:33 AM

(change hasn't merged yet)

Change 182450 merged by jenkins-bot:
renderer.article: Remove leading spaces before brackets

https://gerrit.wikimedia.org/r/182450

Prtksxna closed this task as Resolved.Mar 5 2015, 11:08 AM
Quiddity moved this task from Backlog to Done on the Page-Previews board.Mar 28 2015, 1:16 AM