
[Bug] Page summaries should not strip the normalized title from the extract?
Open, Medium, Public

Description

Parenthetical stripping has been discussed before, but this may be a special case. When the normalized title contains parentheses and the entire title appears in the extract, the title could be preserved instead of stripped. I'm not sure whether this should be treated as a bug, an optimization, or working as intended.

This could also be an opportunity to revisit parenthetical stripping behavior. Page Previews has been enabled by default in production for some time now, so editors may already be adjusting lead paragraphs to produce good previews, and parenthetical processing could be toned down or even removed.

Steps to reproduce

  1. Visit https://en.wikipedia.org/wiki/List_of_one-hit_wonders_in_the_United_States.
  2. Hover over the "Brandy (You're a Fine Girl)" link.

Expected results

The content is unstructured, but parenthetical stripping could skip the normalized title, so the extract would begin with "Brandy (You're a Fine Girl)".

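Actual results

The parenthetical part of the title is stripped: the extract begins with "Brandy " instead of "Brandy (You're a Fine Girl)" (see the API response below).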

Environments observed

  • Browser version: Chromium v75.0.3770.90
  • OS version: Ubuntu v19.04
  • Device model: Desktop
  • Device language: English

Check any additional observations

Page summary API response

Response
// https://en.wikipedia.org/api/rest_v1/page/summary/Brandy_(You're_a_Fine_Girl)

{
  "type": "standard",
  "title": "Brandy (You're a Fine Girl)",
  "displaytitle": "Brandy (You're a Fine Girl)",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "wikibase_item": "Q4957221",
  "titles": {
    "canonical": "Brandy_(You're_a_Fine_Girl)",
    "normalized": "Brandy (You're a Fine Girl)",
    "display": "Brandy (You're a Fine Girl)"
  },
  "pageid": 6744625,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/en/c/cf/Brandy_-_Looking_Glass.jpg",
    "width": 315,
    "height": 315
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/en/c/cf/Brandy_-_Looking_Glass.jpg",
    "width": 315,
    "height": 315
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "896234623",
  "tid": "4ac7b600-7216-11e9-b32a-c62d8f42e7a5",
  "timestamp": "2019-05-09T04:52:43Z",
  "description": "1972 pop song",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Brandy_(You're_a_Fine_Girl)",
      "revisions": "https://en.wikipedia.org/wiki/Brandy_(You're_a_Fine_Girl)?action=history",
      "edit": "https://en.wikipedia.org/wiki/Brandy_(You're_a_Fine_Girl)?action=edit",
      "talk": "https://en.wikipedia.org/wiki/Talk:Brandy_(You're_a_Fine_Girl)"
    },
    "mobile": {
      "page": "https://en.m.wikipedia.org/wiki/Brandy_(You're_a_Fine_Girl)",
      "revisions": "https://en.m.wikipedia.org/wiki/Special:History/Brandy_(You're_a_Fine_Girl)",
      "edit": "https://en.m.wikipedia.org/wiki/Brandy_(You're_a_Fine_Girl)?action=edit",
      "talk": "https://en.m.wikipedia.org/wiki/Talk:Brandy_(You're_a_Fine_Girl)"
    }
  },
  "api_urls": {
    "summary": "https://en.wikipedia.org/api/rest_v1/page/summary/Brandy_(You're_a_Fine_Girl)",
    "metadata": "https://en.wikipedia.org/api/rest_v1/page/metadata/Brandy_(You're_a_Fine_Girl)",
    "references": "https://en.wikipedia.org/api/rest_v1/page/references/Brandy_(You're_a_Fine_Girl)",
    "media": "https://en.wikipedia.org/api/rest_v1/page/media/Brandy_(You're_a_Fine_Girl)",
    "edit_html": "https://en.wikipedia.org/api/rest_v1/page/html/Brandy_(You're_a_Fine_Girl)",
    "talk_page_html": "https://en.wikipedia.org/api/rest_v1/page/html/Talk:Brandy_(You're_a_Fine_Girl)"
  },
  "extract": "\"Brandy \" is a 1972 song written and composed by Elliot Lurie and recorded by Lurie's band, Looking Glass, on their debut album Looking Glass. The single reached number one on both the Billboard Hot 100 and Cash Box Top 100 charts, remaining in the top position for one week. It reached number two on the former chart for four weeks, stuck behind Gilbert O'Sullivan's \"Alone Again (Naturally)\", before reaching number one, only for \"Brandy\" to be dethroned by \"Alone Again (Naturally)\" the week after. Billboard ranked it as the 12th song of 1972. Horns and strings were arranged by Larry Fallon.",
  "extract_html": "<p>\"<b>Brandy </b>\" is a 1972 song written and composed by Elliot Lurie and recorded by Lurie's band, Looking Glass, on their debut album <i>Looking Glass.</i> The single reached number one on both the <span><i>Billboard</i> Hot 100</span> and <span><i>Cash Box</i> Top 100</span> charts, remaining in the top position for one week. It reached number two on the former chart for four weeks, stuck behind Gilbert O'Sullivan's \"Alone Again (Naturally)\", before reaching number one, only for \"Brandy\" to be dethroned by \"Alone Again (Naturally)\" the week after. <span><i>Billboard</i></span> ranked it as the 12th song of 1972. Horns and strings were arranged by Larry Fallon.</p>"
}
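The special case can be detected directly from a summary response like the one above. The sketch below is a minimal, hypothetical helper (the function name is illustrative and not part of any API) that flags responses where the normalized title contains parentheses but no longer appears verbatim in the plain-text `extract`. It assumes the JSON has already been fetched and parsed into a dict, and reads only the `titles.normalized` and `extract` fields.

```python
from typing import Any, Dict


def title_parenthetical_stripped(summary: Dict[str, Any]) -> bool:
    """Return True if a parenthesized normalized title is missing from the extract.

    Hypothetical helper: `summary` is a parsed page summary response.
    """
    normalized = summary["titles"]["normalized"]
    # Only titles that contain parentheses can be affected by parenthetical stripping.
    if "(" not in normalized:
        return False
    # If the full title no longer appears verbatim, part of it was probably stripped.
    return normalized not in summary["extract"]


# Trimmed copy of the response above.
brandy = {
    "titles": {"normalized": "Brandy (You're a Fine Girl)"},
    "extract": "\"Brandy \" is a 1972 song written and composed by Elliot Lurie ...",
}
print(title_parenthetical_stripped(brandy))  # True
```

This is only a heuristic: an article whose lead phrases the title differently (different case, a translated or italicized form) would also be flagged even if nothing was stripped.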

Event Timeline

Restricted Application added a subscriber: Aklapper.

I remember this has been discussed in the past. This preview especially comes to mind:

At the time, we decided that the change would be too complex, but I'm not sure whether anything has changed since then.

LGoto triaged this task as Medium priority. Jun 26 2019, 3:39 PM
LGoto moved this task from Needs triage to Upcoming on the Product-Infrastructure-Team-Backlog board.
LGoto raised the priority of this task from Medium to Needs Triage. Jun 26 2019, 3:41 PM
JoeWalsh triaged this task as Medium priority. Jun 26 2019, 3:43 PM
bearND added a subscriber: bearND. Aug 14 2019, 4:12 PM

One idea, just to throw it out there, would be to check whether the title contains any parentheses. If it does, we could skip parenthetical stripping altogether. That should be an easy check to make. The downside is that the extract would then keep parentheticals we would otherwise have removed.
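The first idea can be sketched in a few lines. This is a minimal illustration, not the service's code: `strip_parentheticals` is a hypothetical, simplified stand-in for the real summary transform (which has more rules), and both function names are made up for the example.

```python
import re

# Hypothetical, simplified stand-in for the service's parenthetical stripping:
# removes innermost "(...)" groups plus any whitespace before them, repeating
# until no parenthesized groups remain. The real transform has more rules.
_PARENTHETICAL = re.compile(r"\s*\([^()]*\)")


def strip_parentheticals(text: str) -> str:
    previous = None
    while previous != text:
        previous = text
        text = _PARENTHETICAL.sub("", text)
    return text


def summarize_extract(title: str, extract: str) -> str:
    """Idea 1: skip parenthetical stripping entirely if the title has parentheses."""
    if "(" in title or ")" in title:
        return extract
    return strip_parentheticals(extract)


extract = '"Brandy (You\'re a Fine Girl)" is a 1972 song, stuck behind "Alone Again (Naturally)".'
print(summarize_extract("Brandy (You're a Fine Girl)", extract))  # unchanged: both parentheticals kept
print(summarize_extract("Looking Glass", '"Looking Glass" (band) is a pop band (formed 1969).'))
# "Looking Glass" is a pop band.
```

The first call shows the downside mentioned above: "(Naturally)" survives only because the title happens to contain parentheses.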

Another idea is to replace the title with a placeholder tag before performing any transformations, then re-add the title in place of the placeholder afterwards. For example:

  1. Get the HTML text:
<p>"<b>Brandy (You're a Fine Girl)</b>" is a 1972 song written and composed by Elliot Lurie and recorded by Lurie's band, Looking Glass, on their debut album <i>Looking Glass.</i> [...]
  2. Replace the title in the HTML text with a placeholder before the summarize transformation:
<p>"<b><post-process-title></b>" is a 1972 song written and composed by Elliot Lurie and recorded by Lurie's band, Looking Glass, on their debut album <i>Looking Glass.</i> [...]
  3. After the summarize transformation, re-add the title in place of <post-process-title>:
<p>"<b>Brandy (You're a Fine Girl)</b>" is a 1972 song written and composed by Elliot Lurie and recorded by Lurie's band, Looking Glass, on their debut album <i>Looking Glass.</i> [...]
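The placeholder steps above can be sketched on a raw HTML string as follows. This assumes a simplified, hypothetical stripping rule in place of the real summarize transformation; only the `<post-process-title>` marker comes from the comment above, and the function names are illustrative.

```python
import re

# Marker taken from the comment above; any string that cannot occur in article
# HTML and contains no parentheses would work.
PLACEHOLDER = "<post-process-title>"

# Hypothetical, simplified stand-in for the real transform's parenthetical stripping.
_PARENTHETICAL = re.compile(r"\s*\([^()]*\)")


def strip_parentheticals(html: str) -> str:
    previous = None
    while previous != html:
        previous = html
        html = _PARENTHETICAL.sub("", html)
    return html


def summarize_preserving_title(title: str, html: str) -> str:
    # Step 2: replace the title with a placeholder so the transform cannot touch it.
    protected = html.replace(title, PLACEHOLDER)
    # Run the summarize transformation (here: parenthetical stripping only).
    transformed = strip_parentheticals(protected)
    # Step 3: re-add the title in place of the placeholder.
    return transformed.replace(PLACEHOLDER, title)


html = ('<p>"<b>Brandy (You\'re a Fine Girl)</b>" is a 1972 song, '
        'stuck behind "Alone Again (Naturally)".</p>')
print(summarize_preserving_title("Brandy (You're a Fine Girl)", html))
# <p>"<b>Brandy (You're a Fine Girl)</b>" is a 1972 song, stuck behind "Alone Again".</p>
```

Unlike the first idea, other parentheticals such as "(Naturally)" are still stripped. Two caveats: plain string replacement misses the title if the HTML escapes characters in it (for example an apostrophe written as `&#39;`), and if the transform works on a parsed DOM rather than a string, a tag-like marker would be parsed as an element, so a plain-text token may be safer.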
LGoto raised the priority of this task from Medium to High. Aug 21 2019, 3:52 PM
LGoto moved this task from Upcoming to Backlog on the Product-Infrastructure-Team-Backlog board.

We're focused on other goals for Q1 right now, but this is important and should be looked into.

We would be happy to review patches to the service if anyone from the web team has some cycles before we get to it, since that may take some time.

LGoto lowered the priority of this task from High to Medium. Aug 28 2019, 3:54 PM
Jhernandez removed a subscriber: Jhernandez. Apr 2 2020, 6:46 PM
Restricted Application added a subscriber: Masumrezarock100. Apr 2 2020, 6:46 PM