Summary truncated after 3 chars ("Dr.")
Closed, Duplicate · Public

Description

In https://en.wikipedia.org/api/rest_v1/page/summary/Frederick_Chilton the returned extracts are:

"extract":"Dr.","extract_html":"<p><b>Dr."

Event Timeline

Esanders created this task. Jan 17 2018, 8:37 PM
Restricted Application added a subscriber: Aklapper. Jan 17 2018, 8:37 PM

Thanks for the report. The issue is with TextExtracts. Reading Infrastructure is currently deploying a rewrite of the summary logic.

Using the rewrite, I see this is fixed. It returns the following extract:

"extract": "Dr. Frederick Chilton is a fictional character appearing in Thomas Harris' novels Red Dragon and The Silence of the Lambs.",
"extract_html": "<p><b>Dr. Frederick Chilton</b> is a fictional character appearing in Thomas Harris' novels <i>Red Dragon</i> and <i>The Silence of the Lambs</i>.</p>"
}

I've merged this into the epic task which mentions this problem.

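The fundamental difference that solves this problem is that we now use the first paragraph of the page (the intro, i.e. the first HTML p tag) rather than the TextExtracts approach of counting and picking out sentences in the preview by regex. That approach appears to have treated the period in "Dr." as the end of the first sentence.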
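To illustrate the difference, here is a minimal Python sketch. It is not the actual TextExtracts regex or the actual rewrite: the function names, the class name, and the sentence pattern are all hypothetical. They only show why a regex that ends a sentence at the first period stops at "Dr.", while taking the whole first p element does not.

```python
import re
from html.parser import HTMLParser


def first_sentence_by_regex(text):
    """Hypothetical naive picker: cut at the first period followed by whitespace or end of text."""
    match = re.match(r"(.*?\.)(?:\s|$)", text, flags=re.S)
    return match.group(1) if match else text


class FirstParagraphParser(HTMLParser):
    """Collect the text of the first <p> element and ignore everything after it."""

    def __init__(self):
        super().__init__()
        self.inside = False   # currently inside the first <p>
        self.done = False     # first <p> has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "p" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def first_paragraph_text(html):
    """Return the plain text of the first paragraph, with inline tags such as <b> and <i> dropped."""
    parser = FirstParagraphParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


if __name__ == "__main__":
    # Sample input taken from the fixed extract quoted above, plus an assumed second paragraph.
    html = (
        "<p><b>Dr. Frederick Chilton</b> is a fictional character appearing in "
        "Thomas Harris' novels <i>Red Dragon</i> and <i>The Silence of the Lambs</i>.</p>"
        "<p>Second paragraph.</p>"
    )
    plain = first_paragraph_text(html)
    print(first_sentence_by_regex(plain))  # stops at the abbreviation
    print(plain)                           # whole intro paragraph
```

With this sample, the regex picker returns only "Dr." while the paragraph-based function returns the full intro sentence. The paragraph approach avoids the problem because it never has to decide where a sentence ends.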