
TextExtracts exception on very long repetitive content
Closed, Resolved · Public · 1 Estimated Story Points


Here's an example query that failed:


    "servedby": "mw1282",
    "error": {
        "code": "internal_api_error_Exception",
        "info": "[V9MPcApAAE0AAc-HGa4AAAAN] Exception Caught: TextExtracts\\ExtractFormatter::getFirstSentences() error compiling regular expression /^(.+?(?:[^\\p{Lu}]\\.(?:[ \\n]|$)|[\\!\\?](?:[ \\n]|$)|\u3002|\uff0e|\uff01|\uff1f|\uff61)+){1,5}/u"

Event Timeline

Restricted Application added a subscriber: Aklapper. · Sep 9 2016, 7:39 PM

@MaxSem I believe you're most familiar with this code; any insights?

Change 331742 had a related patch set uploaded (by MaxSem):
getFirstSentences(): don't use crazy regexes

Change 331742 merged by jenkins-bot:
getFirstSentences(): don't use crazy regexes
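For readers wondering what "don't use crazy regexes" could look like in practice, here is a minimal sketch in Python. It is not the merged patch, and `first_sentences` and `SENTENCE_END` are hypothetical names. The idea is to find one sentence end at a time with a small pattern and count matches in a loop, instead of wrapping a lazy `.+?` in a `{1,5}` group that the regex engine has to backtrack through on long repetitive input. Python's `re` has no `\p{Lu}`, so the uppercase check is approximated with `str.isupper()`; whitespace and edge-case handling are assumptions, not a faithful port.

```python
import re

# Candidate sentence ends, taken from the terminators in the failing pattern:
# the CJK/fullwidth marks anywhere, or . ! ? followed by a space, a newline,
# or the end of the text.
SENTENCE_END = re.compile(r"[\u3002\uff0e\uff01\uff1f\uff61]|[.!?](?=[ \n]|$)")


def first_sentences(text: str, count: int) -> str:
    """Return roughly the first `count` sentences of `text` (hypothetical helper)."""
    if count <= 0:
        return ""
    found = 0
    for match in SENTENCE_END.finditer(text):
        pos = match.start()
        # Approximates [^\p{Lu}]\. from the original: a period directly after
        # an uppercase letter is treated as an abbreviation, not a sentence end.
        if match.group() == "." and pos > 0 and text[pos - 1].isupper():
            continue
        found += 1
        if found == count:
            return text[: match.end()]
    # Fewer than `count` sentence ends: return the text unchanged (an assumption).
    return text
```

With this sketch, `first_sentences("a. " * 200000, 5)` stops at the fifth sentence end and returns `"a. a. a. a. a."`, since the scan only looks at one terminator at a time.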

I copied @Pchelolo's test page to the Beta Cluster under User:Phuedx-test-2/T145231 and requested an extract with the following URL:

That the extract is a monstrosity is reflective of @Pchelolo's monstrous test case 😄

phuedx set the point value for this task to 1. · Jan 27 2017, 10:17 AM

^ 1 point for the review/testing on the Beta Cluster.