
Extract cropped too much at certain pages with exintro=1
Closed, Duplicate (Public)

Description

Hello,
I'd like to report a bug I came across while using the amazing TextExtracts extension: on certain pages, the extract is cropped too much, right after the very first full stop, when using exintro=1. An example of such a page is here:

https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:API_p%C3%ADskovi%C5%A1t%C4%9B#action=query&format=json&prop=extracts&titles=Tom%C3%A1%C5%A1+Koubek&redirects=1&utf8=1&exsentences=3&exintro=1&explaintext=1

With exintro turned off, everything is fine, but that doesn't help me because the extract then no longer omits things like the disambiguation note.

I also came across a page where the extract is cropped regardless of the exintro setting, see

https://cs.wikipedia.org/wiki/Speci%C3%A1ln%C3%AD:API_p%C3%ADskovi%C5%A1t%C4%9B#action=query&format=json&prop=extracts&titles=Nikolaj+Bodurov&redirects=1&utf8=1&exsentences=3&exintro=1&explaintext=1

Can anyone let me know if this can be fixed? I didn't find anything strange in the source of the pages affected by this problem.

Event Timeline

Jdlrobson subscribed.

Thanks for the bug report. Yes, this is a known problem and is documented here: https://www.mediawiki.org/wiki/Extension:TextExtracts#Caveats and we do not plan to fix it.

We're working on a new endpoint to provide better extracts for these kinds of use cases - see T168848. I suggest not using exsentences in the meantime.

Oh, amazing, I didn't know exsentences causes such a problem. Turning it off works perfectly, sorry I didn't find it myself before… Thanks! :-)
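
For anyone hitting the same issue, the workaround discussed above (same query, just without exsentences) might look roughly like the sketch below. It only builds the request URL and sends nothing. The endpoint https://cs.wikipedia.org/w/api.php and the helper name build_extract_url are assumptions for illustration; the report itself only links the API sandbox.

```python
from urllib.parse import urlencode

# Assumption: the regular api.php endpoint of Czech Wikipedia
# (the original report links the API sandbox page instead).
API_ENDPOINT = "https://cs.wikipedia.org/w/api.php"


def build_extract_url(title, intro_only=True):
    """Build a TextExtracts query URL that deliberately omits exsentences.

    Hypothetical helper, not part of MediaWiki or TextExtracts.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "titles": title,
        "redirects": 1,
        "utf8": 1,
        "explaintext": 1,
    }
    if intro_only:
        # Limit the extract to the content before the first section heading.
        params["exintro"] = 1
    # exsentences is intentionally left out, as suggested in the thread.
    return f"{API_ENDPOINT}?{urlencode(params)}"


url = build_extract_url("Tomáš Koubek")
print(url)
```

The resulting URL can be fetched with any HTTP client. If a shorter extract is still needed, it would have to be trimmed on the client side.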