== Background
In {T113094}, many issues were identified with previews not rendering appropriate text for articles. The majority of these issues are listed below. Any issue not listed here will be deferred to a future iteration.
== Acceptance criteria
**Mathematical expressions**: All mathematical expressions must be rendered as they appear in the original article, including subscripts and formulae (https://phabricator.wikimedia.org/T141766 and https://phabricator.wikimedia.org/T112137).
**Bolding**: Bold text will appear as it does within the article (https://phabricator.wikimedia.org/T141651).
**Parentheticals**: Parentheticals will be stripped (https://phabricator.wikimedia.org/T91344). Note: there are edge cases where stripping does not make sense; we will look at these separately in the second iteration.
**<noinclude>**: Previews must not display <noinclude> content (more info: https://phabricator.wikimedia.org/T109869).
**Lists**: If the article’s first paragraph contains a list, the list will be presented in the summary: bulleted lists as bullets, numbered lists as numbers (more info: https://phabricator.wikimedia.org/T59850, {T156369}).
Note well that for the above edge cases, the generic preview may be used if a more efficient solution cannot be identified.
* All tasks associated with a particular feature are closed once that feature is implemented.
** This should go without saying, but let's take the time to comb the Page Previews and TextExtracts backlogs for duplicate tasks.
== Proposal
The proposal has the following goals:
* Keep the `extracts` API query module as generic as possible.
* Contain Page Previews knowledge within its extension (see above).
* Support third-party MediaWiki installations.
** Unless the PO/TL decide we're not going to do this and let folk
We create a query module within the Page Previews extension, `pp_extract`, which can produce an extract that satisfies the requirements above. It:
* Will defer to `ExtractFormatter`, provided by the TextExtracts extension, to do extract selection and content filtering.
** i.e. it'll filter out everything that doesn't match AC #1, #2, #4, and #5.
* Will not accept any parameters.
* Will only operate on one page – the Page Previews API request is only ever for one page.
Tying this into RestBase should be as trivial as changing `'extracts'` to `'pp_extract'` and removing the `ex`-prefix query parameters.
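As a rough illustration of the RestBase change described above, the sketch below rewrites a set of MediaWiki API query parameters. It is written in Python purely for illustration; RestBase's real request template is not shown in this task, so the helper name `to_pp_extract_params` and the sample parameters are hypothetical. The set of `ex`-prefixed names is an assumption about which TextExtracts parameters are in use. It is listed explicitly because a plain prefix match would also drop unrelated parameters such as `export`.

```python
# Assumed list of TextExtracts ("ex"-prefixed) parameters that the new
# module would no longer need, since `pp_extract` accepts no parameters.
EXTRACTS_PARAMS = {
    "exchars", "exsentences", "exlimit", "exintro",
    "explaintext", "exsectionformat", "excontinue",
}


def to_pp_extract_params(params: dict) -> dict:
    """Swap the `extracts` prop for `pp_extract` and drop its parameters."""
    rewritten = {}
    for key, value in params.items():
        if key in EXTRACTS_PARAMS:
            continue  # parameters of the old module are removed
        if key == "prop":
            # `prop` is a pipe-separated list of query modules.
            modules = value.split("|")
            value = "|".join(
                "pp_extract" if m == "extracts" else m for m in modules
            )
        rewritten[key] = value
    return rewritten


# Hypothetical "before" request, for illustration only.
before = {
    "action": "query",
    "prop": "extracts|pageimages",
    "exintro": "1",
    "exsentences": "5",
    "explaintext": "1",
    "titles": "Example",
}
after = to_pp_extract_params(before)
# after == {"action": "query", "prop": "pp_extract|pageimages", "titles": "Example"}
```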
=== Plan (YMMV)
* Extract `ApiQueryExtracts#getFirstSection( $text, $isPlainText )` to `TextExtracts\Extractor#getFirstSection( $html )`.
** It might be worth extracting `ExtractFormatter::getFirstChars` and `::getFirstSentences` too…
* Create `PagePreviews\HtmlElementFilter`, which uses `ExtractFormatter` with an HTML element whitelist that satisfies the AC above.
* Create `PagePreviews\ParentheticalFilter`, which satisfies AC #3.
* Create `PagePreviews\ApiQueryPPExtract`, which ties the above into the API.
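To make the whitelist idea in the plan above concrete, here is a minimal sketch of element filtering. It is not how `ExtractFormatter` works internally; it uses Python's standard `html.parser` only to show the behaviour we would want from `HtmlElementFilter`. The tag set in `ALLOWED_TAGS` is a guess at what the AC need (bold, lists, subscripts) and would have to be extended for math markup. The sketch assumes disallowed elements are unwrapped (their text is kept) and that all attributes are dropped.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; the real one must be derived from the AC.
ALLOWED_TAGS = {"p", "b", "strong", "ul", "ol", "li", "sub", "sup"}


class WhitelistFilter(HTMLParser):
    """Keep whitelisted tags (without attributes); unwrap everything else."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.parts.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Character references were decoded by the parser; re-escape them.
        self.parts.append(escape(data, quote=False))


def filter_html(html, allowed=frozenset(ALLOWED_TAGS)):
    parser = WhitelistFilter(allowed)
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(filter_html('<p><b>Foo</b> is a <a href="/wiki/Bar">bar</a>.</p>'))
# prints: <p><b>Foo</b> is a bar.</p>
```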
**TODO**: There might be a piece missing about where we get the HTML content from, `PagePreviews\ExtractSource` maybe?
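For the `ParentheticalFilter` step in the plan, a first pass could look something like the sketch below. It assumes the filter runs on plain text; running it on HTML would need extra care so that parentheses inside tags or attributes are left alone. The function name `strip_parentheticals` is hypothetical. Nested parentheses are handled by counting depth, a stray `)` is kept, and an unclosed `(` swallows the rest of the text, which is one of the edge cases the AC defer to the second iteration.

```python
import re


def strip_parentheticals(text: str) -> str:
    """Remove (...) groups, including nested ones, from plain text."""
    kept = []
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
        elif depth == 0:
            kept.append(ch)
    result = "".join(kept)
    # Tidy up what the removal leaves behind: collapse whitespace runs
    # (this also flattens line breaks) and drop a space that now sits
    # directly before punctuation.
    result = re.sub(r"\s{2,}", " ", result)
    result = re.sub(r"\s+([,.;:])", r"\1", result)
    return result.strip()


print(strip_parentheticals("Paris (French: Paris) is the capital of France."))
# prints: Paris is the capital of France.
```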