Here's a list of issues found when comparing the summary extract_html fields of 1.2.0 with those of 1.3.0 (MCS commit b0be98c). So far, the wikis from ar through es in the comparison report have been checked.
- A scribunto-error was selected as the first paragraph for one article in bgwiki. We should consider removing span.scribunto-error from the DOM before selecting the intro paragraph. --> fixed in rGMOA050f9c2e122a
- We should also consider not descending into subsections in the corner case where the lead section has one. (This is rare, but the new Parsoid section elements can be nested.): bn:শিক্ষা (fixed in https://bn.wikipedia.org/api/rest_v1/page/mobile-html/%E0%A6%B6%E0%A6%BF%E0%A6%95%E0%A7%8D%E0%A6%B7%E0%A6%BE/3400557)
Should be fixed on the Parsoid level or onwiki
- Infobox syntax shown in one article on bnwiki
- books.google.com/books?isbn=0810864908 shown in one arwiki article
Recommend to be fixed onwiki
Moved most to T188134.
- Should be a list after the first paragraph: ca:Anarquisme
(I think the lead paragraph in this article is too long, and the two bullet items too large to fit into a summary. I don't have the Catalan skills to rewrite this so I'm going to punt on this.)
- title missing: de:Stuttgart (The de:Audio template uses the noprint class)
- would be nice to also include the paragraph immediately following the first when the first paragraph ends with a colon (:). These have some extra content or formatting HTML in between, though. So, best to fix onwiki: cs:Ohmův_zákon, cs:Archimédův_zákon, cs:Příslovce, da:Idealgasligning
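
The colon rule from the last item could be sketched roughly as follows. This is only a Python illustration, not MCS code: the function name `select_summary_paragraphs` and the plain-string paragraph list are assumptions, and it ignores the extra content or formatting HTML that sits between the paragraphs in the articles listed.

```python
from typing import List


def select_summary_paragraphs(paragraphs: List[str]) -> List[str]:
    """Return the first paragraph, plus the second if the first ends with a colon.

    `paragraphs` is a hypothetical list of already-extracted paragraph texts.
    """
    if not paragraphs:
        return []
    selected = [paragraphs[0]]
    # Trailing whitespace is ignored when checking for the colon.
    if paragraphs[0].rstrip().endswith(":") and len(paragraphs) > 1:
        selected.append(paragraphs[1])
    return selected
```

For example, `select_summary_paragraphs(["Ohm's law states:", "U = R * I"])` keeps both entries, while a first paragraph ending in a full stop is returned alone.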
Minor issues, some of which should probably be fixed in MCS
Most of these could also be fixed onwiki, see T188134.
- too many punctuation/whitespace characters (usually a result of stripping parentheticals or IPAs)
- double commas (,,): da:Blasfemi (should probably be fixed onwiki)
- comma before semicolon (,;): don't see it anymore with rGMOA1ee857e18296
- double spaces ( ): en:London, fixed in https://gerrit.wikimedia.org/r/c/414023/, another fix is
- space before comma ( ,): es:Grecia -> T220250
- space before semicolon ( ;): es:Célula
- consider not stripping ()
- from paragraphs after the first paragraph (e.g. in <li> elements): en:Suit (The latest version of that article is tough to handle since the list comes in a new section), da:Kulstofkredsløb (this one is fixed now)
- with contents in bold (<b>foo</b>): el:Ευρωμπάσκετ, es:Número_áureo; but [HOLD OFF FOR NOW] since we probably want to get rid of all the IPAs in Azerbaijan.
- Consider including the next paragraph if the first paragraph ends with a colon (:): es:Resistencia_eléctrica -> T220249
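
The punctuation/whitespace cases listed above could be handled with a small clean-up pass along these lines. This is a hedged Python sketch, not the fix that landed in MCS: the function name `tidy_punctuation` and the exact patterns are assumptions, and languages that legitimately put a space before a semicolon would need to be excluded.

```python
import re


def tidy_punctuation(text: str) -> str:
    """Collapse punctuation/whitespace debris left after stripping parentheticals or IPAs."""
    # Space before comma or semicolon: "Grecia , oficialmente" -> "Grecia, oficialmente"
    text = re.sub(r"\s+([,;])", r"\1", text)
    # Double (or longer runs of) commas: ",," -> ","
    text = re.sub(r",{2,}", ",", text)
    # Comma directly before a semicolon: ",;" -> ";"
    text = re.sub(r",\s*;", ";", text)
    # Runs of two or more spaces -> a single space
    text = re.sub(r" {2,}", " ", text)
    return text.strip()
```

The whitespace-before-punctuation rule runs first so that inputs such as "a , , b" are reduced to ",," and then collapsed by the later rules.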