Here's a list of issues found after comparing summary `extract_html` fields from 1.2.0 to 1.3.0 (MCS commit b0be98c). So far the wikis ar through es on the [[ http://wpsummary.surge.sh/1.2.0-b0be98c/html/es.html | comparison report ]] have been checked.
=== Issues
[x] `scribunto-error` selected as the first paragraph for [[ https://bg.wikipedia.org/api/rest_v1/page/html/%D0%92%D1%82%D0%BE%D1%80%D0%B0_%D1%81%D0%B2%D0%B5%D1%82%D0%BE%D0%B2%D0%BD%D0%B0_%D0%B2%D0%BE%D0%B9%D0%BD%D0%B0 | one article in bgwiki ]]. We should consider removing `span.scribunto-error` from the DOM before selecting the intro paragraph. --> fixed in rGMOA050f9c2e122a
[] We should also consider not descending into subsections in the rare corner case where the lead section contains one (the new Parsoid `section` elements can be nested): [[ https://bn.wikipedia.org/api/rest_v1/page/html/%E0%A6%B6%E0%A6%BF%E0%A6%95%E0%A7%8D%E0%A6%B7%E0%A6%BE/2758222 | bn:শিক্ষা ]]
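A minimal sketch of both ideas in the list above: strip `span.scribunto-error` first, then take the first non-empty paragraph that is a //direct// child of the lead section, so paragraphs inside a nested `section` are never picked. It assumes well-formed XHTML parsed with Python's standard `xml.etree.ElementTree` (MCS itself is a Node.js service working on a real DOM) and that the lead is the `section` with `data-mw-section-id="0"`. The function names are hypothetical, and the sketch returns plain text rather than the HTML kept in `extract_html`.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _has_class(el: ET.Element, name: str) -> bool:
    return name in (el.get("class") or "").split()


def _remove_keep_tail(parent: ET.Element, child: ET.Element) -> None:
    # ElementTree stores the text that follows an element in its .tail, so re-attach
    # it before removing; otherwise "Foo<span>err</span> bar" would lose " bar".
    siblings = list(parent)
    idx = siblings.index(child)
    tail = child.tail or ""
    if idx > 0:
        prev = siblings[idx - 1]
        prev.tail = (prev.tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)


def strip_scribunto_errors(root: ET.Element) -> None:
    # ElementTree has no parent pointers: collect (parent, child) pairs first, then remove.
    doomed = [(parent, child)
              for parent in root.iter()
              for child in parent
              if child.tag == "span" and _has_class(child, "scribunto-error")]
    for parent, child in doomed:
        _remove_keep_tail(parent, child)


def first_lead_paragraph(html: str) -> Optional[str]:
    """Hypothetical helper: text of the first non-empty top-level <p> in the lead."""
    root = ET.fromstring(html)
    strip_scribunto_errors(root)
    # Assumption: the lead section is marked with data-mw-section-id="0".
    lead = next((s for s in root.iter("section")
                 if s.get("data-mw-section-id") == "0"), None)
    if lead is None:
        return None
    # Iterate direct children only, so a nested <section> is skipped entirely.
    for child in lead:
        if child.tag == "p":
            text = "".join(child.itertext()).strip()
            if text:
                return text
    return None
```

Iterating only the lead's direct children handles the nesting case without needing to know how deep a subsection goes.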
=== Should be fixed on the Parsoid level or onwiki
[] Infobox syntax shown in [[ https://bn.wikipedia.org/api/rest_v1/page/html/আলী_ইবনে_আবু_তালিব/2837813 | one article on bnwiki ]]
[] `books.google.com/books?isbn=0810864908` shown in [[ https://ar.wikipedia.org/api/rest_v1/page/html/%D8%A3%D9%85%D8%A7%D8%B2%D9%8A%D8%BA/26896137 | one arwiki article ]]
=== Recommended to be fixed onwiki
Moved most to T188134.
[] Should be a list after the first paragraph: [[ https://ca.wikipedia.org/api/rest_v1/page/html/Anarquisme/19283010 | ca:Anarquisme ]]
=== Minor issues, some of which should probably be fixed in MCS
Most of these could also be fixed onwiki, see T188134.
[] too many punctuation/whitespace characters (usually a result of stripping parentheticals or IPAs)
[] double commas (`,,`): [[ https://en.wikipedia.org/api/rest_v1/page/html/Paul_McCartney/822867780 | en:Paul_McCartney ]] ('Sir James Paul McCartney,, is an' is actually a bad example, since there should be no comma directly after his name anyway; we should make more use of the `noexcerpt` class instead), [[ https://da.wikipedia.org/api/rest_v1/page/html/Blasfemi/9104429 | da:Blasfemi ]] (should probably be fixed onwiki), [[ https://es.wikipedia.org/api/rest_v1/page/html/Will_Smith/105231673 | es:Will_Smith ]], [[ https://es.wikipedia.org/api/rest_v1/page/html/Mar%C3%ADa_I_de_Escocia/105110472 | es:María_I_de_Escocia ]]
[-] comma before semicolon (`,;`): no longer seen with 1ee857e
[-] double spaces (`  `): [[ https://en.wikipedia.org/api/rest_v1/page/html/London/822677492 | en:London ]], fixed in https://gerrit.wikimedia.org/r/c/414023/, another fix is
[] space before comma (` ,`): [[ https://da.wikipedia.org/api/rest_v1/page/html/Prins_Joachim/9341511 | da:Prins_Joachim ]], [[ https://es.wikipedia.org/api/rest_v1/page/html/Cristiano_Ronaldo/105191364 | es:Cristiano_Ronaldo ]], [[ https://es.wikipedia.org/api/rest_v1/page/html/Grecia/105177728 | es:Grecia ]]
[] space before semicolon (` ;`): [[ https://es.wikipedia.org/api/rest_v1/page/html/C%C3%A9lula/105100890 | es:Célula ]]
[] consider not stripping `()`
[] from paragraphs after the first paragraph (e.g. in `<li>` elements): [[ https://en.wikipedia.org/api/rest_v1/page/html/Suit/822913926 | en:Suit ]], [[ https://da.wikipedia.org/api/rest_v1/page/html/Kulstofkredsl%C3%B8b/9351225 | da:Kulstofkredsløb ]]
[-] with contents in bold `(<b>foo</b>)`: [[ https://el.wikipedia.org/api/rest_v1/page/html/%CE%95%CF%85%CF%81%CF%89%CE%BC%CF%80%CE%AC%CF%83%CE%BA%CE%B5%CF%84_1987/6542146 | el:Ευρωμπάσκετ ]], [[ https://es.wikipedia.org/api/rest_v1/page/html/N%C3%BAmero_%C3%A1ureo/105154269 | es:Número_áureo ]]; but [HOLD OFF FOR NOW] since we probably want to get rid of all the IPAs in [[ https://en.wikipedia.org/api/rest_v1/page/html/Azerbaijan/827080382 | Azerbaijan ]].
[] with contents in `("")`: [[ https://da.wikipedia.org/api/rest_v1/page/html/Syre/9236379 | da:Syre ]]
[] picked wrong paragraph: [[ https://es.wikipedia.org/api/rest_v1/page/html/Cambio_clim%C3%A1tico/105201387 | es:Cambio_climático ]]
[] title missing: [[ https://de.wikipedia.org/api/rest_v1/page/html/Stuttgart/173449180 | de:Stuttgart ]]
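If MCS were to clean up the punctuation artifacts listed above itself instead of relying on onwiki fixes, a post-processing pass over the extract text could look roughly like the sketch below. `tidy_punctuation` is a hypothetical helper, not an existing MCS function, and it works on plain text with regular expressions; a real fix might rather avoid producing these sequences when stripping parentheticals or IPAs in the first place.

```python
import re


def tidy_punctuation(text: str) -> str:
    """Hypothetical clean-up pass for whitespace/punctuation left after stripping."""
    # Collapse runs of whitespace (e.g. "London  is") into a single space.
    text = re.sub(r"\s{2,}", " ", text)
    # Drop whitespace directly before a comma or semicolon (" ," and " ;").
    text = re.sub(r"\s+([,;])", r"\1", text)
    # Collapse repeated commas (",,") into one.
    text = re.sub(r",{2,}", ",", text)
    # A comma directly before a semicolon (",;") keeps only the semicolon.
    text = text.replace(",;", ";")
    return text.strip()
```

The order matters: removing spaces before commas first turns `a , , b` into `a,, b`, which the next step then reduces to `a, b`. One caveat is that the space-before-semicolon rule is wrong for French typography, where a (no-break) space before `;` is standard, and `\s` also matches no-break spaces, so such a rule would have to be applied per language.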
=== Nice to haves
[] would be nice to also include the paragraph immediately following the first one when the first paragraph ends with `:`: [[ https://cs.wikipedia.org/api/rest_v1/page/html/Ohm%C5%AFv_z%C3%A1kon/15567838 | cs:Ohmův_zákon ]], [[ https://cs.wikipedia.org/api/rest_v1/page/html/Archimédův_zákon/15693096 | cs:Archimédův_zákon ]], [[ https://cs.wikipedia.org/api/rest_v1/page/html/Příslovce/15397768 | cs:Příslovce ]], [[ https://da.wikipedia.org/api/rest_v1/page/html/Idealgasligning/8516662 | da:Idealgasligning ]], [[ https://es.wikipedia.org/api/rest_v1/page/html/Resistencia_el%C3%A9ctrica/103807716 | es:Resistencia_eléctrica ]]
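A minimal sketch of that rule, assuming the lead's top-level blocks (paragraphs, lists, formulas) have already been extracted as plain-text strings in document order. `lead_blocks` is a hypothetical name, not an MCS function.

```python
def lead_blocks(blocks: list[str]) -> list[str]:
    """Hypothetical: first non-empty block, plus the next one if the first ends with ':'."""
    nonempty = [b.strip() for b in blocks if b.strip()]
    if not nonempty:
        return []
    selected = nonempty[:1]
    # Only pull in one extra block when the first one introduces it with a colon.
    if selected[0].endswith(":") and len(nonempty) > 1:
        selected.append(nonempty[1])
    return selected
```

Only one extra block is taken, so a second block that also ends with `:` does not chain further; whether that is enough for the linked examples would need checking.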