Build out QA for parsing leads
**ToDo**
- [ ] unit tests with a set of handpicked articles, incl. edge cases from existing solutions (in progress, see among others [[ https://phabricator.wikimedia.org/T326805 | T326805 ]]); see the test sketch below
- [ ] unit tests against the existing solutions
- [ ] (automated) batch test with a random set of articles
-- figure out how to present the results (CSV, v2 PoC, manual vs automated)
-- for manual testing it is handy to have the abstract next to the summary endpoint output, as well as a link to the full Wikipedia article, for comparison (see the CSV sketch below)
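
A rough sketch of what the handpicked-article unit tests could look like. It points the assertions at the production REST summary endpoint as a stand-in until the new parsing code / v2 PoC can be tested directly; the titles and expectations are placeholders, not the curated edge-case set from T326805.

```
import pytest
import requests

SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/{}"

# (title in underscore form, whether a non-empty lead/extract is expected);
# placeholder cases only, to be replaced by the handpicked edge-case set
CASES = [
    ("Albert_Einstein", True),
    ("Amsterdam", True),
]

@pytest.mark.parametrize("title,expect_lead", CASES)
def test_lead_extract(title, expect_lead):
    resp = requests.get(SUMMARY.format(title), timeout=30)
    resp.raise_for_status()
    extract = resp.json().get("extract", "")
    assert bool(extract.strip()) == expect_lead
```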
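
A throwaway sketch for the manual-comparison CSV: one row per title with the summary-endpoint extract and a link to the full article, so reviewers can eyeball the lead against the page. A column for the abstract / v2 PoC output could be added next to it once that is available; the title list here is illustrative.

```
import csv
import requests

SUMMARY = "https://en.wikipedia.org/api/rest_v1/page/summary/{}"
ARTICLE = "https://en.wikipedia.org/wiki/{}"

def write_comparison_csv(titles, path="lead_comparison.csv"):
    # titles are expected in underscore form, e.g. "Albert_Einstein"
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["title", "summary_extract", "article_url"])
        for title in titles:
            resp = requests.get(SUMMARY.format(title), timeout=30)
            extract = resp.json().get("extract", "") if resp.ok else ""
            writer.writerow([title, extract, ARTICLE.format(title)])

if __name__ == "__main__":
    write_comparison_csv(["Albert_Einstein", "Amsterdam"])
```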
Some options to assemble a random set of articles:
- listen to the stream for 15 min and pull those articles (in the v2 PoC?); see the stream sketch after this list
- use a certain number of random titles from the [[ https://dumps.wikimedia.org/other/ | title dump ]] for specific projects; see the sampling sketch after this list
- use certain templates or categories; see the category sketch after this list
-- e.g. to test what should result in an empty lead, the category https://en.wikipedia.org/wiki/Category:Pages_missing_lead_section and the template https://en.wikipedia.org/wiki/Template:Lead_missing might be useful (note: the leads aren't always actually missing, sometimes they are just deemed too short)
- use lists like https://en.wikipedia.org/wiki/Wikipedia:Vital_articles
- use lists from other teams
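
For the "listen to the stream" option, a sketch that reads the public EventStreams recentchange feed for a fixed window and keeps main-namespace titles for one wiki; the 15-minute window and the domain/namespace filters are just one possible choice.

```
import json
import time
import requests

STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

def sample_titles(minutes=15, domain="en.wikipedia.org"):
    titles = set()
    deadline = time.time() + minutes * 60
    with requests.get(STREAM, stream=True, timeout=60,
                      headers={"Accept": "text/event-stream"}) as resp:
        for line in resp.iter_lines():
            if time.time() > deadline:
                break
            # SSE frames: only the "data: {...}" lines carry the event payload
            if not line or not line.startswith(b"data: "):
                continue
            event = json.loads(line[len(b"data: "):])
            if event.get("meta", {}).get("domain") == domain and event.get("namespace") == 0:
                titles.add(event["title"])
    return sorted(titles)
```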
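
For the title-dump option, a sketch that draws a fixed number of random titles from a downloaded, gzipped titles file (e.g. one of the all-titles-in-ns-0 files linked from the dumps page); the exact filename and whether a header line needs skipping depend on the dump variant.

```
import gzip
import random

def random_titles_from_dump(path, n=200):
    # loads the whole title list into memory; fine for a one-off QA run
    # (some dump variants start with a "page_title" header line that may need dropping)
    with gzip.open(path, "rt", encoding="utf-8") as f:
        titles = [line.strip() for line in f if line.strip()]
    return random.sample(titles, min(n, len(titles)))

# e.g. random_titles_from_dump("enwiki-latest-all-titles-in-ns-0.gz", n=500)
```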
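
For the category option, a sketch that collects main-namespace members of a category via the Action API; the default here is the "pages missing lead section" category mentioned above, but the same list=categorymembers query works for any category, and pages transcluding a template can be fetched analogously with list=embeddedin.

```
import requests

API = "https://en.wikipedia.org/w/api.php"

def category_titles(category="Category:Pages missing lead section", limit=500):
    titles = []
    params = {
        "action": "query",
        "format": "json",
        "list": "categorymembers",
        "cmtitle": category,
        "cmnamespace": 0,
        "cmlimit": "max",
    }
    while len(titles) < limit:
        data = requests.get(API, params=params, timeout=30).json()
        titles += [m["title"] for m in data["query"]["categorymembers"]]
        if "continue" not in data:
            break
        params.update(data["continue"])  # follow API continuation until done
    return titles[:limit]
```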