Pywikibot fails to parse tags correctly when they include whitespace
Closed, Resolved · Public

Description

In this diff, archivebot.py reported that 2 threads were archived when it had in fact archived 10. Looking into why, I found that the revision contained

</nowiki >

Whitespace after the tag name, while uncommon, is valid XML (https://www.w3.org/TR/REC-xml/#sec-starttags) and HTML (https://html.spec.whatwg.org/multipage/syntax.html#start-tags), but pywikibot failed to parse it.
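To illustrate the problem, here is a minimal Python sketch. It is not pywikibot's actual code: the names `strict`, `tag_pattern` and `strip_tag` are hypothetical, and the sample text is made up. It compares a pattern that requires `>` directly after the tag name with one that tolerates whitespace before `>`.

```python
import re

# Strict pattern: the closing tag must be exactly "</nowiki>".
strict = re.compile(r'<nowiki>.*?</nowiki>', re.IGNORECASE | re.DOTALL)


def tag_pattern(name):
    """Build a pattern for <name ...>...</name>, allowing whitespace
    (and, in the opening tag, attributes) between the tag name and '>'.

    Hypothetical helper for illustration only.
    """
    escaped = re.escape(name)
    return re.compile(
        r'<{0}(?:\s[^>]*)?>.*?</{0}\s*>'.format(escaped),
        re.IGNORECASE | re.DOTALL,
    )


def strip_tag(text, name):
    """Remove every <name>...</name> span from text."""
    return tag_pattern(name).sub('', text)


# Made-up sample with whitespace before '>' in the closing tag.
sample = 'a <nowiki>== not a heading ==</nowiki > b'

print(strict.search(sample))        # None: the strict pattern misses the tag
print(strip_tag(sample, 'nowiki'))  # the whole nowiki span is removed
```

With the strict pattern, the `<nowiki>` span is never recognized as closed, so everything inside it is still treated as live wikitext. That is one plausible way a thread count could go wrong, though the exact mechanism in archivebot.py is not stated in this report.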

Restricted Application added subscribers: pywikibot-bugs-list, jeblad, Aklapper. Apr 5 2018, 7:43 PM

Change 424385 had a related patch set uploaded (by Danmichaelo; owner: Danmichaelo):
[pywikibot/core@master] Allow whitespace at end of html tags

https://gerrit.wikimedia.org/r/424385

Xqt triaged this task as High priority. Apr 6 2018, 7:50 AM
Xqt closed this task as Resolved.
Xqt assigned this task to Danmichaelo.

Change 424385 merged by jenkins-bot:
[pywikibot/core@master] Allow whitespace at end of html tags

https://gerrit.wikimedia.org/r/424385