cosmetic_changes.resolveHtmlEntities() replacements should be excluded within <pre> tags as well as inside <source>, <syntaxhighlight> and <nowiki>. Currently only <code> is excluded. See also this request
Description
Details
Subject | Repo | Branch | Lines +/-
---|---|---|---
[bugfix] Avoid HTML entity substitution in <syntaxhighlight> | pywikibot/core | master | +10 -3
Event Timeline
Strange, because cosmetic_changes.resolveHtmlEntities() calls html2unicode(text, ignore=ignore, exceptions=['code']), where the exceptions list contains 'code' and the ampersand is in the ignore list.
The screenshot does not show which replacements bots make, but how MediaWiki treats HTML entities inside these tags.
I believe 'code' was a mistake and should have originally been 'source'.
@matej_suchanek Could you try a different character and post the results? The ampersand might be an exception on both sides (both PWB and MW).
Yes, numeric entities behave the same way; I just tested. So only syntaxhighlight needs an exception.
Remember that code such as &nbsp; is sometimes used in <code> or similar tags to intentionally display an HTML entity code.
This is sometimes done in documentation, in code meant to be copied, or to explain behavior.
The bot should not modify the text in this case.
This is avoided by blacklisting & and others such as > or < from replacement.
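To make the two mechanisms discussed above concrete (an ignore list for characters such as &, < and >, and a list of tags whose content is left alone), here is a minimal standalone sketch. It is not pywikibot's implementation: the function name resolve_entities, its parameters and the regular expressions are hypothetical, and it relies only on Python's standard html and re modules.

```python
import html
import re

# Matches named (&eacute;), decimal (&#233;) and hexadecimal (&#xE9;) entities.
ENTITY = re.compile(r'&(?:#\d+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);')


def resolve_entities(text, ignore_chars=frozenset('&<>'),
                     skip_tags=('syntaxhighlight',)):
    """Hypothetical helper: decode HTML entities outside of skip_tags.

    Entities that decode to a character in ignore_chars are kept as written.
    """
    def replace_entity(match):
        entity = match.group(0)
        decoded = html.unescape(entity)  # unknown entities come back unchanged
        return entity if decoded in ignore_chars else decoded

    if not skip_tags:
        return ENTITY.sub(replace_entity, text)

    tags = '|'.join(re.escape(tag) for tag in skip_tags)
    protected = re.compile(
        r'(<(?:%s)\b[^>]*>.*?</(?:%s)\s*>)' % (tags, tags),
        re.DOTALL | re.IGNORECASE)
    # re.split with one capturing group puts the protected blocks at odd indices.
    parts = protected.split(text)
    return ''.join(part if index % 2 else ENTITY.sub(replace_entity, part)
                   for index, part in enumerate(parts))


sample = ('a &eacute; b &amp; <syntaxhighlight lang="html">&eacute;'
          '</syntaxhighlight> <code>&#233;</code>')
print(resolve_entities(sample))
```

With these assumed defaults, &eacute; and &#233; are decoded outside <syntaxhighlight> (including inside <code>), while &amp; and everything inside <syntaxhighlight> are left as written.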
Change 609297 had a related patch set uploaded (by Matěj Suchánek; owner: Matěj Suchánek):
[pywikibot/core@master] [bugfix] Avoid HTML entity substitution in <syntaxhighlight>
Well, I have already demonstrated how MediaWiki behaves. If you take a look at the diffs in that task description, you will see the bot also replaced &amp; -> &. That is certainly unwanted, and there is a regression test which guards against this. But <code>...</code> does not escape HTML entities, so there is no point in excluding it (unless we want to give users a false sense of security).
Change 609297 merged by jenkins-bot:
[pywikibot/core@master] [bugfix] Avoid HTML entity substitution in <syntaxhighlight>