Page MenuHomePhabricator

Output HTML should not contain `<p><style>...</style></p>`
Closed, ResolvedPublic

Description

Filing in response to discussion at https://www.mediawiki.org/wiki/Topic:U7c45mq2js1912mq.

When encountering <style>...</style> by itself on a line, as might be produced from TemplateStyles, the parser produces

<p><style>...</style>
</p>

The <p> tags are unnecessary, and may result in unwanted display depending on what styles are being applied to <p> tags.

Currently our configuration of HTML Tidy treats <style> as a block-level element, which causes it to remove wrapping <p> tags in the situation above. But RemexHtml doesn't remove the <p> tags, presumably since HTML5 allows <style> to be either flow or phrasing.

Parsoid's output also contains these <p> tags; I don't know whether it's getting this from the PHP Parser or doing it itself.

On the other hand, our current configuration of HTML Tidy will turn <p>foo <style>...</style> bar</p> into <p>foo</p><style>...</style><p>bar</p>, which probably isn't good either. But since HTML Tidy is on its way out, that's probably not worth worrying about.

Desired results:

Input half-parsed wikitext ($text before doBlockLevels)
<style>...</style>

foo <style>...</style> bar
Output (after tidying)
<style>...</style>
<p>foo <style>...</style> bar</p>

<style>...</style>foo bar as a standalone paragraph could be validly output in either of two ways, <style>...</style><p>foo bar</p> or <p><style>...</style>foo bar</p>. And similarly for foo bar<style>...</style>.

If BlockLevelPass avoids either starting or ending a paragraph for a line beginning with <style, similar to how it doesn't start a paragraph for block-level tags, it looks like RemexHtml will do the right thing.

Or RemexHtml could remove the <p> tags when the only content is one or more <style> tags, although that seems less likely to be appropriate since HTML5 doesn't specify that.

Event Timeline

Change 414836 had a related patch set uploaded (by Anomie; owner: Anomie):
[mediawiki/core@master] Parser: Don't wrap <style> or <link> tags in paragraphs

https://gerrit.wikimedia.org/r/414836

ggellerman subscribed.

@Tgr has agreed to review this. Thanks, Gergo!

Change 414836 merged by jenkins-bot:
[mediawiki/core@master] Parser: Don't wrap <style> or <link> tags in paragraphs

https://gerrit.wikimedia.org/r/414836

Per @ssastry's comment on the patch, Parsoid will need a matching change.

Additionally, the TemplateStyles parser tests need to be updated, they're currently failing.

Change 416329 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/extensions/TemplateStyles@master] Fix <p> wrappers in unit tests

https://gerrit.wikimedia.org/r/416329

Change 416329 merged by jenkins-bot:
[mediawiki/extensions/TemplateStyles@master] Fix <p> wrappers in unit tests

https://gerrit.wikimedia.org/r/416329

Are there any successes in solving this problem?

It's fixed in the PHP Parser, but probably still needs fixing in Parsoid.

Also, when using HTML Tidy, tidy will split paragraphs if the tag is in the middle of one. Since Tidy is in the process of being replaced by RemexHTML, we're not going to worry about that. Remex handles it correctly.

@ssastry Can you prioritize this on the Parsoid end per @Deskana? Thanks!

I just want to make sure I understand this correctly. When I first complained about this, it was because this edit failed. This still fails, so what you have done instead is allowing the templatestyles tag to be placed on its own line. However, there are still cases when the line break is significant and unwanted, as can be seen in https://sv.wikipedia.org/wiki/Mall:Audio, so always adding it on its own line does not seem to be an option either.

Basically, what I'm getting at is that I'd like clear instructions on either https://www.mediawiki.org/wiki/Extension:TemplateStyles or https://www.mediawiki.org/wiki/Help:TemplateStyles about where exactly the templatestyles tag should be, so that even idiots like me know how to insert them. If my testing and understanding is correct, such a rule could be:

If the template starts with {|, ignoring any preceding HTML comments (<!---->), add the templatestyles tag on its own line before the template content. Otherwise, add the templatestyles tag immediately before the template content, on the same line.

Is this correct?

You can find that in the documentation for tables: table markup must be at the start of the line, just like list or preformatted markup. TemplateStyles is no different from any other wikimarkup in that regard.

I don't see where your Mall:Audio template breaks if a linebreak is placed after the <templatestyles> tag; the article A looks fine if I use TemplateSandbox to preview it with the added linebreak.

But it is potentially the case that a template designed to be used inline might somehow be sensitive to it. In the end on-wiki editors will have to determine what works best in any particular case beyond wikitext requirements like the ones @Tgr pointed out.

@ssastry Per T186965#4075784, I think all that's remaining here is for the equivalent fix to be applied in Parsoid. This isn't urgent, but a fix soon would be appreciated. Let me know if I can help. Thanks!

Change 452952 had a related patch set uploaded (by Subramanya Sastry; owner: Subramanya Sastry):
[mediawiki/services/parsoid@master] WIP: Treat template-styles <style> tag specially wrt p-wrapping

https://gerrit.wikimedia.org/r/452952

Sorry .. I shouldn't have untagged those other projects since the original bug report was for both parsers.

Change 465460 had a related patch set uploaded (by C. Scott Ananian; owner: C. Scott Ananian):
[mediawiki/services/parsoid@master] Strip <style> tags in TOC

https://gerrit.wikimedia.org/r/465460

Change 452952 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Treat template-styles <style> tag specially wrt p-wrapping

https://gerrit.wikimedia.org/r/452952

Change 465460 merged by jenkins-bot:
[mediawiki/services/parsoid@master] Strip <style> / <script> tags in TOC

https://gerrit.wikimedia.org/r/465460

Has the Parsoid patch that removes the p tags around style tags been deployed, and if so, when was it deployed? I'm asking because I'm still seeing style tags wrapped by p tags on for example https://sv.wikipedia.org/api/rest_v1/page/html/Anv%C3%A4ndare%3ANirmos%2FAnders_Bj%C3%B6rck