
Add HTML5 <aside> to the parser whitelist
Open, Low, Public

Description

Test example: https://en.wikisource.org/wiki/User:ShakespeareFan00/Sandbox#Formatting_Side-titles. The <aside> element does not seem to be parsed. I found this issue while trying to sandbox a solution for formatting sidetitles more consistently than the rather cumbersome template kludge I had been using up to this point.

Event Timeline

ShakespeareFan00 raised the priority of this task to Needs Triage.
ShakespeareFan00 updated the task description.
ShakespeareFan00 subscribed.

The HTML5 recommendation says:

The aside element represents a section of a page that consists of content that is tangentially related to the content around the aside element, and which could be considered separate from that content. Such sections are often represented as sidebars in printed typography.

The element can be used for typographical effects like pull quotes or sidebars, for advertising, for groups of nav elements, and for other content that is considered separate from the main content of the page.

Note: It's not appropriate to use the aside element just for parentheticals, since those are part of the main flow of the document.

So it's not really the thing to use for side-titles (not that using elements for other than their original semantic purpose is particularly foreign to MediaWiki!), but it's probably the thing to use for sidenotes (cf. Wikisource:Template:Sidenotes begin).

Regardless, I think it'd be great to at least render them! Skins can make up their mind what to do with them, then.

It's not a full solution, but MediaWiki since 1.27 allows <div role="complementary"> and <div role="note"> for part of the functionality of an <aside>.

Do you have more information about the role attribute? My normal sources on HTML5 don't mention it.

See the WAI-ARIA 1.1 role definitions to get started.

An element's role corresponds to its semantic meaning — how a human understands the element's place in the document structure. Most of the time, elements won't have a role attribute, because the element's default role is the intended semantics. For example, <li>'s default role is listitem, so <li> used as an item in a list should have no role attribute. The role attribute, if present, overrides the element's default role — <table role="presentation"> is common for layout tables — or provides semantics to a semantically void element, such as <div>.

That makes the role attribute useful when I want an HTML5 semantic element like <aside>, but have to worry about old JavaScript-disabled browsers. Specifically, IE8 with JavaScript disabled will not display an <aside> element or any of its contents.

So I use a <div> with the role attribute to substitute for HTML5 semantic elements. First, I look at the element's default role, then I use that for the <div>'s role attribute. In <aside>'s case, <aside>'s default role is complementary. So, where I would normally use <aside> but also need no-JS IE8 support, I'd instead use <div role="complementary">. Screen readers should announce and navigate <div role="complementary"> identically to <aside>; I don't know of any screen readers that don't.

If I want to use a non-default role on the element, for example <aside role="note"> for a "see also" hatnote, I just use <div> with the same role attribute: <div role="note">.
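The substitution rule described above (use the element's default role on a <div>, or copy an explicit non-default role unchanged) can be sketched in Python. This is a minimal illustration only: `DEFAULT_ROLES` is a small hand-picked subset of the HTML-to-ARIA default role mappings, not an exhaustive table, and `div_fallback` is a hypothetical helper, not part of MediaWiki or any library.

```python
from html import escape
from typing import Optional

# Default (implicit) ARIA roles for a few HTML5 elements. This is a small,
# hand-picked subset of the HTML-to-ARIA mappings, not an exhaustive table.
DEFAULT_ROLES = {
    "aside": "complementary",
    "nav": "navigation",
    "main": "main",
    "article": "article",
    "figure": "figure",
    "li": "listitem",
}


def div_fallback(element: str, inner_html: str, role: Optional[str] = None) -> str:
    """Build a <div> that stands in for `element` (hypothetical helper).

    With no explicit `role`, the element's default role is used (for example
    <aside> becomes <div role="complementary">). An explicit, non-default
    role such as "note" is copied onto the <div> unchanged.
    `inner_html` is inserted as-is, so it must already be safe markup.
    """
    if role is None:
        if element not in DEFAULT_ROLES:
            raise ValueError(f"no default role recorded for <{element}>")
        role = DEFAULT_ROLES[element]
    return f'<div role="{escape(role)}">{inner_html}</div>'


if __name__ == "__main__":
    print(div_fallback("aside", "Sidebar text"))
    # <div role="complementary">Sidebar text</div>
    print(div_fallback("aside", "See also...", role="note"))
    # <div role="note">See also...</div>
```

The lookup table keeps the "default role" step explicit, so adding another element is a one-line change; elements without a recorded default role raise an error rather than silently producing a role-less <div>.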

Thanks. So for Wikisource it may be a case of writing CSS styles for suitable spans and setting the role appropriately. ...

Volker_E subscribed.

The current task description doesn't present a clearly defined, valid use case for enabling <aside> in content. From this perspective, I don't see a clear reason to enable it in wikitext, as opposed to supporting it in skin templates.

Izno reopened this task as Open. (Dec 10 2020, 2:46 AM)
Izno subscribed.

pull quotes or sidebars, ... for groups of nav elements

We have each of those in Wikipedia and I can point to each of them. I do not see a reason why this was declined.

Izno renamed this task from "HTML5 Semantic element <aside></aside> not seemingly parsed." to "Add HTML5 <aside> to the parser whitelist". (Dec 10 2020, 2:47 AM)

I think <aside> like <section> is arguably part of the skin / meta-layout, not part of the article content. I've added lots of HTML5 elements to the whitelist, but I'd lean towards declining this one for now -- wikitext doesn't have a good page layout mechanism (although there are phab tasks for this, eg T90914: Provide semantic wiki-configurable styles for media display). It seems like a future page layout mechanism might want to generate <aside> itself, which would be complicated if we allowed wikitext to contain those tags directly.

This is similar to the reason we don't allow <figure> in wikitext -- wikitext already has a mechanism for generating and styling media, and we reserve <figure> for the output of those mechanisms.

The example listed at the opening of this task is no longer in the linked Sandbox (I did check past revisions; if an admin wants to check deleted content, feel free), so this should be closed unless someone wants to come up with a new example.

I am inclined to agree with cscott: wikitext should be the mechanism for content, and as such direct HTML in wikitext (other than classed DIV and SPAN inside templates or modules) should be avoided, especially on content pages.

The reason I was looking at ASIDE in 2016 was to try to get sidetitles and marginal citations not to overlap. Since then I've learnt more, and some contributors at English Wikisource are looking into a different solution.

(One of the problems with the current approach at English Wikisource is the use of position:absolute for sidenotes, which means that the sidenotes can't flow the way floated content would. For short marginal citations that works; for lengthy citations it doesn't.)

In terms of sidetitles, sidenotes and marginal cites... I think a better approach might be to implement it as a "position: left/right/float" option on a ref tag, and let the back end handle the layout issues (including collapsing the sidenotes back down to conventional footnotes on certain devices or narrow screens, etc.).
However, that would be an entirely new ticket.

Here's a strawman proposal, just to wrap up the discussion for the moment: we have a float and size mechanism for media, which uses <figure>. I'd be interested in thinking about how we might add 'text' as a different sort of 'media'. You could imagine syntax like {{Text:/Foo|aside|left}} (which would perhaps include text from PageName/Foo), which would set the proper wrapper tag (<aside>), role, and styling.

That said, I'm not a huge fan of the current media syntax, but if you can imagine a better way to specify how images and page layout interact, then you can "without loss of generality" imagine that that same mechanism could be used for text inclusions as well as images.

You mean add layout container options to LST (labeled section transclusion)? Your text syntax above is essentially {{#section: Pagename/Foo}}.

(Aside: See also the discussion elsewhere about how REF tags are wrapped.)

(Aside: you can't currently do LST from the same page; as I understand it, MediaWiki warns about a potential loop.)

We're probably getting to the same place from different directions: you're adding the media options to LST, I'm adding LST-like transclusion abilities to media. But yeah, that's the basic idea one way or the other. Key point is to specify the semantics rather than just add HTML tags onto the whitelist.

I think <aside> like <section> is arguably part of the skin / meta-layout, not part of the article content.

I don't see it. I have real use cases that meet the description of aside. Why should I be barred from its use? :)

I also see use for section and article. For the former, see the Wikipedia main page or any portal, which may not be marked up with heading wikitext, meaning the parser would need to guess at where to put new sections (which is what it has to do today, and why Parsoid only guesses at sections for h2 wikitext). Based on the examples of article provided on MDN and WHATWG, the talk page consultation could be using that for each comment posted (pending some finalization of markup, which will come sooner or later). (And I am still annoyed about New Vector declining to use article for article content, but that's neither here nor there. ;) (Article could *also* be used on the main page/portals.)

I've added lots of HTML5 elements to the whitelist, but I'd lean towards declining this one for now -- wikitext doesn't have a good page layout mechanism (although there are phab tasks for this, eg T90914: Provide semantic wiki-configurable styles for media display).

The layout of the element is irrelevant; why are you talking about that? The point of semantic elements is to wrap things that match their semantic value. I'm not here to argue about where content displays, but about what it means. And I have wiki content that matches what it means, so I should be able to mark it up as such.

(I am concerned you are overly focused on the name of the element for some reason.)

It seems like a future page layout mechanism might want to generate <aside> itself, which would be complicated if we allowed wikitext to contain those tags directly.

Pages, extensions, and other PHP systems already manage to cope with the basic HTML elements that are allowed, so you're going to need to explain in more detail why aside, of all tags, would be "complicated".

This is similar to the reason we don't allow <figure> in wikitext -- wikitext already has a mechanism for generating and styling media, and we reserve <figure> for the output of those mechanisms.

I think figure isn't allowed because no one has made the direct case for it yet, not because wikitext "already has a mechanism". I can think of at least two templates that should use figure but do not today; in fact, the reason I showed up to this task was that I was looking to see if any progress had been made on figure lately (of course not).

My general opinion is that all HTML elements should be whitelisted, barring existing conflicts (currently section in particular) or security concerns, especially the ones that have use cases.
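As a hedged illustration of the policy proposed above (allow HTML elements by default, except where there is an existing conflict or a security concern), here is a small Python sketch. It is hypothetical and is not how MediaWiki's Sanitizer actually works; the contents of both exception sets are assumptions chosen for the example, with figure and section taken from the conflicts mentioned in this discussion.

```python
from typing import List

# Hypothetical "default-allow with exceptions" policy for wikitext HTML tags.
# This is NOT MediaWiki's Sanitizer; both exception sets are illustrative.

# Tags kept out for security reasons (illustrative, not a complete list).
BLOCKED_FOR_SECURITY = frozenset({"script", "style", "iframe", "object", "embed"})

# Tags reserved for parser-generated output, i.e. the "existing conflicts"
# mentioned in the discussion (<figure> for media, <section> for sections).
RESERVED_FOR_PARSER = frozenset({"figure", "section"})

# A few HTML5 element names to evaluate (illustrative subset of the spec).
CANDIDATES = ["aside", "article", "figure", "section", "script", "nav", "mark"]


def is_allowed_in_wikitext(tag: str) -> bool:
    """Allow a tag unless it is security-sensitive or reserved for the parser."""
    name = tag.lower()
    return name not in BLOCKED_FOR_SECURITY and name not in RESERVED_FOR_PARSER


def allowed_subset(tags: List[str]) -> List[str]:
    """Return the tags from `tags` that the policy would allow, in order."""
    return [t for t in tags if is_allowed_in_wikitext(t)]


if __name__ == "__main__":
    print(allowed_subset(CANDIDATES))  # ['aside', 'article', 'nav', 'mark']
```

Under this sketch, aside passes; the opposing position in this thread (reserve aside for a future page layout mechanism) would amount to adding it to the reserved set.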

At the risk of straying off topic, I agree with Izno: use cases for <figure> aren't hard to find. There's a <div role="figure"> on the English Wikipedia main page.

I've also seen non-media use cases for <figure>. The cladogram at https://en.wikipedia.org/wiki/Reptile#Phylogeny is semantically a figure, but contains a mixture of text and images.

Adding <aside> can be useful semantically without any special rendering or changes to the page layout mechanism - templates can choose to implement it, providing broad usage. I'd argue the vast majority of aside-appropriate content is probably rendered via templates.

My usage for <aside> is to make search engines not present a template's content as the excerpt in results, because most of our pages start with a certain template. I patched MediaWiki to allow it and it works well (except that VisualEditor doesn't recognize it and shows the raw HTML tag instead).

Do you know if search engines recognize <div role="whatever I should put here">?

Google snippets can include role="note" content. For example, Google snippets for Wikipedia articles can include <div role="note">See also...</div>, as you can see if you add "see also" to the search string.

Alternatively, the data-nosnippet attribute should block an element from Google snippets. Something like <aside role="note" data-nosnippet>See also...</aside> could be reasonable.

Google supports data-nosnippet, but Bing does not, and there is no evidence of support in Baidu (our second largest referring search engine). <aside> seems to work well with both Google and Baidu.

This bug should be retitled if the actual problem to be solved is "hide template content from search engine snippets".

Feel free to create a separate task for that purpose. I think the aside tag is relevant regardless of any ability to hide content from search engines (or style content with skins), as it is important semantically.

Change 948593 had a related patch set uploaded (by Winston Sung; author: Winston Sung):

[mediawiki/core@master] Sanitizer: Update to HTML 5 and allow some elements

https://gerrit.wikimedia.org/r/948593

Baidu (our second largest referring search engine)

That seems very counter-intuitive. I cannot get any Wikipedia results on Baidu, probably because WP is banned in China.

I cannot get any Wikipedia results on Baidu, probably because WP is banned in China.

Yes, that's the reason. You can't see results from any fully banned sites on Baidu.

Why would we consider how Baidu indexes us then?

Because there are lots of other sites using MediaWiki.

Alternatively, the data-nosnippet attribute should block an element from Google snippets. Something like <aside role="note" data-nosnippet>See also...</aside> could be reasonable.

Would <div role="note" data-nosnippet>See also...</div> work just as well for this?

(I think the discussion about getting Baidu to show or not show certain things is probably off topic for this task.)