Author: Astronouth7303
Description:
By default, the abbrev and acronym tags are not included in the HTML
whitelist, and therefore will be displayed as tags.
Version: unspecified
Severity: enhancement
• bzimport | |
Oct 10 2004, 1:44 AM |
F1395: Sanitizer.php.diff | |
Nov 21 2014, 7:07 PM |
Status | Subtype | Assigned | Task
---|---|---|---
Open | | None | T25932 Allow use of semantic HTML5 elements in wikitext
Resolved | | None | T2671 Whitelist non-problematic HTML tags: address
An alternative would be to encourage contributors to link to acronyms and
initialisms that aren't commonly used. For example (from w:Internet):
"On [[January 1]], [[1983]], the ARPANET changed its core networking protocols
from [[Network Control Program|NCP]] to [[Transmission Control
Protocol|TCP]]/[[Internet Protocol|IP]], marking the start of the Internet as we
know it today."
"NCP," "TCP," and "IP" are all linked to their full names. The acronym tag is
typically used when an acronym or initialism is first used; likewise when an
important word, phrase, or acronym is first used, it's linked to the appropriate
article.
This alternative could be put in the Wikipedia Manual of Style for now, until
this bug is resolved.
– [[vi:User:Mxn|Minh Nguyễn]]
Changing the summary from "abbrev and acronym tags are not on the whitelist" to
"abbr and acronym tags are not on the whitelist," since the HTML tag is abbr.
(In reply to comment #4)
We are not currently adding new HTML tags to the whitelist.
So how can we implement that feature? Should we add a new setting
in DefaultSettings / LocalSettings such as $wgMoreAllowedHTMLTag?
Astronouth7303 wrote:
(In reply to comment #5)
So how can we implement that feature? Should we add a new setting
in DefaultSettings / LocalSettings such as $wgMoreAllowedHTMLTag?
I think $wgMoreAllowedHTMLTag would be a good compromise. It wouldn't be in the default
configuration, but individual webmasters could change it.
Probably the best way to implement this (correct me if I'm wrong) would be to insert
something in Parser::removeHTMLtags() (Parser.php, line 1899 in 1.3.8, line 2183 in
1.4.0). I can't say how to implement it exactly, but that's where the tags are, I think.
The alternative is, of course, making a plugin for it (which, all things considered,
wouldn't be difficult, either).
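The configuration idea above can be sketched as follows. This is written in Python purely for illustration (MediaWiki itself is PHP), and it is not the actual Parser::removeHTMLtags() logic. The names `DEFAULT_ALLOWED`, `extra_allowed`, and `escape_disallowed_tags` are hypothetical, and the default tag list is only a small sample.

```python
import html
import re

# Sample default whitelist (assumption); the real list in Parser.php is longer.
DEFAULT_ALLOWED = {"b", "i", "em", "strong", "code", "tt", "span", "p"}

# Matches an opening or closing tag and captures its name.
TAG_RE = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^<>]*>")


def escape_disallowed_tags(text, extra_allowed=()):
    """Keep whitelisted tags as markup; escape every other tag so it shows as text.

    `extra_allowed` plays the role of a site-level setting such as the
    proposed $wgMoreAllowedHTMLTag. Attributes are NOT sanitized here.
    """
    allowed = DEFAULT_ALLOWED | {name.lower() for name in extra_allowed}

    def handle(match):
        if match.group(1).lower() in allowed:
            return match.group(0)          # whitelisted: pass through unchanged
        return html.escape(match.group(0))  # not whitelisted: display as literal text

    return TAG_RE.sub(handle, text)
```

With the default list, `<abbr>` is shown as literal text, which is the behaviour this bug describes; calling `escape_disallowed_tags(text, extra_allowed=["abbr"])` lets a site opt in without changing the defaults.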
karmazilla wrote:
I just suck at submitting these bugreports.. here's the diff:
2299c2299
'ruby', 'rt' , 'rb' , 'rp', 'p', 'span', 'acronym', 'abbr'
karmazilla wrote:
here's the .diff fixing bug 671 - finally ;)
attachment Parser.php.diff ignored as obsolete
suruena wrote:
I would also like to request the addition of the <abbr> tag because it is a
[http://www.w3.org/TR/WCAG10-HTML-TECHS/#text-abbr WCAG requirement] for
accessible websites.
However, it's not a good idea to add the <acronym> tag because it will be
deprecated in XHTML 2.0
omniplex wrote:
From the "missing" elements <abbr> is interesting,
and maybe <q> could replace some templates and style
guidelines. Popular browsers are supposed to handle
nested quotations as the user likes it, with CSS
allowing different styles depending on the language.
ayg wrote:
Proposed patch
Elements added:
still in a *very* preliminary stage; it says this explicitly)
may as well allow it even if it should be excluded from articles for the time
being)
This patch would mean that MediaWiki permits all semantic elements recognized
in HTML 4. Other elements that might be considered on different grounds are
col and colgroup (bug 986).
attachment 671.patch ignored as obsolete
ayg wrote:
Proposed patch
Okay, I actually *tested* this one. It won't spontaneously combust if you use
attributes, and it no longer includes the bogus address element (which is a
valid element but not a simple semantic-phrase one, as it turns out). Works as
expected on my local test wiki: appropriate attributes work, evil ones are
eaten.
attachment 671b.patch ignored as obsolete
bhawkeslewis wrote:
With regards to comment #1, what WCAG 1.0 says is that abbr
and acronym should be used on /at least/ the first occasion on a page. Given
that (as far as I know) no screen readers remember how an abbreviation or
acronym was expanded with title earlier on a page, it actually makes more sense
to mark up each and every occurrence. Mediawiki authors could add the expansion
on the first occurrence, and Mediawiki could add title to subsequent occurrences
automatically.
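A rough sketch of that automatic-title idea, in Python for illustration only. The function name `propagate_abbr_titles` and the simplified pattern (double-quoted `title` as the only attribute) are assumptions; nothing like this exists in the parser.

```python
import re

# <abbr> with an optional title attribute (double-quoted only, for brevity).
ABBR_RE = re.compile(r'<abbr(?:\s+title="([^"]*)")?>([^<]*)</abbr>')


def propagate_abbr_titles(markup):
    """Copy the title from an abbreviation's first titled use to later untitled uses."""
    known = {}  # abbreviation text -> title exactly as first written

    def fill(match):
        title, text = match.group(1), match.group(2)
        if title is not None:
            known.setdefault(text, title)  # the first expansion seen wins
            return match.group(0)
        if text in known:
            return '<abbr title="{}">{}</abbr>'.format(known[text], text)
        return match.group(0)  # no expansion seen yet; leave untouched

    return ABBR_RE.sub(fill, markup)
```

Because the first expansion wins, a page that uses one abbreviation in two senses, or uses it before expanding it, would still need explicit titles from the author.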
With regards to <q>, it's not entirely true that Internet Explorer does not
support it. It is placed in the DOM tree, it can be styled, and it can be used
to recognize a quotation by a screen reader such as Window-Eyes (JAWS fails this
though). There are several options for IE compatibility. First, Mediawiki could
accept input in the form of <q> and output quotation punctuation. Second,
Mediawiki could output <q> but style it italic in Internet Explorer using
conditional comments (but this won't help if CSS is disabled). Third, Mediawiki
could output conditional comments to show quotation punctuation to Internet
Explorer and <q> to other browsers.
ayg wrote:
I don't buy WCAG 4.2. It explains absolutely nothing to give "HTTP" a title of "hypertext
transfer protocol": either you know what that is, in which case you know what HTTP is, or you
don't, in which case the expanded form is basically gibberish too. There's no more or less need
to expand acronyms than to elucidate any other unfamiliar forms, and in both cases that's doable
using just text. It's an issue of clarity, not accessibility.
<q> could be implemented with server-side additions of quotes, yes, but it would still make the
wikitext less readable, which defeats the purpose of wikitext. I suspect we aren't going to want
to go with heuristic techniques to decide where quotes start or end, so I don't see many options
here.
(Yeah, my opinion on this has changed over the last several months.)
Astronouth7303 wrote:
IMHO, this bug addresses the fact that these tags are just formatting/structure
tags (like i, strong, em, code, tt, etc). There is no reason that they should
not be whitelisted in terms of security. IMHO, discussions about usability,
cleanness, reasons, etc. for using them should be elsewhere.
These same usability arguments could be made about using <i> (''italics'') or
<b> ('''bold'''). If I were to have a preferred syntax, I would suggest
{{abbr:Hypertext Transfer Protocol|HTTP}}. That is, however, MOO (My Own Opinion).
As far as the gibberish goes, this is somewhat true. Wikipedia articles also
have a tendency of going into great detail. So if you have little previous
knowledge about the subject, you're lost no matter what. But if you have
knowledge about related topics and want to find info on, say, HTCPCP, this goes
a long way explaining it. (Even a layman understands "Coffee Pot Control".)
The idea of the parser handling every instance of an abbreviation would mean
that the parser would be doing something that no other feature does. No other
feature works implicitly through a whole document. And nothing is said about
conflicts (talking about the National Basketball League in Australia and the
National Bicycle League in the US in the same article), instances before an
abbreviation (Talking about DUI before calling it the Democratic Union for
Integration), or other inconsistencies.
The enhancements on the <q> tag really belong in another bug, IMHO. "Smart"
processing based on user agent is also a dangerous road to follow. It doesn't
help caching, makes assumptions about the browser, and requires constant
updating. (Not to mention violates the intent of the User-Agent HTTP header.)
Maybe we should take a poll of anyone color blind, deaf, using a PDA/cell phone,
or blind that uses Wikipedia and see what they say about violating the WCAG.
(Make up a demo page and ask them about it.) All of those groups are covered in
the WCAG; "accessibility" refers to the ability to access information, no matter
what your platform, capabilities, or speed. Granted, Wikipedia is a lot better
than other sites (it seems odd to me that the corporations that make money from
websites often have the poorest ones), but these are factors that still should
be considered.
psychonaut wrote:
Proposed patch
I agree; there's no good reason why the harmless but (in some cases) extremely
useful HTML tags <kbd>, <samp>, etc. should not be allowed by default. I
attach a version of Simetrical's patch which has been updated to work with the
latest SVN repository version.
Attached:
giecrilj wrote:
(In reply to comment #14)
The syntax of Q is evolving in favour of the current IE support:
Quotation punctuation (such as quotation marks), if any, *must be placed inside* the q element.
http://www.w3.org/TR/html5/text-level.html#the-q
I would also remark that it is a shame not to have DFN particularly at Wikipedia.
With regards to <q>, it's not entirely true that Internet Explorer does not
support it. It is placed in the DOM tree, it can be styled, and it can be used
to recognize a quotation by a screen reader such as Window-Eyes (JAWS fails this
though). There are several options for IE compatibility. First, Mediawiki could
accept input in the form of <q> and output quotation punctuation. Second,
Mediawiki could output <q> but style it italic in Internet Explorer using
conditional comments (but this won't help if CSS is disabled). Third, Mediawiki
could output conditional comments to show quotation punctuation to Internet
Explorer and <q> to other browsers.
smccandlish wrote:
I agree; there's no good reason why the harmless but (in some cases) extremely
useful HTML tags <kbd>, <samp>, etc. should not be allowed by default. I
attach a version of Simetrical's patch which has been updated to work with the
latest SVN repository version.
Right. So, why more than a year and a half later hasn't this simple patch been applied? These tags are an integral part of the semantic (not display) markup of [X]HTML. This needs to be fixed sooner rather than later. It is MORE important that MediaWiki support all of the semantic tags, from address to samp, than that it support purely presentational crap like tt, b and i, all of which (while not yet deprecated by W3C) have been de facto deprecated by the entire Web development community for over a decade in favor of CSS styling, since they mix content and presentation. Writing computer-technical articles in Wikipedia is a maddening exercise because 2/3 of the basic code for doing so has been needlessly disabled by the software itself. ARGH...
ayg wrote:
(In reply to comment #22)
Right. So, why more than a year and half later hasn't this simple patch been
applied?
See comment #4. Brion Vibber is lead developer, you'll have to take it up with him. I'm sure not going to commit it without his okay, when he's explicitly said not to.
purely presentational crap like tt, b and i, all of which
(while not yet deprecated by W3C) have been de facto deprecated by the entire
Web development community for over a decade in favor of CSS styling
They have important uses. See bug 370, bug 1038, bug 7921 for why, in particular, we use <b> and <i> for ''' and '', not <strong> and <em>.
giecrilj wrote:
(In reply to comment #23)
(In reply to comment #22)
Right. So, why more than a year and half later hasn't this simple patch been
applied?
See comment #4. Brion Vibber is lead developer, you'll have to
Comment #4 does not answer the question why.
smccandlish wrote:
(In reply to comments #23-25)
It was a rhetorical question, the implication of which is "this is simple stuff, it should just get fixed". These are basic, useful semantic elements.
(In reply to comment #23)
Yes, <i> and <b> have valid uses (book titles, italicization of foreign phrases, etc.); <tt> much less clearly so. I don't have any particular problem with '' and ''' being rendered as <i> and <b>, though I do wish that editors would use <em> and <strong> when they are using italics or bold for emphasis, not style. Minor point.
smccandlish wrote:
I would leave <acronym> off the list, as it is not going to be supported in XHTML 2, HTML 5, etc. Someone commented above over two years ago that XHTML 2 was "VERY preliminary", but that is no longer true, and the dropping of <acronym> didn't change. For Wikipedia purposes, I would strongly suggest that the strings "<acronym*>" and "</acronym>" be parsed, but converted into "<abbr*>" and "</abbr>" before the final product is sent to the user agent (HTML5 specifically states that <abbr> has absorbed <acronym>, so this will properly account for any attempts to use that dying element). The <address> element should simply be omitted, as its sole function is identifying contact information *of the author of the page or section in which it appears*, which is not of any use on any Wikimedia projects; the element is frequently abused to mark up addresses in general, and we would not want MediaWiki to be an enabler of this misuse.
We should also be aware that <tt>, <s>/<strike>, and <u> are also going to be removed from XHTML 2 and HTML 5. The time is probably NOW to start weaning people off of them, though of course they shouldn't simply be deleted from MediaWiki support just yet. It would be good if these were replaced on-the-fly with styled <span>s, though (meanwhile, <i>/'' and <b>/''' should be left alone, as HTML 5 redefines them more narrowly and they will continue to be used).
That said, I would not want any of this to hold up simply enabling the use of the basic semantic markup features of HTML that are missing, such as <dfn>, <samp> and <kbd>. Is there some kind of timetable for fixing this? These tags being missing is a major thorn in my side, and given that this bug was opened years before I even commented here, I would seem not to be alone.
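The acronym-to-abbr conversion suggested earlier in this comment could look roughly like the sketch below. It is Python for illustration only, not MediaWiki code; `acronym_to_abbr` is a hypothetical name, and the pattern does no attribute validation.

```python
import re

# Opening <acronym ...> or closing </acronym>, case-insensitive.
# Group 1 is the optional slash, group 2 is any attribute text, kept as-is.
ACRONYM_RE = re.compile(r"<(/?)acronym\b([^<>]*)>", re.IGNORECASE)


def acronym_to_abbr(markup):
    """Rewrite acronym tags to abbr tags, preserving any attributes."""
    return ACRONYM_RE.sub(r"<\1abbr\2>", markup)
```

Editors could keep typing the old element while the output sent to the user agent contains only abbr.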
ayg wrote:
Ask Brion Vibber why he thinks we shouldn't add any more tags. Discussion of removal of <u>/<tt>/<strike>/<s> is not really relevant to this bug, open a new bug.
giecrilj wrote:
(In reply to comment #28)
Ask Brion Vibber why he thinks we shouldn't add any more tags.
Brion, do you still think that the elements Q, DFN and ABBR are not worth adding to wikitext?
Chris
ayg wrote:
You might want to e-mail him directly, or better yet, try to catch him in irc://irc.freenode.org/mediawiki.
abxabx wrote:
I have just noticed that the lack of <DFN> in fact limits how Wiktionaries are displayed in search engine results. Because there is no markup for definitions (unlike Wikipedias, Wiktionaries have short definitions), indexing bots sample the definition quite randomly: from examples, from etymology, from notes. I'm afraid we have lost many visits from possible contributors because abstracts are picked up so poorly. What's the progress in implementing the missing markup?
ayg wrote:
I implemented <abbr> support in r54241 for the specific reason given in the commit message (i.e., to support r54242). Addition of all other elements is, as I said in comment 23 back in September, pending a statement by Brion, at least on my part.
mediawiki wrote:
Before whitelisting any tag that starts with the letter 'a' (<abbr>, <address> and <acronym>), please look at bug 22905. Those three tags will be confused with <a> by doMagicLinks() in Parser.php.
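The kind of confusion described here can be illustrated generically. The patterns below are hypothetical and are NOT the regex used by doMagicLinks(); they only show, in Python, how a pattern meant for `<a` can also match longer tag names that begin with "a", and how requiring the name to end after "a" avoids that.

```python
import re

# Hypothetical patterns for illustration; not taken from Parser.php.
NAIVE_A = re.compile(r"<a[^>]*>")             # also matches <abbr>, <address>, <acronym>
STRICT_A = re.compile(r"<a(?=[\s>])[^>]*>")   # the tag name must end right after "a"


def find_anchor_tags(markup, pattern):
    """Return every opening tag that `pattern` treats as an <a> element."""
    return pattern.findall(markup)
```

On input containing `<a href="#x">`, `<abbr title="y">` and `<address>`, the naive pattern reports all three as anchors, while the strict one reports only the real `<a>` tag.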
smccandlish wrote:
2004-10-10. This was opened more than 5 1/2 years ago! What's the problem?
ABX 2009-06-22 18:23:25 UTC:
I have just noticed that lack of <DFN> in fact limits how wiktionaries are
displayed on results of search engines.
That's a serious issue. I need <dfn>, too, for rich glossary code I'm working on.
Aryeh Gregor 2009-08-02 22:17:12 UTC:
I implemented <abbr> support in r54241 for the specific reason given in the
commit message (i.e., to support r54242)...
Almost 8 months later,
Solitarius 2010-03-23 15:55:40 UTC:
Before whitelisting any tag that starts with the letter 'a' (<abbr>, <address>
and <acronym>), please look at bug 22905. Those three tags will be confused
with <a> by doMagicLinks() in Parser.php.
Then fix doMagicLinks()? If the sky didn't fall in 7+ months, what's the issue, and how important is it? How hard to resolve?
How about some real information other than "Brion said no almost 5 years ago" (and has probably forgotten and doesn't care). Even if he did, maybe he doesn't now. If consensus can change, surely his mind can too. I guess I'll drop him a line and ask him to take another look at this, because the problems are mounting, and no real reason has been given here why not to add support for the rest of the basic HTML tags (most if not all of these date to HTML 2.0 at least; it's not like they're weird, untested stuff).
DFN *obviously* needs to be implemented ASAP. ABBR and ACRONYM would be incredibly useful in every WikiMedia project I can think of. I'd add KBD and SAMP next, so we can all stop blatantly abusing CODE and TT. Q and ADDRESS last, I'd say.
michael wrote:
See also Bug 23932 - Enable, whitelist, and incorporate semantic HTML5 elements: article, aside, dialog, figure, footer, header, hgroup, mark, nav, section, time
ayg wrote:
<abbr> is implemented. <acronym> is obsolete and will not be whitelisted; use <abbr> instead. <kbd> and <samp> are basically useless. <q> is dodgy because it's theoretically supposed to insert quotation marks automatically, but that doesn't work on any IE <= 8 (IIRC), and it's non-conformant to add the quotation marks manually, so it can't actually be used. <address> makes no sense on wikis.
That leaves what, <dfn>? That sounds like it might be worth whitelisting. Does anyone have any actual evidence that it's used by search engines? Alternatively, S. McCandlish, what's this rich glossary code and why does it need <dfn>?
theevilipaddress wrote:
I think there are potential uses for these tags within wikis: <kbd> is used for keyboard text and could for example be used for [[Template:Key press]] on the English language Wikipedia. <samp> could be useful for formatting the output text of other software, for example in manual pages. The usages stated at http://www.w3.org/TR/html5/text-level-semantics.html#the-samp-element and http://www.w3.org/TR/html5/text-level-semantics.html#the-kbd-element are more than valid usages of this in a wiki. The <q> element's usages are more than fulfilled, and since the <blockquote> tag is already allowed anyway, why not also allow this? It should be the wikis' own decision if they want to use it despite the IE problems.
The <address> seems really rather pointless and since <acronym> is deprecated, they shouldn't be added, but the rest is acceptable IMHO.
ayg wrote:
The question isn't whether you could conceive of a case where <samp> or <kbd> could legitimately be used, but rather whether you can come up with a case where it would be *more* useful than just using <code> or <tt> or whatever. If not, why should we whitelist them? They'll just confuse people, they don't add anything.
If a wiki actually wants to use <q> despite the IE problems, it can request that and it might be considered. I haven't seen such a request. If it were allowed across the board, individuals would use it without realizing the problems, figuring that because it works in their browser it must work for everyone. Moreover, <q> is much harder to type than regular old quotation marks, so is anti-wiki.
(I removed acronym and address from the summary because no one seems to support adding them.)
smccandlish wrote:
I apologize for the length of this, but this has gone on for far, far too long, and is getting increasingly out in left field on several points. This bug needs to get fixed as soon as possible. I'm not trying to be sarky, but it's getting very tiresome having this bug's fixing being held off or outright rejected on a basis of "I don't see why..." So, I'll explain why in detail. It'll be tedious, but this bug's remained mostly unfixed for years, so I guess it's necessary.
I think there are potential uses for these tags within wikis: The kbd tag
is used for keyboard text and could for example be used for [[Template:Key
press]] on the English language Wikipedia. The samp tag could be useful
for formatting the software text given by other software, for example in
manual pages.
Right. And perhaps even more to the point, these serve specific semantic and usability/accessibility purposes, and Wikipedia and other MW wikis are presently sorely abusing the code element and (now deprecated) tt element to make up for the lack of these. They should never have been left out to begin with. None of these should have, really, except arguably the q element because of its widespread implementation issues.
[The dfn element] sounds like it might be worth whitelisting.
Does anyone have any actual evidence that it's used by search engines?
Alternatively, S. McCandlish, what's this rich glossary code and why does
it need dfn?
The lack of the dfn element is a particular thorn in my side on Wikipedia right now, in reality and not in theory, as WP and other MW-based wikis *overwhelmingly abuse* the dl/dt/dd elements all over the place (e.g. every talk page!) for purely visual indentation (:) and/or boldfacing (;). A new Bugzilla ticket (if one doesn't exist for this yet) needs to be opened to fix this - replace all dl/dt/dd output of ":" and ";" wikimarkup with CSS-styled divs. The only way to distinguish a real definition (e.g. in a glossary, in-article or as a stand-alone list article) at the Web semantics level (see below for numerous reasons this is useful and important) from misuse of these elements, is with the dfn element. See draft guideline at [[WP:GLOSSARY]], and its geeky-as-heck subpage for some detail on MW/WP evilbadness when it comes to definition and more general lists. For more background on the dfn element and its stable HTML 5 future, see the http://www.w3.org/TR/html5/text-level-semantics.html#the-dfn-element page.
That said (i.e. my personal reason for stumping for dfn), the dfn element is also very useful all over Wikipedia and similar sites just by the element's very nature. It should be one of the most-used. For example, I think that on Wikipedia in particular, the bold-faced beginning of lead sections in mainspace articles should mostly be done with a template that auto-adds dfn, instead of manual boldfacing, e.g.: "An {{leadterm|electrokardiogram}} is..." I mean, really, that's precisely what this element exists for: To flag the defining instance (in its context) of a term. The average human reader looking (i.e. with working eyes in a typical browser) at a WP article might not experience anything differently, but it has a lot of automated processing potential, and accessibility improvement potential, especially in articles that present several closely-related things in one article (e.g. submodels or "trim levels" of a car, e.g. the GT Cruiser and Street Cruiser variants of the PT Cruiser). With dfn support, users of text-to-speech screen readers would be able to customize their style sheets to do something specific for dfn-flagged "defining instances" to help distinguish them from "just another section" and "just another boldfaced something".
I don't recall making any search-engine-related argument about dfn, though there may well be one. And an argument like "we shouldn't implement it because search engines don't use it" (which I'm not sure was being made here, but it kind of looked like it) is logically invalid anyway, since there may be many other things/people that do/will use the feature under discussion for various reasons and purposes (not to mention that it's tautological and circular - a search engine can't use a wiki feature that isn't implemented, by definition, ergo the lack of evidence of the search engine using the feature on wikis cannot be used as an argument against the feature's wiki implementation, obviously - it puts the cart before the horse).
The usages stated at
http://www.w3.org/TR/html5/text-level-semantics.html#the-samp-element and
http://www.w3.org/TR/html5/text-level-semantics.html#the-kbd-element are more
than valid usages of this in a wiki.
Agreed on all these points.
[The address element] makes no sense on wikis.
and
The address element seems really rather pointless and since the acronym
element is deprecated, they shouldn't be added, but the rest is
acceptable IMHO.
and
I removed acronym and address from the summary because no one seems to
support adding them.
Putting the address element back in since I and whoever first proposed adding support for it, probably among others above, obviously do support adding it. The address element isn't deprecated in HTML 5 (http://www.w3.org/TR/html5/sections.html#the-address-element), so DO include it. It serves a well-defined semantic purpose just like every other non-presentational element. It IS actually particularly useful on wikis (not necessarily WikiPEDIA, mind you, but keep in mind that MW software can be used for an endless number of end purposes, including databases of contact information, etc.) when integrating metadata inline in the content with id= and a standardized metadata schema like hCard/vCard. It should be in the MW software, and it should be up to individual installations' system operators whether to turn that element off.
The acronym element IS being deprecated in HTML 5, and SHOULD be dropped from this bug (it's because acronyms are just a form of abbreviation, so the element was redundant with abbr).
The abbr element SHOULD certainly be supported, and I'm glad that it has finally been added. But it was arguably the second least important one to add support for! D'oh.
<kbd> and <samp> are basically useless.
and
The question isn't whether you could conceive of a case where the samp or
kbd tags could legitimately be used, but rather whether you can come up
with a case where it would be *more* useful than just using the code or tt
or whatever. If not, why should we whitelist them? They'll just confuse
people, they don't add anything.
You seem to be ignoring the nature of [X]HTML and Web semantics. *These tags actually mean something* and they all mean something *different*. Nothing could be further from the truth than them being "basically useless" (or they wouldn't have been carefully preserved in HTML 5 and even better explained there than in HTML 4 and earlier). It actually blew me away for a while when I tried to use these as intended, in template documentation, and they didn't work. Disabling them was pointless and a bad idea. You seem to me to be approaching this from a 1995 browserwars-era, HTML 3-ish "if it LOOKS right, it IS right" Web dev paradigm that is long obsolete and which generates genuine problems for many people on both sides of the user/provider Web equation.
It's ultimately irrelevant that your particular browser, and even Wikipedia's style sheets, may choose to *style* these tags the same, visually (monospaced, non-proportional font), and they thus *appear* redundant to you. *They are not the same*. User keyboard input (kbd element) is not output (samp element) is not source code (code element) is not a variable (var element) and so forth.
Any user of a modern browser is free to override the default style sheets they receive from WP or any other site and from their browser; there is no guarantee that every visual browser does and forever will style these elements all the same by default; there is no guarantee that even Wikipedia will always style them the same (esp. given the weirdness that WP does in CSS to many things, including the pre element); there's a near certainty that some screen readers do not treat them as identical by default; and there's an absolute certainty that power users of good screen readers customize their style sheets to not treat them as identical.
There's no evidence I'm aware of that they confuse people. If they did, they would no longer be part of the [X]HTML specs, given that there's been all of the 1990s and 2000s to get rid of them if they were actually problematic. Mostly only HTML-experienced editors do anything with HTML elements manually on Wikipedia anyway, and they are the least likely to be confused. If some dwid does muck something up, someone with more know-how will fix it, just like everything else on Wiki (which is chock full of much, much more confusing things than a couple of HTML elements). HTML elements are mostly used in templates, which again are usually created by savvy editors who know the difference between one element and another. The "confusing" argument is therefore unconvincing.
The elements *do* add something: They add semantic precision, which is a boon for editing clarity (is this code? is it output? what is it? what are we trying to communicate here? if I need to update this template documentation, which parts are code, user input, and example output?); they add user customizability with style sheets; they add to accessibility; they add to MW's Web standards compliance; they add to open content (i.e. Wikipedia, other WikiMedia projects, and many non-WM projects) portability by separating different kinds of data; they enhance the ability to template and metadata-ize content; and so on, and so on. There's a reason all these elements still exist. I mean, really, HTML *could* be stripped down to only a handful of tags like div and span and p and br, if semantics didn't matter and only presentation was an issue (even tables can be simulated with CSS and divs!).
So, to answer your "the question", it is *automatically and by definition* "more useful" to use the proper elements for the content that is appropriate for them, than to continue to abuse the code and (worse yet) tt elements, just like it is automatically and by definition more useful to use a screwdriver than a hammer when dealing with screws instead of nails, regardless of the fact that a sufficient application of force with a hammer can drive a screw into wood as if it were a nail. Use the right tool for the job, and you have smoother work, a happier worker, and better work output.
Moreover, <q> is much harder to type than regular old quotation
marks, so is anti-wiki.
This may be a moot point, because the q element is problematic for other reasons, but I again don't think you are properly applying Web semantics and the purpose of this markup. The q element is not a replacement for quotation marks. It's semantic markup that tells editors, parsing software (browsers, screen readers, format-to-format translation software, specialized search code, etc., etc.) "this is a quotation". Quotation marks are used for many things that are not quotations, e.g. song titles. The presence of quotation marks is not an indication that something is a quotation, but the q element is.
And just for the record, the "anti-wiki" bit simply isn't applicable at all. As with other stuff under discussion here, no one expects the noob or even average editor to bother with this. We DO expect geeky, code-experienced editors to bother with it, especially in the quotation-related templates.
HOWEVER, I'm in favor of deferring implementation of the q element until such time as the Web at large agrees on how it should really be implemented and all major browsers treat it the same. Normally I will never bend at all to suit broken Microsoft apps, but this issue actually goes beyond that, as some browsers do not auto-insert quotation marks, the auto-generated quotation marks are treated as non-content (like li bullets and numbers), in many browsers (i.e. they don't show up in a copy-paste), and few editors know that many browsers auto-generate them (and all should, at least according to current and near-future versions of the HTML specs). I'm removing the q element from the subject line in concurrence with Aryeh. If someone wants to argue for enabling q, it might be better to do that in a separate bug number and make a well-defended case for it. The last thing I want is for problems with q to hold up implementation of dfn, kbd, samp and address, in that order.
If a wiki actually wants to use <q> despite the IE problems, it can
request that and it might be considered. I haven't seen such a request.
"A wiki" can't make a request; editors on/of a wiki make requests. I concede on the q element for now, for reasons given, but am definitely requesting that the rest be implemented as soon as possible. The situation is actually the complete opposite, really: MW developers have willy-nilly disabled various useful features of [X]HTML for no reason at all in most cases (q arguably being an exception), despite no demand for the developers to do this, and now several years of this Bugzilla thread seeking an end to it, to only minimal avail so far (abbr).
Salient quote:
Ian Hickson, Wednesday, 29 April 2009 6:44 AM:
On Tue, 28 Apr 2009, Jim Garrison wrote:
I am trying to figure out the best way to replace the tt element as I
migrate to HTML5.
Are you using tt to mark up computer code, variables, sample computer
output, user input, for emphasis, to give a span of text in an alternate
voice or mood, a span of text to be stylistically offset from the normal
prose without conveying any extra importance, or something else?
This question must be asked every time a tt element is replaced (manually or via AWB or whatever, and they WILL need to ultimately be replaced over the next couple of years), yet *two of the correct and most common answers will not actually be usable until bug 671 is actually fixed*.
PS: If we put angle brackets around the element names in these discussions, many mail systems that try to parse [X]HTML on the fly in e-mail regardless of its MIME type will interpret them as markup and not render them as text. I.e. "since the <blockquote> tag is already..." renders as "since the tag is already...", with everything after "the" indented (sorry, some of you won't be able to read that properly; the example passage has a blockquote tag in it). This makes the messages basically impossible to correctly parse without coming to bugzilla.wikimedia.org to read them. I've refactored the quoted material in this message to compensate (e.g. I used "the blockquote tag", etc.).
smccandlish wrote:
Woops, forgot to update bug title to remove the q element. While I'm at it, I put the remaining elements in order of more to less needed (for WP, anyway). Also clarified title to effectively exclude q; if someone wants to push it now, while the IE issue is still considered an active problem by some, this should be its own bug report, as the contentiousness of q is probably what has most held up fixing the rest of this bug for almost 6 frakkin' years. And I added the "easy" keyword, since this actually does seem to be very simple; a patch has been provided, which with minor modification is still applicable. Sorry for the back-to-back posts.
ayg wrote:
(In reply to comment #41)
The lack of the dfn element is a particular thorn in my side on Wikipedia right
now, in reality and not in theory, as WP and other MW-based wikis
*overwhelmingly abuse* the dl/dt/dd elements all over the place (e.g. every
talk page!) for purely visual indentation (:) and/or boldfacing (;). A new
Bugzilla ticket (if one doesn't exist for this yet) needs to be opened to fix
this - replace all dl/dt/dd output of ":" and ";" wikimarkup with CSS-styled
divs. The only way to distinguish a real definition (e.g. in a glossary,
in-article or as a stand-alone list article) at the Web semantics level (see
below for numerous reasons this is useful and important) from misuse of these
elements, is with the dfn element. See draft guideline at [[WP:GLOSSARY]], and
its geeky-as-heck subpage for some detail on MW/WP evilbadness when it comes to
definition and more general lists. For more background on the dfn element and
its stable HTML 5 future, see the
http://www.w3.org/TR/html5/text-level-semantics.html#the-dfn-element page.
That said (i.e. my personal reason for stumping for dfn), the dfn element is
also very useful all over Wikipedia and similar sites just by the element's
very nature. It should be one of the most-used. For example, I think that on
Wikipedia in particular, the bold-faced beginning of lead sections in mainspace
articles should mostly be done with a template that auto-adds dfn, instead of
manual boldfacing, e.g.: "An {{leadterm|electrokardiogram}} is..." I mean,
really, that's precisely what this element exists for: To flag the defining
instance (in its context) of a term. The average human reader looking (i.e.
with working eyes in a typical browser) at a WP article might not experience
anything differently, but it has a lot of automated processing potential, and
accessibility improvement potential, especially in articles that present
several closely-related things in one article (e.g. submodels or "trim levels"
of a car, e.g. the GT Cruiser and Street Cruiser variants of the PT Cruiser).
With dfn support, users of text-to-speech screen readers would be able to
customize their style sheets to do something specific for dfn-flagged "defining
instances" to help distinguish them from "just another section" and "just
another boldfaced something".
I don't recall making any search-engine-related argument about dfn, though
there may well be one. And an argument like "we shouldn't implement it because
search engines don't use it" (which I'm not sure was being made here, but it
kind of looked like it) is logically invalid anyway, since there may be many
other things/people that do/will use the feature under discussion for various
reasons and purposes (not to mention that it's tautological and circular - a
search engine can't use a wiki feature that isn't implemented, by definition,
ergo the lack of evidence of the search engine using the feature on wikis cannot
be used as an argument against the feature's wiki implementation, obviously -
it puts the cart before the horse).
So what you're saying is that theoretically someone somewhere might be able to derive some benefit from this, but you don't have specific examples of people who *will* benefit from it? Like who have said they want it and will use it for some specific constructive purpose if it's available?
Maybe we should allow it even if it's only useful in theory, but if there were real-world uses (i.e., non-hypothetical) then I'd certainly be fine with adding it. I'd like to be clear one way or the other on that. (In contrast, I think <address> certainly should not be added under any circumstances, and have *very* strong doubts about <kbd> and <samp> given their incredibly limited utility.)
Putting the address element back in since I and whoever first proposed adding
support for it, probably among others above, obviously do support adding it.
The address element isn't deprecated in HTML 5
(http://www.w3.org/TR/html5/sections.html#the-address-element), so DO include
it. It serves a well-defined semantic purpose just like every other
non-presentational element. It IS actually particularly useful on wikis (not
necessarily WikiPEDIA, mind you, but keep in mind that MW software can be used
for an endless number of end purposes, including databases of contact
information, etc.) when integrating metadata inline in the content with id= and
a standardized metadata schema like hCard/vCard. It should be in the MW
software, and it should be up to individual installations' system operators
whether to turn that element off.
The address element gives contact information for the author of the page. Since wiki pages are, by their nature, not owned by anyone or authored by any single person, it makes no sense for wikis. If someone really wants to abuse MediaWiki as a CMS, and really wants to use <address>, they can hack it into Sanitizer.php. Or ask for it to be optionally (non-default) whitelisted. It makes no sense in normal wiki pages, though, and should not be allowed by default.
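For readers unfamiliar with what whitelisting means here, the following is a minimal Python sketch of the idea, not MediaWiki's actual Sanitizer.php logic. The names `DEFAULT_ALLOWED`, `sanitize` and `extra_allowed` are hypothetical, the baseline set is illustrative only, and attributes are deliberately not handled (a tag carrying attributes is simply escaped). Tags on the list pass through; anything else is escaped so it is displayed as literal text, which is the behaviour this bug complains about. The `extra_allowed` parameter stands in for an optional, non-default per-installation setting.

```python
import html
import re

# Hypothetical baseline whitelist for illustration only; the real list
# lives in MediaWiki's Sanitizer.php and is much longer.
DEFAULT_ALLOWED = {"abbr", "blockquote", "code", "tt", "var"}

# Matches bare opening or closing tags such as <kbd> or </kbd>.
# Tags with attributes do not match and are escaped as plain text.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*>")


def sanitize(text, extra_allowed=()):
    """Pass whitelisted tags through; escape all other markup."""
    allowed = DEFAULT_ALLOWED | {name.lower() for name in extra_allowed}
    out = []
    pos = 0
    for match in TAG_RE.finditer(text):
        # Escape the plain text that precedes this tag.
        out.append(html.escape(text[pos:match.start()], quote=False))
        slash, name = match.group(1), match.group(2).lower()
        if name in allowed:
            out.append(f"<{slash}{name}>")
        else:
            # Not whitelisted: show the tag literally instead of rendering it.
            out.append(html.escape(match.group(0), quote=False))
        pos = match.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)


print(sanitize("Press <kbd>Enter</kbd>"))
print(sanitize("Press <kbd>Enter</kbd>", extra_allowed={"kbd"}))
```

With the default set the kbd tags come out escaped; passing `extra_allowed={"kbd"}` lets them through unchanged, which is roughly what an optional whitelist setting would amount to.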
You seem to be ignoring the nature of [X]HTML and Web semantics. *These tags
actually mean something* and they all mean something *different*. Nothing could
be further from the truth than them being "basically useless" (or they wouldn't
have been carefully preserved in HTML 5 and even better explained there than in
HTML 4 and earlier). It actually blew me away for a while when I tried to use
these as intended, in template documentation, and they didn't work. Disabling
them was pointless and a bad idea. You seem to me to be approaching this from
a 1995 browserwars-era, HTML 3-ish "if it LOOKS right, it IS right" Web dev
paradigm that is long obsolete and which generates genuine problems for many
people on both sides of the user/provider Web equation. It's ultimately
irrelevant that your particular browser, and even Wikipedia's style sheets, may
choose to *style* these tags the same, visually (monospaced, non-proportional
font), and they thus *appear* redundant to you. *They are not the same*. User
keyboard input (kbd element) is not output (samp element) is not source code
(code element) is not a variable (var element) and so forth. Any user of a
modern browser is free to override the default style sheets they receive from
WP or any other site and from their browser, there is no guarantee that every
visual browser does and forever will style these elements all the same by
default, there is no guarantee that even Wikipedia will always style them the same
(esp. given the weirdness that WP does in CSS to many things, including the pre
element), there's a near certainty that some screen readers do not treat them
as identical by default, and there's an absolute certainty that power users of
good screen readers customize their style sheets to not treat them as
identical.
I'd like to see specific examples of screen readers or power-users' stylesheets that do not treat these the same as <code> or <tt>, since you say these are near certain or absolutely certain.
Semantic distinctions need only be drawn where they're concretely useful. Otherwise you could spend all day marking things up for some hypothetical consumer who will probably never exist. If you have no specific examples of nontrivial numbers of users who you can *demonstrate* (not conjecture) will actually make use of the distinction, it is not worth the added language complexity.
I will particularly point out that wikitext is not HTML, and wikitext is not intended to produce all valid HTML. Just because someone at some point in the distant past in HTML's development thought that these distinctions might be worth making in HTML, doesn't mean that wikitext has to agree.
There's no evidence I'm aware of that they confuse people. If they did, they
would no longer be part of the [X]HTML specs, given that there's been all of
the 1990s and 2000s to get rid of them if they were actually problematic.
Mostly only HTML-experienced editors do anything with HTML elements manually on
Wikipedia anyway, and they are the least likely to be confused. If some dwid
does muck something up, someone with more know-how will fix it, just like
everything else on Wiki (which is chock full of much, much more confusing
things than a couple of HTML elements). HTML elements are mostly used in
templates, which again are usually created by savvy editors who know the
difference between one element and another. The "confusing" argument is
therefore unconvincing.
One of the goals of wikitext is to minimize the amount of actual HTML in markup, to make the code less intimidating to non-techy types. By their nature, <dfn> <samp> and <kbd> will be used in actual article text if they're used at all -- they can't be hidden away in the depths of templates in most cases. More and different HTML tags in wikitext works against the original purpose of wikitext. The people who *add* it might know what it means, but the people who have to *read* it (other editors) might not.
HTML is, again, different from wikitext. Wikitext does not aim to allow users to produce all possible HTML. HTML5 has elected to keep these elements because they're fairly harmless and it doesn't want to force people to change their preexisting valid pages needlessly. I actually filed a bug against HTML5 suggesting <samp> and <kbd> be removed:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=9919
It will probably be resolved within a few months, one way or the other.
So, to answer your "the question", it is *automatically and by definition*
"more useful" to use the proper elements for the content that is appropriate
for them, than to continue to abuse the code and (worse yet) tt elements, just
like it is automatically and by definition more useful to use a screwdriver
than a hammer when dealing with screws instead of nails, regardless of the fact
that a sufficient application of force with a hammer can drive a screw into
wood as if it were a nail. Use the right tool for the job, and you have
smoother work, a happier worker, and better work output.
In other words, you cannot present any specific practical benefit to allowing these elements, it just makes us More Semantic and this is obviously good? I don't find this argument convincing.
"A wiki" can't make a request; editors on/of a wiki make requests.
I meant that there could be an on-wiki discussion that reached consensus to ask the developers to make a change. I really doubt such a thing would happen for <q>, but if it did, it would be relevant.
IMPORTANT: After this bug is resolved (and finish reading to see why it *must* be), there will ultimately need to be a new bug report here, to get rid of support for the tt element entirely, which doesn't exist in HTML 5. Here's a good discussion of this issue (more broadly than wiki), and Googling about it turns up more: http://lists.whatwg.org/htdig.cgi/help-whatwg.org/2009-April/000233.html
<tt> exists in HTML5. It's just classified as obsolete presentational markup, and is not valid:
http://www.whatwg.org/specs/web-apps/current-work/multipage/obsolete.html#tt
The same is true for loads of other things that we allow. We cannot remove support for these from MediaWiki without a migration path to convert all existing markup somehow. But this is a totally separate bug.
PS: If we put angle brackets around the element names in these discussions,
many mail systems that try to parse [X]HTML on the fly in e-mail regardless of
its MIME type will interpret them as markup and not render them as text. I.e.
"since the <blockquote> tag is already..." renders as "since the tag is
already...", with everything after "the" indented (sorry, some of you won't be
able to read that properly; the example passage has a blockquote tag in it).
This makes the messages basically impossible to correctly parse without coming
to bugzilla.wikimedia.org to read them. I've refactored the quoted material in
this message to compensate (e.g. I used "the blockquote tag", etc.).
If your mail client touches angle brackets in plaintext e-mails, it is absolutely and completely broken and you need to tell it to stop. E-mail supports MIME types and your client needs to respect them. It's part of the SMTP standards and everything: http://tools.ietf.org/html/rfc1652. Surely you're not saying I should write my Bugzilla posts to work around broken tools that don't obey standards? 0:)
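As an illustration of the point about content types, here is a small Python sketch using the standard library's `email.message.EmailMessage`. The message text is the example from the quoted paragraph, and `display_body` is a hypothetical helper standing in for a mail client's rendering step. A message built from a plain string is labelled text/plain, and a client that respects that label treats angle brackets as ordinary characters.

```python
from email.message import EmailMessage

# Build a plain-text message; set_content() with a str labels it text/plain.
msg = EmailMessage()
msg["Subject"] = "Bug 671 comment"
msg.set_content("since the <blockquote> tag is already...")


def display_body(message):
    """Return the text a content-type-respecting client would display."""
    if message.get_content_type() == "text/plain":
        # Plain text: nothing is parsed as markup, brackets stay as they are.
        return message.get_content()
    raise NotImplementedError("HTML rendering is out of scope for this sketch")


print(msg.get_content_type())
print(display_body(msg))
```

The tag name survives in the displayed body because the declared type is never reinterpreted as HTML.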
michael wrote:
The address element gives contact information for the author of the page.
Since wiki pages are, by their nature, not owned by anyone or authored by any
single person, it makes no sense for wikis.
Off the top of my head, an address element could be appropriate in the following places:
In the context of a wiki, a link to a user page or user talk page is appropriate content for the address element.
It
makes no sense in normal wiki pages, though, and should not be allowed by
default.
What's a “normal” wiki page? This is general-purpose software. Allowing just what we need for Wikipedia will make all wikis look like Wikipedia. For goodness' sake, we have a million-definition multilingual dictionary with no dfn elements!
Whitelisting HTML tags gives editors the opportunity to demonstrate demand by using them. Insisting that we don't whitelist tags because it is “anti-Wiki” is self-fulfilling. The whole point of Wikitext is to provide easy shortcuts, *without* artificially limiting editors.
Yeah, it would be good to list more use cases. But this “we don't have it so it's not useful” argument is circular and by its nature unconvincing. Wikitext may be the world's most common way to generate HTML, and what we choose to allow in the software strongly affects what people will use, and not vice-versa.
giecrilj wrote:
(In reply to comment #37)
<abbr> instead. <kbd> and <samp> are basically useless. <q> is dodgy because
No, they are useful, albeit in a limited domain. If HTML had been designed by biologists, it would have SPECIES and FAMILY instead.
(In reply to comment #40)
If a wiki actually wants to use <q> despite the IE problems, it can request
that and it might be considered. I haven't seen such a request. If it were
Try "HTML q element" [1].
(In reply to comment #41)
That said (i.e. my personal reason for stumping for dfn), the dfn element is
also very useful all over Wikipedia and similar sites just by the element's
very nature. It should be one of the most-used. For example, I think that on
Wikipedia in particular, the bold-faced beginning of lead sections in mainspace
articles should mostly be done with a template that auto-adds dfn, instead of
manual boldfacing, e.g.: "An {{leadterm|electrokardiogram}} is..." I mean,
really, that's precisely what this element exists for: To flag the defining
instance (in its context) of a term.
I would rather say [[{subst:PAGENAME}]] and leave inserting the DFN tags to the engine. (I admit I have changed my position on this subject.)
major browsers treat it the same. Normally I will never bend at all to suit
broken Microsoft apps, but this issue actually goes beyond that, as some
browsers do not auto-insert quotation marks,
HTML5 goes flip-flop about this. The version I have in memory cache says quotation marks should be explicit inside Q.
I am trying to figure out the best way to replace the tt element as I
migrate to HTML5.
The wiki way of embedding code sections is to put white space on the beginning of the line. This leaves the question of in-line code; software manuals need this but it is a very special application. I cannot imagine an example of in-line output.
(In reply to comment #43)
I'd like to see specific examples of screen readers or power-users' stylesheets
that do not treat these the same as <code> or <tt>, since you say these are
near certain or absolutely certain.
My Internet Explorer used to render CODE in 10pt and SAMP in 12pt, probably upon assumption that CODE is for nerds with big lenses ;-)
[1] <URL:http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style/Archive_103#HTML_q_element>
giecrilj wrote:
(In reply to comment #44)
Off the top of my head, an address element could be appropriate in the
following places:
- In the page footer, for the Wikimedia button.
This case does not need white-listing.
smccandlish wrote:
Another long one, since there are so many things to cover. This time I've broken it into clear sub-topics. I'm also proposing this be split into multiple bugs.
Summary: Aryeh appears to have removed his (and, I note, the *only* extant) objection to implementing dfn.
Maybe we should allow it even if it's only useful in theory, but if there
were real-world uses (i.e., non-hypothetical) then I'd certainly be fine
with adding it
Yay! This argument is over then, since I've already satisfied your conditions by providing examples of such uses. Ergo:
So what you're saying is that theoretically someone somewhere might be able to
derive some benefit from this, but you don't have specific examples of people
who *will* benefit from it? Like who have said they want it and will use it
for some specific constructive purpose if it's available?
No, that's not what I'm saying. To repeat myself and my specific examples that have been ignored: Every vision-impaired user with a screen reader (text-to-speech) browser can benefit from this, since they can customize their browser-internal style sheet to distinguish between defining instances of a term (on WP, this would usually be the bold intro in the lead, and often the lead-in terms in sections, in complex articles) and misc. stuff that is boldfaced or sectioned for any number of other reasons. Once the code works, WP:MOS and WP:LEAD can be updated, and it will propagate rapidly. And, any author of software that works with MW wikicode would be able to use dfn for various purposes. So, I obviously have already given specific use cases for it on WP in particular, both in glossaries and in the more general "defining instance" case. (Referring to these as "theoretical" and demanding a "non-hypothetical" example of *something that cannot be implemented because MW has disabled it* is just cognitively dissonant.) See also Michael's salient comments on this in Comment #44.
Nothing at all remains in the way of whitelisting kbd, not even dependencies (as was the case for a while with the abbr element).
Bug #671 should become a tracking bug, with dfn, q, address, and kbd+samp being 4 separate bugs listed as its blockers. The reasons for and against each tag are largely different (treating kbd and samp as a linked pair).
Moved to new Bug #24529.
I would rather say [[{subst:PAGENAME}]] and leave inserting the DFN tags to
the engine. (I admit I have changed my position on this subject.)
I don't follow you. Where would this link appear (presumably with correct double squiggly bracketing) and why? If you mean to suggest that PAGENAME can be used in the lead, that won't actually work in hundreds of thousands of cases because of disambiguation and because (especially in bio articles, e.g. [[Newt Gingrich]]) the article title and the subject of the article as given in the lead often do not match even when the article title is not disambiguated. But I may be misunderstanding what your suggestion is. My point was that dfn is explicitly intended for defining instances of a term/name like the bold intros to leads in WP articles. I cannot see a way to automate its application there, so it would have to be done manually or via a template (and if done manually, someone would make a template for it immediately anyway, since doing it manually would be tedious).
*On the address element:
"I don't see why" isn't a valid rationale against implementing this, and no defensible rationale has been given for not doing so, while examples of its usefulness have been given, both on and off WP.
[Non-open wiki operators] can ask for it to be optionally (non-default)
whitelisted. It makes no sense in normal wiki pages, though, and should
not be allowed by default.
Michael Zajac nailed this one completely in Comment #44.
If there's any further argument, break it into a separate bug, since objections to this element are unrelated to anything else under discussion here.
Off the top of my head, an address element could be appropriate in the
following places:
- In the page footer, for the Wikimedia button.
This case does not need white-listing.
Surely. But the rest do, especially on non-WMF wikis, although Michael provides some on-WP examples, too.
"I don't see why" isn't a valid rationale against implementing this, and no strong rationale has been given for not doing so, even if Aryeh's activism at HTML5's bugzilla is causing a delay.
We can concede that implementation of the kbd and samp pair is also temporarily problematic because of alleged uncertainty over at HTML5, and should thus arguably be deferred in MW for the short term. Personally, I'm near certain that Aryeh's HTML5 proposal won't succeed, because HTML5 has been moving toward significantly increased, not reduced, semantic tagging, and in 5 weeks it's met nothing but skepticism. Three (i.e., less than two more) months is probably enough time to see which way that wind blows.
I'd like to see specific examples of screen readers or power-users'
stylesheets that do not treat these the same as <code> or <tt>, since you
say these are near certain or absolutely certain.
I'd like to have a new Lexus and to be 15 years younger.
I'm not going to spend hundreds and hundreds of real dollars buying expensive software to satisfy your nitpicking, especially since you seem so adamantly opposed to this (alone among *all* active respondents on this thread, I might add) that I doubt you'd be satisfied anyway.
I'm also not going to spam around asking for blind users to send me copies of their style sheets, for the same reason, and because interpreting them would be necessarily subjective and extremely time-consuming, and because I have more productive things to do, and so do they, and surely so do you.
Let's not waste any more time here. It's basic logic: All non-crap modern browsers, including good screen readers, provide for local stylesheets (they override remote ones and browser defaults, as I'm sure you're aware). Given that, we can be 100% certain that some users actually use this feature, or it wouldn't be there (I think I first saw this feature around 1996 in Netscape, so that's 14 years in which to get rid of it if it were simply creeping featuritis). Given this and the fact that (go ask any accessibility forum) a major thorn in the collective side of visually impaired users is distinguishability of types of content, and you have a 100% certainty that some visually impaired readers use this feature to distinguish between various different types of content that are often poorly distinguished. Q.E.D.
If there's any further argument about these, after the moribund discussion over at HTML5 peters out, kbd+samp should become its own bug, since the objections to the one are identical to the other, but not even related to objections (where any still exist) to dfn, address or q. The new bug should be labeled LATER, but only temporarily.
It isn't on the table right now; its implementation really is being held up by uncertainty at the specs level and by grossly inconsistent user agent implementation.
It should be its own bug, marked LATER until the situation stabilizes.
HTML5 goes flip-flop about this. The version I have in memory cache says
quotation marks should be explicit inside Q.
Yeah, that's obviously the sanest solution, but until it's stable I have to agree with Aryeh that we shouldn't implement it here because it could just end up having to be undone, or redone, with repercussions for millions of live pages on thousands of wikis.
What is or isn't useful on Wikipedia itself isn't really of any concern here, since this is open software. MediaWiki is much broader than Wikipedia, and than WMF projects. Much of this bug's extreme lack of progress (I mean, really - this is bug number 671!) is directly traceable to treating this as if it were a *Wikipedia* feature request, rather than a MediaWiki one. There is no such thing as "abusing" MW as a CMS. It's a platform for collaborative and easy editing of content, and that obviously *is* a CMS, whatever else you want to call it and whatever technologies it uses, to whatever ends. Any legal use that users wish to put it to for editing and presenting content is perfectly valid.
Michael Zajac's Comment #44 addressed some of this, too. But there's also this point:
Since wiki pages are, by their nature, not owned by anyone or authored by
any single person, [address] makes no sense for wikis.
Not every wiki is publicly editable by the entire world. Some have very tightly controlled editorship (e.g. 3 people, with some pages "owned" by specific individuals).
If your mail client touches angle brackets in plaintext e-mails, it is
absolutely and completely broken and you need to tell it to stop.
yet:
... we only kowtow to [obsolete versions of MSIE] until they're not used by
a significant number of people.
These are contradictory stances. In the case I reported, the GMail app built into the Android operating system, for one, "eats" material in angle brackets. It is certainly "used by a significant number of people". If "fix the broken application; we will not adapt" is good enough for the Droid goose, it's good enough for the IE gander. I've developed this argument a bit more at Bug #23932, where it's more relevant.
Surely you're not saying I should write my Bugzilla posts to work around
broken tools that don't obey standards? 0:)
That's precisely what I'm saying, if you insist on forcing all editors on every MW-based wiki to write all their content and markup around old, buggy versions of MSIE. You can't have it both ways. If we stop kissing IE's buggy butt, then I rescind any plea for being nice to Droid users. But, as I said, this issue is better discussed at Bug #23932, which is all about supporting Microsoft zombieware.
I'm adding the HTML5 tracking Bug #19719 to "blocks", since dfn is a stable part of HTML5, and kbd, samp and address will almost certainly be, Aryeh's proposal over there notwithstanding. The abbr element has already been whitelisted, so I'm removing Bug #8633 as a "blocks" dependency. Since Bug #22905 has been fixed, removing it as a "depends on" dependency.
giecrilj wrote:
(In reply to comment #47)
- On the idea of applying dfn with a template to the boldfaced term at the top
of an article's lead (e.g. "{{leadterm|elektrokardiogram}}"):
- Comment #45 from Christopher Yeleighton <giecrilj@stegny.2a.pl> 2010-07-22
19:33:30 UTC (In reply to comment #41) ---
I would rather say [[{subst:PAGENAME}]] and leave inserting the DFN tags to
the engine. (I admit I have changed my position on this subject.)
I don't follow you. Where would this link appear (presumably with correct
double squiggly bracketing) and why? If you mean to suggest that PAGENAME can
be used in the lead, that won't actually work in hundreds of thousands of cases
because of disambiguation and because (especially in bio articles, e.g. [[Newt
Gingrich]]) the article title and the subject of the article as given in the
lead often do not match even when the article title is not disambiguated.
Just use "|".
<URL:http://en.wikipedia.org/w/index.php?title=Newt_Gingrich&oldid=375340399>
ayg wrote:
Sigh. Okay, frankly, I wasn't really so much against whitelisting these, and I was mostly just responding to particular arguments that I thought were unreasonable, and to some degree playing devil's advocate. In my view, the issue basically boils down to this:
Argument in favor of whitelisting: They're not very useful, but they're practically the only HTML elements we don't whitelist, and it won't hurt anything. We already whitelist similarly marginal elements like <var> and <abbr>. So what's the point in not whitelisting these?
Counter-argument: Wikitext is not meant to be HTML. It's meant to be a simplified language that non-technical users can easily learn, without unnecessary features that might be confusing. We don't have to allow everything that HTML does -- wikitext serves a different purpose from HTML, and it's completely reasonable for us to draw the line at a different place. Since wikitext aims to be simplified, any new language construct needs to be concretely justified.
Counter-counter-argument: So it's okay to allow <span style="position:absolute; top: 2.8em; right:0em; height: 1em; width: 12em; background: red; border: 2px solid gray; color: white">, but <kbd> is too confusing? Not to mention the fact that we have template syntax which causes grown men to break down into uncontrollable sobbing, and on several well-documented occasions has caused onlookers to spontaneously gouge their eyes out in sheer horror. But <kbd>, oh no, that's just way too complicated. Especially what with how <var> and <abbr> have been whitelisted for a long time, and they've demonstrably caused the total collapse of the Wikipedia user base as we know it, what with everyone using them all over the place. Come on, get real.
Counter-counter-counter-argument: . . . :(
Phrased thus, I can see no serious argument for not whitelisting these tags. It makes things more consistent if anything. I'll get some other developers' opinions, and if no one objects, I'll go ahead and whitelist <address> and <dfn>. As for <kbd> and <samp>, we may as well wait until that HTML5 bug is resolved, which should be within a few months. This has waited 5.5 years, so a few months more won't kill anyone.
I'll just respond to one or two points here:
(In reply to comment #45)
HTML5 goes flip-flop about this. The version I have in memory cache says
quotation marks should be explicit inside Q.
The current version says the opposite, and I think it has for a long time:
http://www.whatwg.org/specs/web-apps/current-work/multipage/text-level-semantics.html#the-q-element
(In reply to comment #47)
We can concede that implementation of the kbd and samp pair is also temporarily
problematic because of alleged uncertainty over at HTML5, and should thus
arguably be deferred in MW for the short term. Personally, I'm near certain
that Aryeh's HTML5 proposal won't succeed, because HTML5 has been moving toward
significantly increased, not reduced, semantic tagging, and in 5 weeks it's met
nothing but skepticism. Three (i.e., less than two more) months is probably
enough time to see which way that wind blows.
The way the HTML5 bug tracker works is that everyone is free to comment, but the editor (Ian Hickson, a.k.a. Hixie) has the sole right to make a decision. Hixie normally handles bugs in batches every few months, so I expect he'll decide within a few months. When Hixie makes a decision, it can be appealed to the HTMLWG, which can take like a year, but most of his decisions are not appealed (and I certainly would not appeal whatever decision he makes here).
So the people who have commented so far on the HTML5 bug are irrelevant (since it's Hixie's decision alone at this point), but it will be resolved definitively sooner or later. If he WONTFIXes, we can add the elements at that point, if the developers agree on that now.
I'm not going to spend hundreds and hundreds of real dollars buying expensive
software to satisfy your nitpicking, especially since you seem so adamantly
opposed to this (alone among *all* active respondents on this thread, I might
add) that I doubt you'd be satisfied anyway.
I'm also not going to spam around asking for blind users to send me copies of
their style sheets, for the same reason, and because interpreting them would be
necessarily subjective and extremely time-consuming, and because I have more
productive things to do, and so do they, and surely so do you.
That's fine, but then you should not have claimed that these things are "certain" or "nearly certain", since you have no actual evidence. I don't ask for you to invest any effort at all in testing anything, but I do ask that you accurately represent how well-supported your statements are. Note that JAWS has a trial version, so I managed to test it out in like ten minutes; and there are open-source screen readers too (see bug 15491 comment 4).
What is or isn't useful on Wikipedia itself isn't really of any concern here,
since this is open software. MediaWiki is much broader than Wikipedia, and
than WMF projects. Much of this bug's extreme lack of progress (I mean, really
- this is bug number 671!) is directly traceable to treating this as if it
were a *Wikipedia* feature request, rather than MediaWiki one.
Actually, it's mostly due to Brion WONTFIXing it, as lead developer. Since he's no longer lead developer, it became possible to revisit it in the last year or so. No developer seems to really care about it, but I defended Brion's position somewhat half-heartedly and got drawn into a giant argument about it, which I continued for a while mainly because I tend to argue pointlessly about details rather than ignoring the stuff that's irrelevant to the conclusion, so I attacked weak arguments in favor of fixing the bug rather than ignoring them and focusing on the strong arguments. Anyway, that's over now.
I'm adding the HTML5 tracking Bug #19719 to "blocks", since dfn is a stable
part of HTML5, and kbd, samp and address will almost certainly be, Aryeh's
proposal over there notwithstanding.
My proposal there doesn't affect <address>.
The abbr element has already been
whitelisted, so I'm removing Bug #8633 as a "blocks" dependency. Since Bug
#22905 has been fixed, removing it as a "depends on" dependency.
It doesn't really matter, but you're not supposed to remove blocking/dependencies once they're fixed. They're supposed to stay there.
michael wrote:
Regarding use cases for the address element, I forgot to mention the best one: to mark up a talk-page user sig.
smccandlish wrote:
[reasonable summary of the debate, elided]
Counter-counter-counter-argument: . . . :(
Right. It may not be helpful in places like this to play devil's advocate, since the other side is liable to take it seriously and argue against the position. :-/
Phrased thus, I can see no serious argument for not whitelisting these tags.
It makes things more consistent if anything. I'll get some other developers'
opinions, and if no one objects, I'll go ahead and whitelist <address> and
<dfn>.
Huzzah!
As for <kbd> and <samp>, we may as well wait until that HTML5 bug is
resolved, which should be within a few months. This has waited 5.5 years,
so a few months more won't kill anyone.
Works for me. At this point I'd sacrifice a goat or something to get even half of this resolved.
http://en.wikipedia.org/w/index.php?title=Newt_Gingrich&oldid=375340399
Why would we do that? The text in question isn't a link, so it shouldn't be marked up as one. And doing so wouldn't fix anything, since there isn't anything in that syntax or its [mis]use that tells the parser "this is the defining instance", something that requires human judgment.
It doesn't really matter, but you're not supposed to remove
blocking/dependencies once they're fixed. They're supposed to stay there.
My bad. In my own uses of Bugzilla, we've simply removed dependencies after they become moot. Didn't realize WMF wanted to keep them. I can see where it could be useful in a project this complex (fixed bugs show up struck-through, but can still be accessed, e.g. because maybe one wasn't fully fixed with regard to a blocked bug and needs re-opening). But one of them here is not, because this bug was listed as blocking another because of the abbr element, but that element has been whitelisted *and removed from this bug*, so this bug is no longer relevant to the other at all.
My proposal there doesn't affect <address>.
Right. Should have written "Aryeh's proposal over there about two of them".
HTMLWG: Okay, I can concede on that, if your HTML5 proposal is rejected AND you appeal the rejection AND the appeal looks like it might go somewhere. I would not want us to postpone implementation of kbd and samp for a year or more otherwise, though. Kind of a WP:SNOWBALL thing, really. But, I've already agreed that if there's genuine uncertainty, they shouldn't be implemented.
Regarding use cases for the address element, I forgot to mention the best one:
to mark up a talk-page user sig.
Definitely proper in HTML5
http://www.w3.org/TR/html-markup/address.html
Some might question this application in HTML 4.01/XHTML 1.0
http://www.w3.org/TR/REC-html40/struct/global.html#edef-ADDRESS
since it vaguely refers to "a major part" of a page, whatever that means. HTML5 just says "section", which can be whatever you want it to be without any "major" or "minor" value-judgment baggage. Do we care?
Ick. I wonder if we could actually expect editors to not put quotation marks in manually, or have MW work around it if they do? Sounds problematic (and again a good reason to fork that one into its own bug number).
ayg wrote:
(In reply to comment #50)
Regarding use cases for the address element, I forgot to mention the best one:
to mark up a talk-page user sig.
This is invalid in HTML5:
"""
The address element represents the contact information for its nearest article or body element ancestor. If that is the body element, then the contact information applies to the document as a whole.
. . .
The address element must not be used to represent arbitrary addresses (e.g. postal addresses), unless those addresses are in fact the relevant contact information. (The p element is the appropriate element for marking up postal addresses in general.)
"""
http://www.whatwg.org/specs/web-apps/current-work/multipage/sections.html#the-address-element
<address> is actually a weird element, its name is very misleading. A correct usage can be found at http://aryeh.name/siteinfo.html, for example. Most usages in the real world are wrong per the specs. It never meant just "an address". So on a wiki, with no <article> elements, it would only be legitimate to use it to give contact info for the author of the whole page. Which is a very marginal use. But in the end, probably almost no one will use it correctly or incorrectly, so meh, whatever.
(In reply to comment #51)
Right. It may not be helpful in places like this to play devil's advocate,
since the other side is liable to take it seriously and argue against the
position. :-/
I wasn't consciously doing it, I just have an unfortunate tendency to argue on principle.
HTMLWG: Okay, I can concede on that, if your HTML5 proposal is rejected AND you
appeal the rejection AND the appeal looks like it might go somewhere. I would
not want us to postpone implementation of kbd and samp for a year or more
otherwise, though. Kind of a WP:SNOWBALL thing, really. But, I've already
agreed that if there's genuine uncertainty, they shouldn't be implemented.
I won't appeal any rejection -- I don't care enough and I doubt the appeal would be accepted. If anyone in charge of HTML5 is going to agree with me, it will be Hixie, not the co-chairs.
Definitely proper in HTML5
http://www.w3.org/TR/html-markup/address.html
Nope, definitely improper, see above. It represents contact info for the <body> or <article> it's in, not arbitrary contact info.
Ick. I wonder if we could actually expect editors to not put quotation marks
in manually, or have MW work around it if they do? Sounds problematic (and
again a good reason to fork that one into its own bug number).
The idea is that they'll be auto-added in CSS.
michael wrote:
(In reply to comment #52)
(In reply to comment #50)
Regarding use cases for the address element, I forgot to mention the best one:
to mark up a talk-page user sig.
This is invalid in HTML5:
The default sig contains a link to the home page of the writer. That is relevant contact info.
It's not as specific as one might want, since all contributors' addresses are applied to the entire page body, rather than each to its own comment, but that's a perfectly valid interpretation. From HTML5:
If node is a body element
The contact information consists of all the address elements that have node as an ancestor and do not have another body or article element ancestor that is a descendant of node.
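For illustration, the scoping rule quoted above can be sketched in a few lines of Python. This is only a rough model, assuming well-formed markup that `xml.etree.ElementTree` can parse; the function name `contact_addresses` and the sample page are made up for the example and are not part of any spec or of MediaWiki.

```python
import xml.etree.ElementTree as ET

# Elements that start a new contact-information scope, per the quoted rule.
SCOPING = {"article", "body"}

def contact_addresses(node):
    """Collect the address elements below `node` that have no other
    article or body element between themselves and `node`."""
    found = []
    for child in node:
        if child.tag in SCOPING:
            # Addresses in here belong to the nested article/body instead.
            continue
        if child.tag == "address":
            found.append(child)
        found.extend(contact_addresses(child))
    return found

# Hypothetical talk page: two signed comments plus one embedded article.
page = ET.fromstring(
    "<body>"
    "<div>First comment <address>Alice</address></div>"
    "<article><div>Embedded</div><address>Carol</address></article>"
    "<div>Second comment <address>Bob</address></div>"
    "</body>"
)

print([a.text for a in contact_addresses(page)])                  # scope: body
print([a.text for a in contact_addresses(page.find("article"))])  # scope: article
```

Under this reading, every signature outside an article element ends up as contact information for the page as a whole, which is the interpretation described above.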
giecrilj wrote:
(In reply to comment #51)
- On other stuff:
- Comment #48 from Christopher Yeleighton 2010-07-25 08:42:53 UTC ---
http://en.wikipedia.org/w/index.php?title=Newt_Gingrich&oldid=375340399
Why would we do that? The text in question isn't a link, so it shouldn't be
marked up as one. And doing so wouldn't fix anything, since there isn't
anything in that syntax or its [mis]use that tells the parser "this is the
defining instance", something that requires human judgment.
The text in question is a relative link that happens to be reflexive. The engine detects that the link is reflexive so it renders the link as STRONG (instead of A). I would say it could convert the first reflexive link in the intro section of the article body to a DFN equally well, therefore allowing DFN is not necessary for this use case. The advantage is that the link uses wiki syntax, not HTML syntax.
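As a rough sketch of what this proposal would mean in practice (not MediaWiki's actual parser code), the rule could look like the following. The function `render_links`, the simplified link regex, and the `/wiki/` URL scheme are all assumptions for illustration, and the "intro section only" restriction is left out.

```python
import re

# Simplified [[Target]] / [[Target|label]] pattern; real wikitext links are
# more involved (namespaces, anchors, trailing characters, and so on).
LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def render_links(wikitext, page_title):
    """Render links; the first self-link becomes <dfn>, later ones <strong>."""
    seen_self = False

    def repl(match):
        nonlocal seen_self
        target = match.group(1).strip()
        label = match.group(2) or target
        if target == page_title:
            if not seen_self:
                seen_self = True
                return f"<dfn>{label}</dfn>"
            return f"<strong>{label}</strong>"
        return f'<a href="/wiki/{target.replace(" ", "_")}">{label}</a>'

    return LINK_RE.sub(repl, wikitext)

print(render_links(
    "A [[Newt]] is an amphibian; see [[Salamander|salamanders]] and [[Newt]].",
    "Newt",
))
```

The sketch marks up whichever self-link happens to come first, regardless of whether it is actually the defining instance of the term.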
abxabx wrote:
(In reply to comment #54)
I would say it could convert the first reflexive link in the
intro section of the article body to a DFN equally well, therefore allowing DFN
is not necessary for this use case. The advantage is that the link uses wiki
syntax, not HTML syntax.
How would that work for wiktionaries, wikispecies, wikisources with OCR of old encyclopedias and all other places where main definition is placed differently than in pedia?
smccandlish wrote:
Aryeh, in Comment #52:
Definitely proper in HTML5
http://www.w3.org/TR/html-markup/address.html
Nope, definitely improper, see above. It represents contact info for the
<body> or <article> it's in, not arbitrary contact info.
You quoted too selectively. The rest of it is: "If an address element applies to a section of a document, then it represents contact information for that section only." I.e., the element (and I agree it's an odd one) represents attribution, not arbitrary contact information, but its scope is actually undefined (it *defaults* to the whole document, but can be just a "section", and the interpretation of that term is left open). This was actually *less* clear in HTML 4.01, which to my eyes suggested that it was only good for whole-document attribution except maybe there was an exception, but what sort of exception wasn't clear. I think that those working on HTML5 are clarifying for the reality that "a web page" or "a document" in spec terms is, in "Web 2.0" days like these, often a conglomeration of a bunch of different and different kinds of content from all sorts of sources, such that "section"-based attribution is frequently necessary.
Michael in Comment #53:
It's not as specific as one might want, since all contributors' addresses
are applied to the entire page body, rather than each to its own comment,
but that's a perfectly valid interpretation. From HTML5:
I'd say it's more specific than many users of address would normally think of, since it associates content at a very detailed level with specific author contact information. And it's non-problematic, since each contribution can be considered a "section" for HTML5 spec purposes, for which the contribution's author's user talk page, as you note, really is the relevant (i.e., non-arbitrary) contact information. Unless I've missed something here.
Aryeh in #52 again:
But in the end, probably almost no one will use
it correctly or incorrectly, so meh, whatever.
I concede that possibility. :-)
Aryeh in #52 again:
I wonder if we could actually expect editors to not put quotation marks
in manually, or have MW work around it if they do? Sounds problematic (and
again a good reason to fork that one into its own bug number).
The idea is that they'll be auto-added in CSS
Right. I'm saying that the real-world implementation has been about half-and-half, and it's been my experience that many people aware of the element don't know it is supposed to (and fails to, in some browsers) auto-generate the quotation marks (and specific types, based on language, nesting, etc.). I'd bet real money that over half of the people who attempt to use the q element put quotation marks just outside or inside it. So, implementing it might be (aside from being better discussed in a new bug page) impractical until nothing but very obsolete browsers leave out the quotation marks, and users get used to the idea of not adding them manually.
Christopher in Comment #54:
The text in question is a relative link that happens to be reflexive. The
engine detects that the link is reflexive so it renders the link as STRONG
(instead of A). I would say it could convert the first reflexive link in the
intro section of the article body to a DFN equally well, therefore allowing
DFN is not necessary for this use case. The advantage is that the link uses
wiki syntax, not HTML syntax.
This presumes that all cases of a reflexive link are defining instances, which isn't true at all (I'd bet that in the Template namespace there are several thousand such links that are not defining instances, but simply part of misc. sentences in template documentation, because of widespread use of {{tl}}, {{tlx}}, etc.). It would play havoc with transclusions, too. Basically, we cannot guarantee any connection between definingness of an instance and reflexive linking. We can't even 100% guarantee that bold stuff at or near the top of an article is a defining instance of a term (e.g. list articles often start with "This is a '''list of whatever'''", and "list of whatever" isn't a term we're defining. The term is "whatever", and it is defined not at this instance, but at the main article on the topic that the list is a [[WP:SUMMARY]] offshoot of). I can think of other issues, but the several already presented are enough to demonstrate that we cannot simply operator-overload the linking code to turn all reflexive links into dfn's.
ABX in comment #55:
How would that work for wiktionaries, wikispecies, wikisources with OCR of
old encyclopedias and all other places where main definition is placed
differently than in pedia?
It wouldn't. This is something that, like *almost* everything else in Wiki, actually needs human judgment applied to it and is best handled with a template.
michael wrote:
(In reply to comment #56)
Aryeh, in Comment #52:
Definitely proper in HTML5
http://www.w3.org/TR/html-markup/address.html
Nope, definitely improper, see above. It represents contact info for the
<body> or <article> it's in, not arbitrary contact info.
You quoted too selectively. The rest of it is: "If an address element applies
to a section of a document, then it represents contact information for that
section only." I.e., the element (and I agree it's an odd one) represents
attribution, not arbitrary contact information, but its scope is actually
undefined (it *defaults* to the whole document, but can be just a "section",
and the interpretation of that term is left open).
I think the spec must have changed since that document was updated. In the latest version, it can only apply to an article or body element:
The address element represents the contact information for its nearest article or body element ancestor.
And the rules for determining the scope are specific.
See http://www.whatwg.org/specs/web-apps/current-work/#the-address-element
smccandlish wrote:
Aughhh! [coughing up skull] I wish they'd quit mucking around with this stuff and leave it alone. I've lodged a gripe about this change at the public W3C bugzilla on HTML5 (http://www.w3.org/Bugs/Public/show_bug.cgi?id=10255), with a proposed div- and id-based solution, but I doubt anyone will care over there. Unless I'm wrong on that point, I'm now thinking that the address element should be permitted by MediaWiki, but disabled by default (just comment its whitelist line out, maybe?), since as Aryeh's observed a typical wiki won't need it in its newly-narrowed scope, even if some more unusual deployments might, with specific editors or groups or whatever controlling certain pages on those sites.
<a href="http://www.savetubevideo.com">download video</a>
(In reply to comment #58)
<a href="http://www.savetubevideo.com">download video</a>
What's this link?
smccandlish wrote:
- Comment #59 from Liangent <liangent@gmail.com> 2010-07-29 07:40:25 UTC ---
(In reply to comment #58)
<a href="http://www.savetubevideo.com">download video</a>
What's this link?
Nothing. Some add-on I installed was either buggy or outright malware. That link was appearing at the bottom of every text entry field in every web form I went to after I installed it. Didn't notice it and deal with the problem until after that was submitted. Sorry for the "accidentospam".
ayg wrote:
Whitelisted <dfn> in r70164. Let's keep the remaining three in this bug, no need to proliferate bugs unnecessarily. <kbd> and <samp> are waiting on the W3C bug.
As for <address>, I'll add it if anyone thinks that's a good idea, since I'm tired of arguing about it. I don't think it should be added behind an off-by-default config option unless someone actually says they'll use it on their wiki -- it's a two-line change to Sanitizer.php anyway, so it would be easy enough to change for someone who really wanted it.
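To give a sense of why the change is so small, here is a toy model of a tag whitelist in Python. It is not the code in Sanitizer.php, which is PHP and also handles attributes, nesting and much more; the names `ALLOWED_TAGS` and `sanitize` are invented for this sketch, and only attribute-less tags are recognised.

```python
import html
import re

# Toy whitelist: a few of the elements mentioned in this bug as already allowed.
ALLOWED_TAGS = {"abbr", "var", "dfn"}

# Matches simple opening/closing tags without attributes, e.g. <dfn> or </dfn>.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*>")

def sanitize(text, allowed=ALLOWED_TAGS):
    """Pass whitelisted tags through; escape everything else so it shows as text."""
    out = []
    pos = 0
    for match in TAG_RE.finditer(text):
        out.append(html.escape(text[pos:match.start()], quote=False))
        if match.group(2).lower() in allowed:
            out.append(match.group(0))
        else:
            out.append(html.escape(match.group(0), quote=False))
        pos = match.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)

sample = "<dfn>wiki</dfn> <address>someone</address>"
print(sanitize(sample))                                      # address is escaped
print(sanitize(sample, allowed=ALLOWED_TAGS | {"address"}))  # address passes through
```

In this model, allowing another element amounts to adding one name to the set, which is roughly why a site that really wants <address> could patch it in locally.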
(In reply to comment #58)
Aughhh! [coughing up skull] I wish they'd quit mucking around with this stuff
and leave it alone.
You were quoting from a totally different spec, which only gives brief and imprecise summaries for authors. It's not meant to be used for anything more than cursory reference. If you look at the first page it begins with "This non-normative reference . . .":
http://www.w3.org/TR/html-markup/
The correct, normative spec is one of these (they're interchangeable for this purpose):
http://dev.w3.org/html5/spec/
http://www.whatwg.org/specs/web-apps/current-work/multipage/
theevilipaddress wrote:
I'd personally not whitelist the <address> tag in core. Whereas it really may have some usage, e.g. on disclaimer pages or the like, I think it has far too few usages to be useful in the default configuration. Instead, I'd suggest coding a (probably simple) extension which allows the address tag in user input. This still allows you to use it if you really want, but prevents likely misuses in the default installation.
happy.melon.wiki wrote:
I agree that whitelisting <address> would be a net detriment to MediaWiki's semantic footprint, not an increase, as the number of misuses would far outweigh the number of correct uses.
Resolving this FIXED. We've whitelisted all the elements requested here except those that became deprecated.
We're not going to add <address> per the concerns listed above.
If someone wants to whitelist additional attributes, we should open a new bug rather than continuing the tl;dr here :)