Page MenuHomePhabricator

[RfC] Decide whether values outputted by new human readable access functionality should be wrapped in spans.
Closed, ResolvedPublic

Description

Wrapping the values in <span> tags has a whole range of advantages.
First of all it makes it clear that these are values readily formatted for human consumption (this might help prevent people from developing wrong assumptions). Also it enables us to tag these pieces of data with meta information, which could be used to identify actual usage in page output (we can decide on what we want to add there later).

We should decide this soon as adding this from the beginning is trivial to do, but adding this later on might break (flawed) assumptions about the new functions.

Details

Related Gerrit Patches:

Event Timeline

hoo created this task.Oct 4 2016, 1:21 PM
Izno added a subscriber: Izno.EditedOct 10 2016, 10:00 PM

Where another HTML element doesn't suffice, then yes, wrapping in spans would be a good idea. However, for which DataTypes would the <a> element not work? Perhaps NumberType, and StringType. (It might be valuable to have spans internal to those types [or even to use MathML in the former case] to "break up" those into their constituent parts.)

thiemowmde assigned this task to hoo.Oct 11 2016, 3:04 PM
thiemowmde moved this task from incoming to needs discussion or investigation on the Wikidata board.

Since this ticket was created, a product manager decision was made: @Lydia_Pintscher listened to all the arguments we (mostly @hoo and myself) collected and said ok.

My main argument is: Having this new parser function always return an HTML element, even if the content of the HTML element is pure text, does make it 100% obvious what this parser function is about. You get HTML, and should not escape it or process it further. If a user does not want this, he should use the existing {{#property:…}} function.

Rather than talking about a <span> element I would phrase the guarantee we give as follows: "Guaranteed to be a single HTML element node, possibly containing text and other HTML nodes." This gives us the freedom to choose whatever element is appropriate per value type. Monolingual strings become <span lang="…">…</span>, but item references become <a href="…">…</a>.

The guarantee is relevant when feeding such strings to HTML parsers and such. For example, jQuery( '1 <span>m</span>' ) fails, but jQuery( '<span>1 <span>m</span></span>' ) is fine.

If we agree on this, I believe we can close this ticket as resolved.

hoo added a comment.EditedOct 12 2016, 10:12 PM

Rather than talking about a <span> element I would phrase the guarantee we give as follows: "Guaranteed to be a single HTML element node, possibly containing text and other HTML nodes." This gives us the freedom to choose whatever element is appropriate per value type. Monolingual strings become <span lang="…">…</span>, but item references become <a href="…">…</a>.

So if we have a trivial value like a year, that would become <span>2015</span>. But if we have two primitive values here, would it become <span><span>2015</span>, <span>2016</span></span>?

Discussed with @thiemowmde. <span><span>2015</span>, <span>2016</span></span> seems correct and consistent.

hoo added a comment.Oct 13 2016, 2:36 PM

I was just about to implement this… what about 2016? Do we want to be consistent here and have one span for the value and one for the whole thing (<span><span>2016</span></span>) or just one span (<span>2016</span>).

hoo added a comment.Oct 13 2016, 3:08 PM

For completeness, here's my comment on the snakFormatter factory patch:

Note: This implements T147307 by wrapping each individual parsed Snak in a <span>. Wrapping comma separated lists in <span>s is the responsibility of whatever class creates the comma separated list.

So that implements wrapping each Snak in a <span>, but not wrapping the entire list in a <span>.

Izno added a comment.Oct 13 2016, 3:51 PM

In the list case, it should ideally be, with appropriate CSS:

<ul><li>2016</li><li>2017</li></ul>

I suppose you could leave the spans in and produce something like this, but this is unnecessary HTML:

<ul><li><span>2016</span></li><li><span>2017</span></li></ul>

I suppose you could always output a list of one in the "one claim" case and then you don't need the spans. (The CSS needs a little help for that, but it's possible.)

This is what I have in mind:

  • <span class="…" data-…="…">2015</span>
  • <span><span class="…" data-…="…">2015</span>, <span class="…" data-…="…">2016</span></span>

A single value is wrapped in a span, and can have additional attributes. Two values must be wrapped in individual spans, otherwise it would not be possible to have value-specific attributes.

The outer span is only necessary to fulfill the guarantee to always return single DOM element.

The outer span could even have a specific class name. Or be a list. But I would not do anything like this in the first iteration. Just spans, please. Make this as trivial as possible. I would also avoid commas, if possible.

thiemowmde closed this task as Resolved.Oct 19 2016, 10:45 AM
thiemowmde moved this task from Backlog to Done on the Wikidata-Sprint-2016-10-12 board.
hoo reopened this task as Open.Oct 20 2016, 7:02 PM

We missed to discuss one thing: If a Snak is formatted as empty string, should we put a <span> around it? Depending on how we plan to use these spans later on, that could come in handy (also for consistency), but I can see that becoming a mess down the line.

As discussed, I fully support the idea of data attributes, microformats and such. But:

  • We must drop the EscapingSnakFormatter in the DataAccessSnakFormatterFactory anyway, and create one that understands these semantics. We will rethink lots of things anyway when we do this.
  • I believe this new formatter must still ignore empty values, and never output empty elements. There is nothing the user can do with it, if it does not have a label. Note that the main purpose of the output this new code produces is to be meaningful to the user. Being machine readable is a bonus.
  • Currently this is required, otherwise we would render weird stuff like "A, , C, ". An option would be to check for '<span></span>' when joining the list with commas, but I do not like the idea of code that tries to detect empty HTML elements.
  • Let's say we give users the option to output an HTML list. Rendering an <li> with an empty <span> can not be done, I believe (possibly as an option, but not by default). So where would the empty span go instead? After the list? I find this weird. Better not output it.

Change 317127 had a related patch set uploaded (by Thiemo Mättig (WMDE)):
Always wrap output of the {{#statements|…}} function in a <span>

https://gerrit.wikimedia.org/r/317127

I agree: empty output should not be wrapped, but stay empty. Logically, this is really NULL, not a string. We only use a string in such cases because of technical restrictions.

hoo added a comment.Oct 24 2016, 6:54 PM

On IRC @daniel and I decided that we actually want to wrap the whole list of values in another span even if it has only one element. That would make <span<span>2016</span></span> for the value 2016.

Izno added a comment.Oct 24 2016, 7:03 PM

On IRC @daniel and I decided that we actually want to wrap the whole list of values in another span even if it has only one element. That would make <span<span>2016</span></span> for the value 2016.

Why wouldn't you always output an HTML list at that point?

Change 317127 merged by jenkins-bot:
Always wrap output of the {{#statements|…}} function in a <span>

https://gerrit.wikimedia.org/r/317127

thiemowmde added a comment.EditedOct 25 2016, 2:27 PM

On IRC @daniel and I decided that we actually want to wrap the whole list of values in another span even if it has only one element.

Referencing an IRC discussion as if it had lead to a decision is of no value for people not involved in the discussion. Not only can I not follow the arguments, I don't even know what arguments you guys shared, which you considered more relevant than others, and if you missed arguments others consider critical.

I disagree.

Quoting myself (from https://gerrit.wikimedia.org/r/317127):

The one and only guarantee we want to give is that the output is always a single DOM element (or nothing).
Therefore we must wrap multiple elements in an other span. Otherwise some outputs would be "<span>2001</span>, <span>2002</span>". This would have multiple issues that can all be avoided by wrapping it in something like <span class="wb-statements">…</span>.
If the output is a single DOM element anyway, there is no need to do this. The contrary: It will cause unnecessary confusion if a single statement is wrapped in something that claims to be a list, but is none. This becomes very obvious the moment we support other list renderings. An <ul> with a single element is bad. Instead the element itself should be returned, not wrapped in anything.

daniel added a comment.EditedOct 25 2016, 2:33 PM

@thiemowmde wrote

If the output is a single DOM element anyway, there is no need to do this.

Consistency. If you were the poor soul writing a CSS rule for styling values in a statement list, you would hate any special casing here.

The contrary: It will cause unnecessary confusion if a single statement is wrapped in something that claims to be a list, but is none.

In what situation would that cause confusion? What problem would that confusion cause?

This becomes very obvious the moment we support other list renderings. An <ul> with a single element is bad. Instead the element itself should be returned, not wrapped in anything.

It's even worse then: If we support <ul> rendering, and we skip it for a single value, the caller would not know whether to expect a block level element, or an inline element.

Your example actually makes me more convinced that we should not have a special case for single values.

Why wouldn't you always output an HTML list at that point?

Because we usually want inline rendering, not a block level element.

(Note: I edited my comment above.)

If you were the poor soul writing a CSS rule for styling values in a statement list, you would hate any special casing here.

I am the "poor soul" writing CSS. I can assure you, there is no problem with wrapping things that are lists in a tag that represents a list, and not doing this with things that are not a list.

the caller would not know whether to expect a block level element, or an inline element.

I don't understand why you say that. In the discussions I was involved in nobody ever said it would be a good idea to give any guarantees about inline vs. block elements. It's guaranteed to be a DOM element. That's all.

I am the "poor soul" writing CSS. I can assure you, there is no problem with wrapping things that are lists in a tag that represents a list, and not doing this with things that are not a list.

Well, I guess that's the crux of the disagreement, then: I think it is a list even if there is only one element, and it should be made obvious that it is a list. In particular, callers should never rely on it not being a list, even if it is commonly a single value.

I don't understand why you say that. In the discussions I was involved in nobody ever said it would be a good idea to give any guarantees about inline vs. block elements. It's guaranteed to be a DOM element. That's all.

We did have a lengthy discussion about div vs span because of this issue. IIRC we decided to actually guarantee an inline element there. It's rather important for the caller to know whether they will get an inline or block element.

thiemowmde added a comment.EditedOct 25 2016, 4:04 PM

Sounds more like the disagreement is what the purpose of the new feature is, who will use it, for what and how. To me your arguments sound like you are outputting XML or JSON. Sure, as a consumer of an XML stream I would not like it if some nodes are sometimes not there.

But this is not relevant here. This output should never be parsed. It's a convenience feature to be used in infobox templates. Anything that "just looks wrong" from a users perspective will be rejected and not used by the communities. And "just looking wrong" is true for commas that separate empty elements, thumbnails separated by commas, and lists that aren't lists.

We did have a lengthy discussion about div vs span because of this issue.

As already said above a private chat is of no value. Who is "we"? What issue? What arguments did you shared? Which arguments did you considered more relevant than others? Which arguments did you missed?

Izno added a comment.EditedOct 25 2016, 4:31 PM

Why wouldn't you always output an HTML list at that point?

Because we usually want inline rendering, not a block level element.

Inline rendering is a trivial CSS fix (and can be varied based on classing on parent elements). My read of HTML5 seems to indicate it would be semantic to place it within a <p> or a <div> element, so I don't see an issue there.

EDIT: ^ My read of the HTML5 spec is wrong regarding lists in paragraphs. Ref StackOverflow

And I agree with thiemowmde at

Sounds more like the disagreement is what the purpose of the new feature is, who will use it

given his statement at

It's a convenience feature to be used in infobox templates.

Since I do not know that I would expect an infobox template to take inline content rather than block content of this sort. En.WP usage is sometimes block (either incorrectly with <br> or correctly with hidden-bullet <ol>) and sometimes inline (usually just a no-HTML list).

And I happen to agree with

This output should never be parsed.

in that it should not be designed to be parsed (what the clients will do, will do).

I turned this discussion left and right in my head last night. I would love to see us doing something like this:

<span data-property="P570" data-property-type="time">
<span data-guid="q42$65EA9C32-B26C-469B-84FE-FC612B71D159" data-value="2001-05-11">11 May 2001</span>
</span>

So yea, this is using two spans, even if there is only one statement for a property. But not because of "consistency" reasons (doesn't matter), or because it makes writing CSS selectors easier (it doesn't), but because it just makes sense. It's a nice mapping from our data model (where property and data value are two distinct things) to HTML wikitext. It avoids any duplication.

Our default rendering will be comma-separated spans. But we provide a parameter for other list renderings. These are all I can think of that are already in use out there and make sense in certain contexts:

Line breaks:

<div data-property="P570" data-property-type="time">
<span data-guid="q42$65EA9C32-B26C-469B-84FE-FC612B71D159" data-value="2001-05-11">11 May 2001</span><br />
</div>

Divs:

<div data-property="P570" data-property-type="time">
<div data-guid="q42$65EA9C32-B26C-469B-84FE-FC612B71D159" data-value="2001-05-11">11 May 2001</div>
</div>

Wikitext list:

<div data-property="P570" data-property-type="time">
* <span data-guid="q42$65EA9C32-B26C-469B-84FE-FC612B71D159" data-value="2001-05-11">11 May 2001</span>
</div>

HTML list:

<ul data-property="P570" data-property-type="time">
<li data-guid="q42$65EA9C32-B26C-469B-84FE-FC612B71D159" data-value="2001-05-11">11 May 2001</li>
</ul>

Numbered lists don't make sense, because there is no order. Tables may make sense, but I can't think of a good default table rendering. I believe this needs a more complex solution, see T146735 and related.

@thiemowmde whatever your reasons, I like the structure you propose :)

thiemowmde closed this task as Resolved.Dec 6 2016, 1:28 PM