Page MenuHomePhabricator

Value-less extension attributes not preserved
Open, MediumPublic

Description

<syntaxhighlight strict lang="php">$foo</syntaxhighlight>

round trips to

<syntaxhighlight strict="" lang="php">$foo</syntaxhighlight>

Either Parsoid needs to fix this, or we need to tell extension authors that attributes must have values (as per XML).

Event Timeline

Esanders raised the priority of this task from to Needs Triage.
Esanders updated the task description. (Show Details)
Esanders added a project: Parsoid.
Esanders subscribed.
ssastry triaged this task as Medium priority.Jun 9 2015, 3:23 PM
ssastry set Security to None.

Implementing this would be a little problematic for VE too, as we use a XML builder to generate the wikitext for updating the rendering. We'd probably have to switch to sending over the Parsoid HTML and having the API round-trip it to update the rendering.

A few questions come to mind:

  • Does the dirtying affect the semantics of the tag extension? Or is it just a noisy dirty diff?
  • If the latter, and if only a handful of extensions are affected, we could make a case for tag extensions to be xml-compliant and then use a bot to fix up.
  • If semantics are affected and/or a lot of extensions use these kind of properties, it will take more work.

I suspect not many extensions use this, and even fewer would have a flag attribute that could also be a value attribute. That said, <syntaxhighlight> claimed in documentation that using 'line' without a value was equivalent to the default value, line="FACNY_LIES", although this was all lies and the php only cared about the presence of the attribute (see https://www.mediawiki.org/w/index.php?title=Extension%3ASyntaxHighlight_GeSHi&type=revision&diff=1673241&oldid=1659681 ).

Without doing a full audit of extensions I couldn't say for sure if there are any other instances.

I think a dirty diff of attr to attr="" will annoy users though, so if we do go down the XML route, we should have a default value for booleans, e.g. ="1"

The problem with requiring a value is that ="" looks messy in the source and="1" may confuse users into thinking that ="0" will disable the feature, while many extensions are coded to just check for attribute existence.

I think HTML5 says that an empty attribute is actually itself, not the empty string? (Let me check...)

https://stackoverflow.com/questions/21796217/are-attributes-without-value-allowed-in-html4

Alas, "empty attribute syntax" is one of the things which changed between XML, HTML4, SGML, and HTML5. And wikitext extension tags are not any of those, really, so of course we don't have to match them exactly. But it would be nice to move to the HTML5 semantics, just to reduce surprise.

Parsoid should probably just serialize empty attribute values using the empty attribute syntax, since that's almost always what's wanted. That would take care of most of the dirty diff issue. Of course, we could/should tweak the PHP side API to use the HTML5 semantics (that is, treat empty attribute syntax and the empty string value as identical, as far as extension code can tell).

Please provide me with an update? Are you working on it?

@cscott Are you still working on this bug and VisualEditor changes of <references responsive> to <references responsive =""> ?

Is someone still working on this? I imagine, that this should be a very small hack, or am I wrong?

@Esanders : What do you think, is it a Parsoid bug or Parsoid serializing bug, that edits by VisualEditor change automatically the source code

I am not very sure, but it looks like a problem with this commit, doesn't it? https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Cite/+/560850

https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Cite/+/560850/3/src/Cite.php#496 Line 496 is now:
$responsive !== null ? $responsive !== '0' : $wgCiteResponsiveReferences,

Should we replace it with:
$responsive !== null ? ($responsive === '' ? '1' : $responsive !== '0') : $wgCiteResponsiveReferences,

I'm not sure if this suggestion would be a good idea. The fundamental problem is that VE currently cannot tell the difference between responsive="" and responsive. From Parsoid's and VisualEditor's perspective both look the same. The suggestion might make one situation slightly better (responsive doesn't become responsive="" any more but responsive="1") but at the same time make other situations worse (responsive="" would also become responsive="1", causing more dirty diffs).

The WMDE-TechWish team did their best to avoid dirty diffs as much as possible in the Cite code we are responsible for. The remaining problem is probably a responsibility of the Content-Transform-Team.

It's not just VE, I don't believe the DOM API (in PHP, JavaScript, etc) provides any way to distinguish these cases. We could chose to serialize empty attributes without the ="" but that also causes dirty diffs, just in the other direction.

We could also add a data-parsoid attribute to track this, to make a dirty diff less likely on serialization. But when VE generates a new node, it doesn't really have a way to indicate whether it wants 'empty attribute syntax' or not.

Change #1227372 had a related patch set uploaded (by C. Scott Ananian; author: C. Scott Ananian):

[mediawiki/services/parsoid@master] [Proof of concept] Track empty attributes in data-parsoid

https://gerrit.wikimedia.org/r/1227372