
Pick one html5 source (whatwg or w3c) and update all code documentation links to point to that source
Open, Low, Public


We have a lot of code comments that reference the HTML5 and DOM specs. But depending on a bunch of factors (who added the comment, what browser was open, what search result Google returned, or maybe the phase of the moon), we have refs to both the W3C and WHATWG specs. We should pick our canonical spec source and update all spec URLs to point to it (and, while at it, also check for dead links and update them).

This is not strictly a parsoid-php issue, but given that we have ported more than half the code at this point, it is simpler to just do this cleanup post-port on the PHP side.
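As a rough illustration of the audit step, here is a minimal Python sketch that lists spec links found in source text and tags each one as W3C or WHATWG. The hostname lists, the file suffixes, and the function names (`classify`, `find_spec_links`, `scan_tree`) are all assumptions made for this example, not anything taken from the Parsoid codebase. Dead-link checking is left out on purpose, since it needs network access.

```python
import re
from collections import Counter
from pathlib import Path
from urllib.parse import urlsplit

# Loose URL matcher: stops at whitespace, quotes, angle brackets and closing brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>)\]]+")

# Assumed hostnames, for illustration only. "www.w3.org" is broad and will
# also match W3C pages that are not specs; tighten as needed.
SOURCES = {
    "whatwg": ("html.spec.whatwg.org", "dom.spec.whatwg.org", "www.whatwg.org"),
    "w3c": ("www.w3.org", "w3c.github.io", "dev.w3.org"),
}


def classify(url):
    """Return 'whatwg', 'w3c', or None for a URL, based on its hostname."""
    host = urlsplit(url).netloc.lower()
    for source, hosts in SOURCES.items():
        if host in hosts:
            return source
    return None


def find_spec_links(text):
    """Return (line_number, source, url) for each recognized spec link in text."""
    found = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in URL_RE.finditer(line):
            # Drop sentence punctuation that often trails a URL in comments.
            url = match.group(0).rstrip(".,;")
            source = classify(url)
            if source is not None:
                found.append((lineno, source, url))
    return found


def scan_tree(root, suffixes=(".php", ".js")):
    """Print every spec link under root and return a per-source count."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(errors="replace")
            for lineno, source, url in find_spec_links(text):
                counts[source] += 1
                print(f"{path}:{lineno}: [{source}] {url}")
    return counts
```

Running `scan_tree("src")` over a checkout (the `src` path is hypothetical) would give a per-source tally, which is one way to see how mixed the references currently are before deciding what to rewrite.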

Event Timeline

ssastry created this task. Apr 25 2019, 4:59 PM
Restricted Application added a subscriber: Aklapper. Apr 25 2019, 4:59 PM
ssastry triaged this task as Low priority. Apr 25 2019, 4:59 PM
ssastry moved this task from Backlog to Post-Port Work on the Parsoid-PHP board.
cscott added a subscriber: cscott. Apr 25 2019, 5:14 PM

This seems pretty low priority; W3C is following the WHATWG spec so there's no real difference. WHATWG tends to lead, and then their decisions are ratified by W3C sometime later.

The only substantive disagreement I'm aware of is wrt the ruby tags for Japanese, <rb> and <rtc> in particular. Both domino and Remex follow the WHATWG draft, which includes rules for rtc and rb in the "body" mode (compare W3C with WHATWG; you're looking for the area around the text A start tag whose tag name is one of: "rp", "rt"; then compare to domino and Remex implementations). IIRC jawiki does use the <rtc> and <rb> tags in wikitext, and so that's presumably a reason to follow the WHATWG spec.

> This seems pretty low priority; W3C is following the WHATWG spec so there's no real difference. WHATWG tends to lead, and then their decisions are ratified by W3C sometime later.

Hence I marked it low priority :) .. But nevertheless, it is confusing to flit between one and the other. If we are going to follow the leading edge, we should just use whatwg .. or if we want a "stable ratified" spec, we should go with w3c.

...except for the Japanese ruby issue, where the W3C has philosophical differences which apparently conflict with the desires of jawiki. (I used to actually understand the differences, but I paged all that out long ago.) According to MDN, only Firefox implements <rtc>, but all browsers except IE implement <rb>. Of course wikitext doesn't *have* to exactly follow anyone, but it's slightly less confusing to just say "wikitext follows the WHATWG HTML5 spec" than to say "wikitext follows the W3C HTML5 spec, except in the case of <ruby> elements".

...Although apparently there's a "HTML5.1" and "HTML5.2" now, according to the W3C? And <rtc> is parsed in the expected way in the W3C HTML5.2 spec...

So maybe we say we follow W3C HTML5.2? I'd need to do some research on what changes were made between HTML5 and 5.1/5.2 before I'd feel confident in that course, though...

Izno added a subscriber: Izno. Apr 25 2019, 5:46 PM

There are other differences in semantics between the two specifications, e.g. <cite>, where at least en.WP has in fact influenced the W3C toward a more permissive use of the element.

I'm not sure that's relevant in the context of parsing since I can't think of parsing differences or even content model differences (the latter of which we don't check at this time), perhaps aside from the above discussion?

I'd review the non-normative and not-up-to-date W3C wiki page.

Izno added a comment. Apr 25 2019, 5:49 PM

As for which to prefer, the nicety of the W3C spec is that it is versioned. With the living spec, you would maybe need to fork the WHATWG version on GitHub just to have a baseline, and then update it periodically.

On the other hand, stating "we follow WHATWG HTML5" implies an intention to follow WHATWG as HTML5 grows/changes. We haven't stated that as an explicit goal in the past, but our community seems to assume it, insofar as, historically, as soon as new elements were added to HTML (<section>, etc.) there were folks who wanted to start using them in wikitext. Not every HTML feature intersects with wikitext, but tag and entity names are the most direct points of contact, and historically they've tended to follow HTML's evolution closely.

Izno added a comment. Apr 25 2019, 7:51 PM

Which does have its downsides, e.g. the resurrected b and s elements (which following either WHATWG or W3C would not have saved us).

Tgr added a subscriber: Tgr. Jun 24 2019, 6:40 PM

Seems like the WHATWG is winning this one.

Izno added a comment. Jun 24 2019, 8:22 PM

> Seems like the WHATWG is winning this one.

The announcement doesn't say anything about how the differences between the two existing specifications will be remedied.