This task is complete when we have defined the work needed to avoid breaking or corrupting details subreference bodies during wt2wt round-tripping and in VE.
- Write test cases which include angle brackets, single quotes, double quotes, and newlines in details content.
- Identify which transformation steps are affected by escaping problems.
- Write tasks for Tech Wishes or CTT explaining each issue and necessary fixes.
We can prototype new escaping contexts in the Cite code base, but these probably belong in Parsoid or VE as generic methods.
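
As a rough illustration of the test cases and the escaping context described above, here is a minimal Python sketch. It is not the Cite, Parsoid, or VE implementation: `SAMPLE_DETAILS`, `escape_details_attribute`, and `is_attribute_safe` are hypothetical names, and the sample strings are invented. It assumes the details body is stored in a double-quoted attribute.

```python
import html

# Hypothetical sample bodies, one per character class from the checklist.
SAMPLE_DETAILS = {
    "angle brackets": "pages <5> and <6>",
    "single quote": "editor's note",
    "double quote": 'the "long" edition',
    "newline": "line one\nline two",
}


def escape_details_attribute(text: str) -> str:
    """Escape text for a double-quoted attribute value (illustrative only)."""
    # html.escape replaces &, < and >; quote=True also replaces " and '.
    return html.escape(text, quote=True)


def is_attribute_safe(escaped: str) -> bool:
    """True if no character remains that could end or split the attribute."""
    return not any(ch in escaped for ch in '"<>\n\r')


for label, body in SAMPLE_DETAILS.items():
    escaped = escape_details_attribute(body)
    # Decoding must give back the original body for a lossless round trip.
    lossless = html.unescape(escaped) == body
    print(f"{label}: safe={is_attribute_safe(escaped)} lossless={lossless}")
```

In this sketch the newline sample is the only one flagged as unsafe, because plain character escaping leaves the raw line break in place. That is the case that needs its own policy.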
Findings
Good:
- Legacy parser correctly treats all main ref content the same regardless of whether details are present.
- <br /> renders as a visual newline in main or details, in legacy and parsoid.
- Other HTML tags like <b> already render correctly in main and details, legacy and parsoid.
- Line feeds are rendered by both parsers if they appear inside a <pre>; otherwise they become invisible whitespace, or ↵ in the VE main content preview.
- HTML entities which can be edited normally are turned into the character during editing, e.g. the entity for é becomes é and the entity for \ becomes \.
- Nothing strange happens when opening and saving wikitext.
- Saved details attribute wikitext correctly escapes ".
Broken:
- Visual editing details wikitext containing a newline entity turns it into an actual newline, which breaks the attribute. Since the newlines are useless anyway, we might want to handle this the same way as the carriage return entity and refuse to decode.
- Allowed HTML entities are all changed to their decoded characters after visual editing a ref. These are correctly surrounded by mw:Entity spans in Parsoid, but do not round-trip yet.
- A newline added in the visual editor is not converted to a <br />.
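
One possible way to handle the last item is to normalize line breaks to `<br />` before the details body is serialized. The Python sketch below only illustrates that idea. `newlines_to_br` is a hypothetical helper, not an existing VE or Parsoid function, and where such a step would belong in the pipeline is left open.

```python
import re


def newlines_to_br(text: str) -> str:
    """Replace each line break with a <br /> tag (illustrative only)."""
    # Match \r\n first so a Windows line ending yields a single <br />.
    return re.sub(r"\r\n|\r|\n", "<br />", text)


print(newlines_to_br("line one\nline two"))
```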
Strange but out of scope:
- Visual editing transforms <b> to '''.
- Saved details attribute wikitext escapes >, even though a literal > is allowed in attribute values according to https://html.spec.whatwg.org/multipage/syntax.html#syntax-attribute-value. It seems that > in an attribute breaks both the legacy and parsoid parsers.
- Visual editing transforms the backslash entity to a literal \.
- In the legacy parser, the carriage return entity turns into a broken character � in legacy-rendered details. In all other contexts it remains undecoded as the literal entity.