Page MenuHomePhabricator

Smooth out quirks in the <poem> tag to ease consistent rendering across parsers
Open, LowPublic

Description

Split out from T54061#4110198.

Following an IRC discussion in #mediawiki-parsoid on or about 2018-04-05:

  • We'll get rid of the special handling of :, but preserve the existing appearance by using CSS. So end users won't notice a change, unless they have custom CSS rules with selectors like div.poem > span or .mw-poem-indented, which will not work anymore. This will address T123216.
  • The compact attribute is going to be killed. (We didn't actually discuss this, but everyone appears to agree that it is useless.) This will address T16910.
  • We'll try to solve the T124144 <hr> issue as well. Existing patches for that task will be insufficient to deal with the problem when <pre> is used.
  • Once those things are done, @ssastry suggested to follow this plan:
    • get feedback from Wikisource
    • do visual diffs if required as part of that
    • merge the patch https://gerrit.wikimedia.org/r/#/c/328887/
    • evaluate if there is anything we want to deprecate via linter (e.g. intra-line whitespace)
    • reevaluate whether we want to merge this into core as <lines> as the original proposal stood

Event Timeline

TTO claimed this task.May 14 2018, 12:01 PM
TTO triaged this task as Low priority.
TTO created this task.

I will work on this in two weeks from now. If I don't, please ping and/or unassign me.

@TTO: Pinging you as per your request. :)

TTO added a comment.Jun 11 2018, 8:25 PM

Heh, thanks Andre! I have simply been too busy to work on this. I've set a reminder for Saturday.

Change 328887 had a related patch set uploaded (by TTO; owner: TTO):
[mediawiki/extensions/Poem@master] Switch <poem> to output HTML <pre> instead of <div>

https://gerrit.wikimedia.org/r/328887

TTO added a comment.EditedJun 16 2018, 11:25 AM

@cscott @ssastry Here is what I have come up with:

My latest reimplementation of <poem> is indent-pre with the following differences:

  • <poem> is displayed as normal text with white-space: pre-wrap, not in monospace with white-space: pre and a box around it. And because it's a tag, it can have inline styles, classes, etc.
  • Multiple consecutive intra-line space characters are folded into a single space. (We generally agree that we want to get rid of this difference in the long term.)
  • Block-level #*:; formatting and horizontal rules ---- are permitted.

It turns out to be quite difficult to achieve this behaviour with the PHP parser, because it loves to add <p> tags in all kinds of stupid places, even when you're trying your best to make everything an indent-pre. I'm not at all familiar with the Parsoid code base, but I hope it is much cleaner to implement this there. If the Parsoid implementation is nice, then I think it is OK to tolerate the slightly hacky approach in the legacy parser (can I call it "legacy" yet?).

My hope is that this is a refreshing, simple approach that resonates with you (even if the proposed implementation leaves a little bit to be desired). Let me know your thoughts.

Vvjjkkii renamed this task from Smooth out quirks in the <poem> tag to ease consistent rendering across parsers to c0caaaaaaa.Jul 1 2018, 1:10 AM
Vvjjkkii removed TTO as the assignee of this task.
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from c0caaaaaaa to Smooth out quirks in the <poem> tag to ease consistent rendering across parsers.Jul 2 2018, 1:33 PM
CommunityTechBot assigned this task to TTO.
CommunityTechBot lowered the priority of this task from High to Low.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added subscribers: gerritbot, Aklapper.

My latest reimplementation of <poem> is indent-pre with the following differences:

  • <poem> is displayed as normal text with white-space: pre-wrap, not in monospace with white-space: pre and a box around it. And because it's a tag, it can have inline styles, classes, etc.
  • Multiple consecutive intra-line space characters are folded into a single space. (We generally agree that we want to get rid of this difference in the long term.)
  • Block-level #*:; formatting and horizontal rules ---- are permitted.

This seems pretty reasonable from a semantics point-of-view, especially if we can remove the middle "consecutive intra-line space character" behavior (or move it into CSS, I think that's basically white-space: pre-line?). Then it's basically just a CSS style change and "allow markup which requires a start of line context".

Hopefully we can make good test cases and confirm that a Parsoid implementation of these semantics is "natural".