Page MenuHomePhabricator

Smooth out quirks in the <poem> tag to ease consistent rendering across parsers
Open, LowPublic

Description

Split out from T54061#4110198.

Following an IRC discussion in #mediawiki-parsoid on or about 2018-04-05:

  • We'll get rid of the special handling of :, but preserve the existing appearance by using CSS. So end users won't notice a change, unless they have custom CSS rules with selectors like div.poem > span or .mw-poem-indented, which will not work anymore. This will address T123216.
  • The compact attribute is going to be killed. (We didn't actually discuss this, but everyone appears to agree that it is useless.) This will address T16910.
  • We'll try to solve the T124144 <hr> issue as well. Existing patches for that task will be insufficient to deal with the problem when <pre> is used.
  • Once those things are done, @ssastry suggested to follow this plan:
    • get feedback from Wikisource
    • do visual diffs if required as part of that
    • merge the patch https://gerrit.wikimedia.org/r/#/c/328887/
    • evaluate if there is anything we want to deprecate via linter (e.g. intra-line whitespace)
    • reevaluate whether we want to merge this into core as <lines> as the original proposal stood

Event Timeline

TTO triaged this task as Low priority.
TTO created this task.

I will work on this in two weeks from now. If I don't, please ping and/or unassign me.

@TTO: Pinging you as per your request. :)

Heh, thanks Andre! I have simply been too busy to work on this. I've set a reminder for Saturday.

Change 328887 had a related patch set uploaded (by TTO; owner: TTO):
[mediawiki/extensions/Poem@master] Switch <poem> to output HTML <pre> instead of <div>

https://gerrit.wikimedia.org/r/328887

@cscott @ssastry Here is what I have come up with:

My latest reimplementation of <poem> is indent-pre with the following differences:

  • <poem> is displayed as normal text with white-space: pre-wrap, not in monospace with white-space: pre and a box around it. And because it's a tag, it can have inline styles, classes, etc.
  • Multiple consecutive intra-line space characters are folded into a single space. (We generally agree that we want to get rid of this difference in the long term.)
  • Block-level #*:; formatting and horizontal rules ---- are permitted.

It turns out to be quite difficult to achieve this behaviour with the PHP parser, because it loves to add <p> tags in all kinds of stupid places, even when you're trying your best to make everything an indent-pre. I'm not at all familiar with the Parsoid code base, but I hope it is much cleaner to implement this there. If the Parsoid implementation is nice, then I think it is OK to tolerate the slightly hacky approach in the legacy parser (can I call it "legacy" yet?).

My hope is that this is a refreshing, simple approach that resonates with you (even if the proposed implementation leaves a little bit to be desired). Let me know your thoughts.

Vvjjkkii renamed this task from Smooth out quirks in the <poem> tag to ease consistent rendering across parsers to c0caaaaaaa.Jul 1 2018, 1:10 AM
Vvjjkkii removed TTO as the assignee of this task.
Vvjjkkii raised the priority of this task from Low to High.
Vvjjkkii updated the task description. (Show Details)
Vvjjkkii removed subscribers: gerritbot, Aklapper.
CommunityTechBot renamed this task from c0caaaaaaa to Smooth out quirks in the <poem> tag to ease consistent rendering across parsers.Jul 2 2018, 1:33 PM
CommunityTechBot assigned this task to TTO.
CommunityTechBot lowered the priority of this task from High to Low.
CommunityTechBot updated the task description. (Show Details)
CommunityTechBot added subscribers: gerritbot, Aklapper.

My latest reimplementation of <poem> is indent-pre with the following differences:

  • <poem> is displayed as normal text with white-space: pre-wrap, not in monospace with white-space: pre and a box around it. And because it's a tag, it can have inline styles, classes, etc.
  • Multiple consecutive intra-line space characters are folded into a single space. (We generally agree that we want to get rid of this difference in the long term.)
  • Block-level #*:; formatting and horizontal rules ---- are permitted.

This seems pretty reasonable from a semantics point-of-view, especially if we can remove the middle "consecutive intra-line space character" behavior (or move it into CSS, I think that's basically white-space: pre-line?). Then it's basically just a CSS style change and "allow markup which requires a start of line context".

Hopefully we can make good test cases and confirm that a Parsoid implementation of these semantics is "natural".

@thiemowmde do you have any thoughts on this, particularly regarding https://gerrit.wikimedia.org/r/328887/ ?

As discussed in T254051#6216376 it is unlikely that we will have a CSS file for MediaWiki-extensions-Poem so trying to outfit the <div> with the necessary style may be difficult. On the other hand, a <pre> tag on its own is not going to solve the problem either (by default, <pre> tags use monospace font which is not ideal for a poem, and may be particularly undesirable for non-Latin languages (think Chinese, Persian, Hindi).

I think the plan outlined in this task's description sounds good. I can't say much about the patch https://gerrit.wikimedia.org/r/328887. Killing features is always a super delicate thing to do. We must expect that everything is used as a feature somewhere, no matter if it was intentionally designed to be a feature or not. And no matter how sane a removal is, there is always somebody who will complain (for valid reasons, don't get me wrong).

I think it's possible to switch to a CSS file. I can see 2 possibilities:

  • Either make it so that the indentations of stacked <dd><dd> (created by the : wiki syntax) add up. The existing patch looks like it does this.
  • Or mark every indention level with a class name that names the nesting level, e.g. mw-poem-indented-1, mw-poem-indented-2 and so on. The CSS file assigns em values to these classes. Obviously this solution requires us to pick an upper limit, as we neither want to dynamically create this CSS file, nor have hundreds these class names. I would argue this is even a feature. Make it so that the indentations stop at – let's say – level 6 (same as HTML headlines).

Thanks for your thoughtful analysis.

I don't think we can have a CSS file for this extension though (it would mean an additional ResourceLoader entry point, and we try to add those sparingly and only for high-used features). I have a feeling this task will remain open for quite some time.

TTO removed TTO as the assignee of this task.Nov 23 2020, 11:34 AM

This task is assigned to me, but I am so far out of the loop at this point that I don't think I can make any useful contributions to it any longer. (Yes, despite appearances, <poem> is surprisingly subtle and requires the attention of someone with a decent, up-to-date understanding of MediaWiki parsing infrastructure.)

The existing Gerrit patch https://gerrit.wikimedia.org/r/328887/ could definitely be merged with the minor fixes suggested there.

Moreover, there is a plan in the task description, all written up and ready to go for anyone who wants to take it on!