
Use paragraphs, rather than line breaks, where appropriate within <poem>
Open, Lowest, Public, Feature

Description

I saw that this extension is now on Wikisource and tried it. I noticed that the paragraphs disappear:

For example, without the extension, the wikitext

line 1

line 2

renders in HTML as

<p>line 1</p><p>line 2</p>

and with the poem extension, the wikitext

<poem>
line 1

line 2
</poem>

gives

<div class="poem"><p>line 1<br /><br />line 2</p></div>
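To make the reported behaviour concrete, here is a minimal Python model of the output shown above. It assumes, based only on this example, that every newline inside <poem> becomes <br /> and that the whole poem is wrapped in a single <p>; it is not the extension's actual PHP implementation, and `render_flat` is a hypothetical name.

```python
def render_flat(text: str) -> str:
    """Model of the output reported in this task: every newline inside
    <poem> becomes <br />, and the whole poem ends up in a single <p>."""
    lines = text.strip("\n").split("\n")
    # A blank line is just an empty entry here, so it yields two <br /> in a row.
    return '<div class="poem"><p>' + "<br />".join(lines) + "</p></div>"


print(render_flat("line 1\n\nline 2"))
# <div class="poem"><p>line 1<br /><br />line 2</p></div>
```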

I think it would be better to keep the paragraphs (which can be styled with CSS), or perhaps to add an attribute to <poem> that outputs the paragraphs.

Otherwise it's a great extension :) Sébastien


Version: unspecified
Severity: enhancement

Event Timeline

bzimport raised the priority of this task to Low. Nov 21 2014, 9:16 PM
bzimport set Reference to bz6419.
bzimport added a subscriber: Unknown Object (MLST).

Could be done; double newlines are already broken, so the <br/> could be skipped there.

Oppose: in the poem extension, <br /> tags are added by definition. This property is used, e.g., in "reverse indentation":

<poem style="margin-left:20px;text-indent:-20px">
Line1 Line Line
Line2 Line Line
Line3 Line Line
</poem>

and so on. Changing how poem works would require checking every text that uses it, on every Wikisource. I suggest instead preparing a new (independent) extension and a new tag (<ppoem>).
Z.

Can you please describe exactly what the problem would be? It is not apparent from the code you wrote. A link to a real-life example would be helpful. Thanks.

I recommend that this 2006 bug be closed. As @Zdzislaw says, there are 9 years of work built on the current code. Without a detailed analysis of the consequences, it would seem best to develop something alongside it and use that, rather than going back through thousands of works to check for any impact. Such an approach would allow migration in an orderly manner.

> Can you please describe exactly what the problem would be? It is not apparent from the code you wrote. A link to a real-life example would be helpful. Thanks.

see: User:Zdzislaw/poem

I recommend closing T8419 as Declined too.
Z

Danny_B set Security to None.

This is an obvious misunderstanding due to the ambiguous task description. I am pretty sure it was not meant to convert every single ...<br /> to <p>...</p>, but rather the following:

<poem>
lorem
ipsum

dolor
sit
amet

consectetur
adipiscing
elit
</poem>

should be correctly turned into

<div class="poem">
  <p>
    lorem<br />
    ipsum
  </p>
  <p>
    dolor<br />
    sit<br />
    amet
  </p>
  <p>
    consectetur<br />
    adipiscing<br />
    elit
  </p>
</div>

instead of the current semantically incorrect

<div class="poem">
  <p>
    lorem<br />
    ipsum<br />
    <br />
    dolor<br />
    sit<br />
    amet<br />
    <br />
    consectetur<br />
    adipiscing<br />
    elit
  </p>
</div>
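A minimal Python sketch of the proposed transformation, assuming (as in the example above) that one or more blank lines separate stanzas. `render_paragraphs` is a hypothetical helper, not part of the Poem extension.

```python
import re


def render_paragraphs(text: str) -> str:
    """Proposed output: stanzas (separated by blank lines) become <p>
    elements; lines within a stanza are joined with <br />."""
    # One or more blank lines (possibly containing spaces) end a stanza.
    stanzas = re.split(r"\n\s*\n", text.strip("\n"))
    paragraphs = ["<p>" + "<br />".join(s.split("\n")) + "</p>" for s in stanzas]
    return '<div class="poem">' + "".join(paragraphs) + "</div>"


poem = "lorem\nipsum\n\ndolor\nsit\namet\n\nconsectetur\nadipiscing\nelit"
print(render_paragraphs(poem))
# <div class="poem"><p>lorem<br />ipsum</p><p>dolor<br />sit<br />amet</p><p>consectetur<br />adipiscing<br />elit</p></div>
```

Because the pattern `\n\s*\n` swallows any run of blank lines, three blank lines produce the same single paragraph break as one; that is exactly the information the following comments argue must not be lost.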

Regarding your linked example:
Inline styles are bad for accessibility. Don't use them. If you had defined global stylesheets instead of injecting inline styles, your need for such a format would easily be fulfilled by code like:

<poem class="negativelyIndented">
lorem
ipsum

dolor
sit
amet

consectetur
adipiscing
elit
</poem>

and appropriate stylesheet definitions, e.g. in MediaWiki:Common.css.

> This is an obvious misunderstanding due to the ambiguous task description. I am pretty sure it was not meant to convert every single ...<br /> to <p>...</p>, but rather the following:

Definitely not. In many works of poetry blank lines occur, and the syntax

<poem>
lorem
ipsum



dolor
sit
amet

consectetur
adipiscing
elit
</poem>

which renders as

<div class="poem">
  <p>
    lorem<br />
    ipsum<br />
    <br />
    <br />
    <br />
    dolor<br />
    sit<br />
    amet<br />
    <br />
    consectetur<br />
    adipiscing<br />
    elit
  </p>
</div>

is often used for this purpose on ws.

Also, the correctly transformed example still causes problems (regardless of whether the style is given in CSS or inline).

A change in the functionality of poem could cause many problems on ws.
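The information these extra blank lines carry is how many of them separate two stanzas, and it can be recovered from the wikitext. A minimal Python sketch, using a hypothetical helper `stanzas_with_gaps`, that splits a poem into stanzas while recording the number of blank lines before each one; how a renderer would then expose that count (a class, a margin, a separator) is left open.

```python
from typing import List, Tuple


def stanzas_with_gaps(text: str) -> List[Tuple[int, List[str]]]:
    """Split poem wikitext into stanzas, recording how many blank lines
    preceded each stanza (0 for the first one)."""
    result: List[Tuple[int, List[str]]] = []
    current: List[str] = []
    blanks = 0        # blank lines seen since the last text line
    gap_before = 0    # blank lines before the stanza being collected
    for line in text.strip("\n").split("\n"):
        if line.strip() == "":
            blanks += 1
            continue
        if blanks and current:
            # A run of blank lines ended: close the current stanza.
            result.append((gap_before, current))
            current = []
            gap_before = blanks
        blanks = 0
        current.append(line)
    if current:
        result.append((gap_before, current))
    return result


poem = "lorem\nipsum\n\n\n\ndolor\nsit\namet\n\nconsectetur\nadipiscing\nelit"
print(stanzas_with_gaps(poem))
# [(0, ['lorem', 'ipsum']), (3, ['dolor', 'sit', 'amet']), (1, ['consectetur', 'adipiscing', 'elit'])]
```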

> This is an obvious misunderstanding due to the ambiguous task description. I am pretty sure it was not meant to convert every single ...<br /> to <p>...</p>, but rather the following:

> definitely not,

Definitely not what? (Not misunderstanding or not meant?)

> in many works of poetry blank lines occur, the syntax [...] is often used for this purpose on ws

This is again semantically incorrect.

  • No paragraphs again.
  • A wider gap should be achieved by styling the appropriate margin, or by using an appropriately styled separator. In the vast majority of cases, wider gaps are a graphical representation with no semantic meaning other than "this is a new paragraph".

> Also, the correctly transformed example still causes problems (regardless of whether the style is given in CSS or inline).

Not true. If it were defined in global stylesheets, you could use

.poem.negativelyIndented p {
  margin-left: 20px;
}
.poem.negativelyIndented p:first-child {
  text-indent: -20px;
}

which solves your needs.

> A change in the functionality of poem could cause many problems on ws.

{{CitationExamples needed}}

I do work on Wikisource and I haven't found a single issue (let alone an unsolvable one) that would be caused by semantically correct paragraphing of poems. But I do see accessibility issues with flat <br />-based formatting.

> A wider gap should be achieved by styling the appropriate margin, or by using an appropriately styled separator. In the vast majority of cases, wider gaps are a graphical representation with no semantic meaning other than "this is a new paragraph".

On pl ws we try to reproduce all the spaces, indents, spacing, etc. in great detail according to the scans. Each work, sometimes every page, has a different amount of indentation; for us this is very important. We would have to define separate styles for each page.
Besides, for the average volunteer it is easier to insert three newlines than to define a bottom or top margin.

> On pl ws we try to reproduce all the spaces, indents, spacing, etc. in great detail according to the scans. [...] Besides, for the average volunteer it is easier to insert three newlines than to define a bottom or top margin.

  1. HTML & CSS were never meant to faithfully reproduce "great detail", so you can never achieve the same layout as in book typesetting. If you want that, create PDFs with copyable text in professional DTP software and upload them as an alternative.
  2. A particular typesetting belongs to a particular edition of a given book by a particular publisher, and typesetting varies between editions (pagination, layout, visual separators, headers/footers, etc.). It is therefore impossible to comply with all of them (and obviously insane to try to host all such versions on Wikisource when the only difference is visual).
  3. Wikisource is supposed to store OCR'ed texts with correct semantics, not an unusable jumble of characters that looks as desired in one particular browser at one particular screen resolution or window size. Most block formatting is not stable and varies between user agents and other settings. Semantically correct text, on the other hand, can easily be transformed into other formats (TeX, PDF, DocBook, popular word-processor formats, etc.) while keeping all the necessary meaning. That is impossible with a random bunch of unmarked text, and on top of that, all the visual formatting done on and for Wikisource breaks. This goes against one of the main principles of Wikisource, which is to create a library of reusable texts: it makes the texts nearly unusable and thus wastes the resources spent on them (time, effort, people...), since such pages are worthless (or require enormous transformation effort) for further use outside Wikisource.
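Point 3 can be illustrated with a small example: once stanza boundaries are explicit, converting a poem to another format is mechanical. A minimal Python sketch, assuming the stanzas have already been extracted as lists of lines, that emits a standard LaTeX `verse` environment (lines ended by `\\`, stanzas separated by a blank line). `to_latex_verse` is a hypothetical name.

```python
from typing import List


def to_latex_verse(stanzas: List[List[str]]) -> str:
    """Emit a LaTeX verse environment. Every line except the last one of a
    stanza ends with a double backslash; stanzas are separated by a blank line."""
    blocks = [" \\\\\n".join(lines) for lines in stanzas]
    return "\\begin{verse}\n" + "\n\n".join(blocks) + "\n\\end{verse}"


print(to_latex_verse([["lorem", "ipsum"], ["dolor", "sit", "amet"]]))
# \begin{verse}
# lorem \\
# ipsum
#
# dolor \\
# sit \\
# amet
# \end{verse}
```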

PS: Examples still needed...

Danny_B: your proposal destabilizes a system that is based on hundreds of thousands of pages. Your experience is not extensive, so you cannot even imagine the effects of the changes you propose.

If you do not like the idea of extending <poem>, you can create a template with different parameters, and let us return to proofreading new pages instead of improving old pages over and over again.

@Danny_B "Could have" and "should have" are lovely statements, and in an ideal world they would be correct. The issue is that your statements are not useful given nine years of use of poem. Any change in the behaviour of poem that alters the presentation of spacing is problematic, irrespective of syntactic correctness.

As mentioned elsewhere, all of this is perfect commentary for a forked code set, <poem2>, on how it could be done better.

The rest of your commentary is not particularly helpful and this is not the stage for such commentary. It either belongs on wiki or a separate ticket.

@Danny_B: Poetry, unlike other texts, is highly dependent on its graphic representation. I agree that not everything can be fully represented in HTML (e.g. HTML lacks a few advanced formatting elements), but we make our best efforts to display poems consistently with the author's intention in a wide variety of browsers/devices.

Many Wikisources are based on the ProofreadPage extension, which merges parts of a text (pages) into the full text presented in the main namespace. What we need to achieve is merging several <poem> sections (on separate pages) into a page that looks like a single <poem> section, inserting only a space between them. This should not depend on external styling, and ProofreadPage can't interfere with the content of individual pages, e.g.:

<poem>
some
text
</poem> <poem>
some
other

text
</poem>

should be presented in exactly the same way as

<poem>
some
text
some
other

text
</poem>

Similarly:

<poem>
some
text
</poem><br /> <poem>
some
other

text
</poem>

should be presented in exactly the same way as

<poem>
some
text

some
other

text
</poem>
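The two equivalences above can be stated as a text transformation. A minimal Python sketch, purely illustrative: ProofreadPage is not claimed to work this way, and `merge_poems` is hypothetical. It assumes the tags sit exactly as in the examples (a newline before </poem> and a newline after <poem>).

```python
import re

# "</poem><br /> <poem>" between pages: the sections should read as one poem
# with a stanza break (blank line) between them.
BR_JOIN = re.compile(r"\n</poem>[ \t]*<br\s*/?>[ \t]*<poem>\n")
# "</poem> <poem>" between pages: the lines should simply continue the stanza.
PLAIN_JOIN = re.compile(r"\n</poem>[ \t]*<poem>\n")


def merge_poems(wikitext: str) -> str:
    """Merge adjacent <poem> sections into one, following the two rules above."""
    wikitext = BR_JOIN.sub("\n\n", wikitext)
    return PLAIN_JOIN.sub("\n", wikitext)


print(merge_poems("<poem>\nsome\ntext\n</poem><br /> <poem>\nsome\nother\n\ntext\n</poem>"))
```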

The point is not that they should be displayed the way they are now, but that they should be displayed consistently.

There are likely hundreds of thousands of such pages across the Wikisources, so modifying them all is not a few days' or weeks' work, but years'.

Other problems may appear if the <poem> sections are included in formatting templates. And switching from templates to pure HTML is not an option here.

<poem> is a 'bad' extension that, in retrospect, we should never have enabled anywhere. Its tag name is misleading and its HTML is unclean. This immediately makes it hard to fix, because so much depends on the broken and badly defined behavior.

If you really want to fix this, I suggest renaming it and fixing its output in a 'version=2' output version. version=2 could be the default for the new name, while the old name keeps the old implementation by default. (I'm not sure it is even possible to know the alias name at that level, so they might have to be two totally different parser tags.)
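A minimal Python sketch of the proposed dispatch, under stated assumptions: `poem2` stands in for the unspecified new tag name, `version` is the attribute suggested above, and both renderers are simplified models rather than the extension's code. The old name keeps the old output unless `version=2` is given explicitly, so existing pages are unaffected.

```python
import re
from typing import Callable, Dict


def render_v1(text: str) -> str:
    # Old behaviour (simplified): every newline becomes <br /> in one paragraph.
    return "<p>" + "<br />".join(text.strip("\n").split("\n")) + "</p>"


def render_v2(text: str) -> str:
    # New behaviour (simplified): blank lines separate <p> stanzas.
    stanzas = re.split(r"\n\s*\n", text.strip("\n"))
    return "".join("<p>" + "<br />".join(s.split("\n")) + "</p>" for s in stanzas)


RENDERERS: Dict[str, Callable[[str], str]] = {"1": render_v1, "2": render_v2}

# Hypothetical: the old tag defaults to version 1, the new tag to version 2.
DEFAULT_VERSION: Dict[str, str] = {"poem": "1", "poem2": "2"}


def render_tag(tag: str, text: str, attrs: Dict[str, str]) -> str:
    """Pick the output version from an explicit attribute, else the tag's default."""
    version = attrs.get("version", DEFAULT_VERSION[tag])
    return '<div class="poem">' + RENDERERS[version](text) + "</div>"
```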

> <poem> is a 'bad' extension that, in retrospect, we should never have enabled anywhere. Its tag name is misleading and its HTML is unclean. This immediately makes it hard to fix, because so much depends on the broken and badly defined behavior.

I don't disagree with that assessment, but I must point out that a large part of the 'bad behavior' was unavoidable: the wiki markup/parser/Parsoid at the time of its development knew nothing but to add, remove or strip empty paragraph tags, turn double line returns into br tags, and similar nuances, by conscious design, and it has stayed that way up through today's framework.

This rigidity of design can be overcome today for most <poem>-wrapped content with some clever use of the :empty pseudo-class and adjacent sibling selectors (among others), made available by advances in the CSS standard since then. But again, the inconsistent and/or templatized use of the poem feature over the past ~nine years makes any "universal" CSS fix along this avenue near impossible as well.

> If you really want to fix this, I suggest renaming it and fixing its output in a 'version=2' output version. [...]

Either way, formatting the verses themselves should not be handled nor affected by the poem element "wrapper" in my view.

TTO lowered the priority of this task from Low to Lowest. Apr 5 2018, 8:25 PM
TTO subscribed.

I'm strongly inclined to decline this. Huge volumes of content, developed over many years, depend on the existing behaviour, so there needs to be a very strong reason for change, which I am simply not seeing.

Change 653088 had a related patch set uploaded (by Inductiveload; owner: Inductiveload):
[mediawiki/extensions/Poem@master] WIP: Add semantically correct poem tag

https://gerrit.wikimedia.org/r/653088

Here's a POC patch to add a whole new tag: <ppoem>, which wraps each line in a span and each stanza in a div.

This allows the Wiki to apply a style something like this:

.mw-poem .mw-poem-stanza {
  margin: 1em 0 0 4em;
}

.mw-poem .mw-poem-stanza:first-of-type {
  margin-top: 0;
}

.mw-poem .mw-poem-line {
  display: inline-block;
  text-indent: -4em;
}

.mw-poem hr {
  margin-left: -4em;
}

and get a result like this:

<ppoem>
poem1
:poem2
::poem3
poem3

poem1
::Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliqui
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip

poem1
:poem2
::poem3
----
poem3
</ppoem>

[Screenshot of the rendered result: 2021-01-01_154218_1302x411_screenshot.png]

I don't know whether a whole new tag (<ppoem> is just a placeholder for "proper poem"!) or an attribute on the old tag, in the same vein as compact, is better. I also don't know how to write the Parsoid tests, so guidance there would be very welcome.
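The structure described for <ppoem> (each line in a span, each stanza in a div) can be sketched in Python. This is a guess at the markup based on the class names in the stylesheet above, not the patch's actual output: `render_ppoem` is hypothetical, `:` indentation is not modelled, a line consisting of `----` becomes `<hr />`, and the `<br />` between line spans is an assumption so that `inline-block` lines do not flow onto one row. Line text is HTML-escaped instead of being parsed as wikitext.

```python
import html
from typing import List


def render_ppoem(text: str) -> str:
    """Wrap each stanza in div.mw-poem-stanza and each line in span.mw-poem-line."""
    stanzas: List[List[str]] = [[]]
    for line in text.strip("\n").split("\n"):
        if line.strip() == "":
            if stanzas[-1]:          # start a new stanza after a blank line
                stanzas.append([])
            continue
        stanzas[-1].append(line)
    if not stanzas[-1]:
        stanzas.pop()

    out = ['<div class="mw-poem">']
    for lines in stanzas:
        parts = []
        for line in lines:
            if line.strip() == "----":
                parts.append("<hr />")  # assumption: wikitext rule inside a stanza
            else:
                parts.append('<span class="mw-poem-line">' + html.escape(line) + "</span>")
        out.append('<div class="mw-poem-stanza">' + "<br />".join(parts) + "</div>")
    out.append("</div>")
    return "".join(out)
```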

Aklapper changed the subtype of this task from "Task" to "Feature Request". Feb 4 2022, 11:02 AM
Aklapper removed a subscriber: wikibugs-l-list.