Wrap each wiki page section contents in a container
Open, Public

Description

On a wiki page a "section" is an HTML heading element (H2,
H3, etc.) followed by text using any type of formatting,
up to the next heading element. It generally consists of
an "editsection" DIV if section edit links are enabled,
followed by a P element containing an anchor (presumably
for the TOC), then the content, which may contain subsections.

If each section were wrapped from start to finish in an
HTML DIV, it would be much easier to implement such
things as section folding or giving various kinds of
sections special colours. Doing this now involves parsing
the entire "bodyContents" and rebuilding it with such
DIVs inserted, which is slow and unreliable.

Even better would be an outer DIV including the H tag and
an inner DIV beginning after the H tag.
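As a rough illustration of the outer/inner DIV idea, here is a minimal Python sketch. It assumes the page body is already available as a flat list of top-level HTML blocks (a hypothetical pre-tokenized input, not how Parser.php works), that a section ends at the next heading of the same or higher level, and it uses made-up class names "section" and "section-content". Text before the first heading is left unwrapped.

```python
import re

# Matches a whole heading block such as <h2>Etymology</h2> or <h3 id="x">Noun</h3>.
HEADING = re.compile(r"<h([1-6])\b[^>]*>.*?</h\1>", re.S)


def wrap_sections(blocks: list[str]) -> str:
    """Wrap each heading plus its following blocks in an outer and an inner DIV.

    `blocks` is a hypothetical flat list of top-level HTML fragments. A section
    ends at the next heading of the same or higher level, so subsections end up
    nested inside their parent's inner DIV.
    """
    out: list[str] = []
    open_levels: list[int] = []  # heading levels of sections not yet closed
    for block in blocks:
        m = HEADING.fullmatch(block.strip())
        if m is None:
            out.append(block)  # ordinary content stays where it is
            continue
        level = int(m.group(1))
        # A heading closes every open section of the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            open_levels.pop()
            out.append("</div></div>")  # closes the inner DIV, then the outer DIV
        out.append(f'<div class="section section-h{level}">')  # outer: includes the H tag
        out.append(block)
        out.append('<div class="section-content">')  # inner: starts after the H tag
        open_levels.append(level)
    # Close whatever is still open at the end of the page.
    out.extend("</div></div>" for _ in open_levels)
    return "\n".join(out)
```

For example, `wrap_sections(["<h2>English</h2>", "<h3>Noun</h3>", "<p>A word.</p>"])` nests the Noun section inside the English section's content DIV, which is what section folding or per-section CSS would hook onto.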

Another enhancement would be to auto-generate an ID for
each section and subsection based on the contents of the
owning H tag, perhaps including its parents' headings in
the case of subsections.
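A minimal Python sketch of such hierarchical IDs, assuming a made-up slug rule (lowercase, runs of spaces and punctuation turned into "-") and "." as the separator between parent and child. MediaWiki's real anchor encoding is different and is not modelled here; exact repeats get a numeric suffix.

```python
import re


def slug(text: str) -> str:
    """Hypothetical slug rule: lowercase, collapse non-alphanumeric runs to '-'."""
    return re.sub(r"[\W_]+", "-", text.lower()).strip("-")


def section_ids(headings: list[tuple[int, str]]) -> list[str]:
    """Return an ID for each (level, heading text), prefixed by its parent headings."""
    ids: list[str] = []
    seen: dict[str, int] = {}            # how often each ID has been produced so far
    parents: list[tuple[int, str]] = []  # (level, slug) of the enclosing headings
    for level, text in headings:
        # Drop the headings that this one closes (same or deeper level).
        while parents and parents[-1][0] >= level:
            parents.pop()
        parents.append((level, slug(text)))
        candidate = ".".join(s for _, s in parents)
        count = seen.get(candidate, 0)
        seen[candidate] = count + 1
        ids.append(candidate if count == 0 else f"{candidate}_{count + 1}")
    return ids
```

On a Wiktionary-style page this keeps the many "Noun" sections apart: the one under "English" gets english.noun and the one under "French" gets french.noun.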


Version: unspecified
Severity: enhancement

bzimport added a project: MediaWiki-Interface. Via Conduit, Nov 21 2014, 9:15 PM
bzimport added a subscriber: Unknown Object (MLST).
bzimport set Reference to bz6104.
Hippietrail created this task. Via Legacy, May 26 2006, 11:23 PM
brion added a comment. Via Conduit, May 26 2006, 11:28 PM

This isn't practical with current system; sections may split up
across table cells, etc.

Hippietrail added a comment. Via Conduit, Jun 4 2006, 8:55 PM

I'm reopening just to ask if a simple solution can be implemented in the short
term which does not attempt to surmount the problem Brion mentions above.
Specifically, the English Wiktionary and probably all Wiktionaries shouldn't
cause that problem but could make immediate use of section CSS.

On other wikis such a temporary solution should of course not be enabled.

Hippietrail added a comment. Via Conduit, Jun 4 2006, 8:57 PM

See also Bug 4741: Semantic HTML for section anchors

bzimport added a comment. Via Conduit, Feb 14 2007, 3:46 AM

ayg wrote:

(In reply to comment #2)

> I'm reopening just to ask if a simple solution can be implemented in the short
> term which does not attempt to surmount the problem Brion mentions above.
> Specifically, the English Wiktionary and probably all Wiktionaries shouldn't
> cause that problem but could make immediate use of section CSS.

Some kind of check would still be needed to make sure this is possible.
doHeadings() just uses a bunch of regex passes; it's not aware of contextual
stuff like whether there are table cells or whatnot nearby. Then again, maybe
Tidy and/or Sanitizer would be clever enough to fix any resulting screwups
acceptably. I suppose that would depend upon the exact styles and so on that
people would try to give it. If we had a *proper* parser, of course, we could
presumably use a <tbody> instead of a div inside tables and properly nest it so
as to maintain validity, but that's not happening soon.
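To make the table-cell problem concrete, here is a small Python sketch. `naive_wrap` is a hypothetical stand-in for a context-unaware regex pass (it is not the actual doHeadings() code): it wraps everything from one heading to the next in a DIV. `nesting_errors` uses the standard library's html.parser to report end tags that do not match the innermost open element.

```python
import re
from html.parser import HTMLParser

# Zero-width split point in front of every <h1>..<h6> start tag.
HEADING_START = re.compile(r"(?=<h[1-6][\s>])")
VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}


def naive_wrap(html: str) -> str:
    """Hypothetical context-unaware pass: wrap each heading-to-heading run in a DIV."""
    head, *sections = HEADING_START.split(html)
    return head + "".join(f'<div class="section">{s}</div>' for s in sections)


class NestingChecker(HTMLParser):
    """Records end tags that do not close the innermost open element."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags such as <br/> open nothing

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            top = self.stack[-1] if self.stack else "nothing"
            self.errors.append(f"</{tag}> while {top} is open")


def nesting_errors(html: str) -> list[str]:
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"<{t}> never closed" for t in checker.stack]
```

Headings at the top level come through the naive pass cleanly, but a heading inside a table cell yields a </td> that arrives while the new DIV is still open, plus elements that are never closed.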

Hippietrail added a comment. Via Conduit, Feb 25 2007, 2:32 AM

I have a proof of concept of this running at http://wiktionarydev.leuksman.com
(hit "random"; it's not on the main page).

So far it requires modifying core code, though it only touches one function in
Parser.php. It doesn't seem to be compatible with the normal TOC, so I've disabled the TOC.

bzimport added a comment. Via Conduit, Jul 5 2007, 8:48 PM

ayg wrote:

Copying a relevant post I was going to make to wikitech-l but decided not to because it was off-topic:

Done naively, this breaks XHTML validity if the header is wrapped in any tag at all, and it seems very difficult to fix it for perfectly reasonable, nontrivial, legal cases like

This is the Declaration of Independence.
<div class="cited-document">
== Section 1 ==
We don't like the British.
== Section 2 ==
Therefore we're not going to be your colony anymore.
</div>
This is a very compelling and historical document.

Observe that there, the trailing text is not intended to be part of Section 2 even though MediaWiki would consider it as such at present. Possibly you could construct some algorithm that would figure this out, but it's not particularly easy to do, especially if the tag structure is not as reasonable as this (use your imagination!). Are we going to start rewriting the document structure when an algorithm doesn't think it makes any sense, even if it's valid XHTML? Further issues arise with tables, where <div> wrappers are illegal and you have to hope that you can fit a <tbody> around what you want.

I think section wrappers *could* be extremely useful, but when you get down to it, their utility is limited even in the abstract by the fact that not all text is required to be part of any section, except in the technical sense. It would take some fairly drastic overhauling of how we look at and deal with sections for section wrappers to be practicable.

One approach to solving cases like that would be to simply parse each section independently of all the others, and run Tidy and so on on each section separately. That would make scenarios like the above impossible. This would be totally unacceptable for Wikipedia, but if what you say is true, it might be reasonable for the main namespace of Wiktionary. There might be a way of marking a section as not needing a section div for some reason, too (cf. bug 6575).
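A rough Python sketch of the first step of that approach: splitting wikitext at heading lines so each chunk could be handed to the parser and Tidy on its own. The heading regex is a simplification (it ignores <nowiki>, <pre>, comments and template-generated headings), and the parser/Tidy step itself is not modelled; the sketch only counts <div> tags per chunk. On the Declaration example, the lead chunk opens a <div> that only the last chunk closes, which is exactly the kind of markup that per-section parsing would rule out.

```python
import re

# A wikitext heading line such as "== Section 1 ==" (simplified: ignores
# <nowiki>, <pre>, comments and template-generated headings).
HEADING_LINE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.M)


def split_sections(wikitext: str) -> list[tuple[str, str]]:
    """Split wikitext into (heading, body) chunks; the lead section has heading ''."""
    chunks: list[tuple[str, str]] = []
    heading, start = "", 0
    for m in HEADING_LINE.finditer(wikitext):
        chunks.append((heading, wikitext[start:m.start()]))
        heading, start = m.group(2), m.end()
    chunks.append((heading, wikitext[start:]))
    return chunks


def unbalanced_divs(body: str) -> int:
    """Opening minus closing <div> tags in one chunk (0 means balanced)."""
    opened = len(re.findall(r"<div\b", body, re.I))
    closed = len(re.findall(r"</div\s*>", body, re.I))
    return opened - closed
```

A chunk with a non-zero count is one that Tidy would have to repair on its own, so a Wiktionary-only mode could either reject such markup or accept the repaired result.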

bzimport added a comment. Via Conduit, Jul 5 2007, 10:20 PM

michael wrote:

That's a good example of the kind of problem that can make this a sticky issue to resolve.

But it also shows why it ought to be resolved and why it blocks bug 10467 (Use semantic XHTML). With the current wikitext parser, the example code implies an incorrect semantic interpretation of the document. The HTML specification says "A heading element briefly describes the topic of the section it introduces", so "Section 2" is a heading introducing both the second part of the Declaration and the article copy after it. The author clearly did not intend this.

In (X)HTML 4, every heading implies a section which ends at the next heading of the same or higher level. This bug proposes making that exact hierarchy explicit. If we accept this, then I think there is a relatively simple solution.

The multi-section div element entered in wikitext explicitly creates a new section within the surrounding text (i.e. one level lower than the previous section heading). Any section headings within that section imply enclosed sections, so they should be bumped down a further level in the hierarchy, and the last one closed before the closing /div tag. Following sections should resume the normal flow until the end of the document.

So the sample wikitext above implies the following structure, which ought to be rendered in the page's XHTML. I've assumed the original had preceding and following sections, to show what could happen (they are unaffected).

== Preceding section ==
This is the Declaration of Independence

  === Editor-entered div/section === <div class="cited-document">

    ==== Section 1 ====
    We don't like the British.
    </div><!-- Section 1 ends -->

    ==== Section 2 ====
    Therefore we're not going to be your colony anymore.
    </div><!-- Section 2 ends: implied closure made explicit by the renderer -->

  </div><!-- editor-entered div/section closure  -->

This is a very compelling and historical document.
</div><!-- Preceding section ends -->

== Following section ==
American Revolutionary War follows.
</div><!-- Following section ends -->

Unfortunately, it is impossible to duplicate this structure explicitly in wikitext only, since there is no way to end a section before the next equal or higher section (as happens at the end of Section 2 here).
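The level-shifting part of that proposal can be sketched in Python. The token stream is a hypothetical simplification of parsed wikitext, the div is assumed not to be nested (nested divs are one of the open questions in this comment), and headings inside the div keep their relative depth below the implied div section. No heading text is generated for the div section itself.

```python
def effective_levels(tokens: list[tuple]) -> list[int]:
    """Return the effective level of each heading token under the proposed rule.

    `tokens` is a hypothetical simplified token stream:
      ("h", level)  a wikitext heading, e.g. ("h", 2) for == ... ==
      ("div",)      an editor-entered <div> (assumed not nested)
      ("/div",)     its closing </div>
    """
    levels: list[int] = []
    outer = 1         # level of the last heading outside any div (1 = page title)
    div_level = None  # level of the implied div section while inside a div
    base = 0          # shallowest heading level found inside the current div
    for i, tok in enumerate(tokens):
        kind = tok[0]
        if kind == "div":
            div_level = outer + 1  # the div opens a section one level down
            inner = []
            for t in tokens[i + 1:]:  # scan ahead to the closing </div>
                if t[0] == "/div":
                    break
                if t[0] == "h":
                    inner.append(t[1])
            base = min(inner, default=0)
        elif kind == "/div":
            div_level = None
        elif kind == "h":
            if div_level is None:
                outer = tok[1]
                levels.append(tok[1])
            else:
                # Inner headings sit below the div section, keeping their relative depth.
                levels.append(div_level + 1 + tok[1] - base)
    return levels
```

For the sample structure above (a preceding ==, a div containing two == headings, then a following ==), this yields levels 2, 4, 4, 2, matching the ==== headings shown inside the editor-entered div.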

Questions:

  • Do the automatically generated sections get a heading or not? If so, how is the text generated?
  • Can this be logically extended to cover nested divs? Or should the div hierarchy remain flat, with following div tags automatically closing previous ones?
  • What happens if div tags are not balanced? Can authors enter only a closing </div> tag to end a subordinate section?

bzimport added a comment. Via Conduit, Jul 5 2007, 10:22 PM

michael wrote:

Another option: such a div element could be considered mis-nested, and ignored by the wikitext renderer.

bzimport added a comment. Via Conduit, Jul 5 2007, 11:02 PM

ayg wrote:

(In reply to comment #7)

> In (X)HTML 4, every heading implies a section which ends at the next heading of
> the same or higher level.

Not really. My example above is reason enough to discard that. Even if you add a heading for the whole Declaration, HTML provides no way to indicate that the closing </div> terminates the section. It only says that user agents should be able to construct a table of contents automatically, which they can, and in fact MediaWiki does exactly that. To use another counterexample, the final heading tag in the source of http://www.w3.org/ is the one entitled "Systems", yet it precedes the completely unrelated footer, which has no heading tag.

(Incidentally, the last draft of XHTML 2.0 that I looked at had some kind of tag to explicitly delimit sections, <section> or something.)

> The multi-section div element entered in wikitext explicitly creates a new
> section within the surrounding text (i.e. one level lower than the previous
> section heading).

Sure, but what about this template-generated table?

<tr colspan="2"><th>
== Widget sales for 2006 ==
<a href="...">edit this template</a>
</th></tr>
<tr><th>Month</th><th>Number</th>
...

This has the same form as the div example, but its semantics are different. That is, the heading is cordoned off from the section by a parent element, but it does *not* logically cover only its following siblings (in this case only the <a> element), it covers the entire table, which includes cousin nodes and even parents. How do you plan to automatically differentiate these cases? You'd need explicit, user-entered section delimiters for this to work reliably.

Hippietrail added a comment. Via Conduit, Aug 6 2007, 11:23 AM

I've got a basic version of this working in JavaScript here: http://en.wiktionary.org/wiki/User:Hippietrail/addstructure.js

It is designed for and tested only on the English Wiktionary so far but is not installed there for all users.

It may however be of interest to anybody following this feature request.

bzimport added a comment. Via Conduit, Nov 15 2008, 9:18 PM

michael wrote:

See also Bug 16190: Relate section anchors to section headings in HTML, describing an alternative which may be simpler to implement and provides some benefits. Bug 4741: Use id's for section anchors instead of <a name=...> is similar to this one.

bzimport added a comment. Via Conduit, Jan 5 2011, 6:28 PM

michael wrote:

If HTML5 were to be supported, then a better solution would be to use a <section> element.

See also bug 23932 - “Enable, whitelist, and incorporate semantic HTML5 elements: article, aside, figcaption, figure, footer, header, hgroup, mark, nav, section, time.”

matmarex added a comment. Via Conduit, Sep 3 2014, 12:02 PM

*** Bug 61615 has been marked as a duplicate of this bug. ***

matmarex added a comment. Via Conduit, Sep 3 2014, 12:03 PM

*** Bug 70198 has been marked as a duplicate of this bug. ***

TheDJ added a comment. Via Conduit, Sep 3 2014, 12:32 PM

Since we do some stuff in this area with mobile and Parsoid/VE these days, I wonder: what if we did this only for H2s? Is there any way we can measure how many pages we would break?

Parsoid has done metrics on similar problems, right? Perhaps we could explore it through that route?

I do know that this:
<div class="cited-document">
== Section 1 ==
We don't like the British.
== Section 2 ==
Therefore we're not going to be your colony anymore.
</div>

is often used on user pages, so those would likely all break.
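One possible way to get such numbers, sketched in Python: scan each page's rendered HTML (assumed to be Tidy-cleaned parser output given as a fragment, so top-level headings sit at nesting depth 0) and flag pages where an <h2> appears inside some other element, such as the user-page <div> above or a table cell. The page dictionary and function names are hypothetical, and this is not how Parsoid collects its metrics.

```python
from html.parser import HTMLParser

VOID_TAGS = {"area", "br", "col", "hr", "img", "input", "link", "meta", "wbr"}


class NestedHeadingCounter(HTMLParser):
    """Counts headings of one level that appear inside another element."""

    def __init__(self, heading: str = "h2") -> None:
        super().__init__()
        self.heading = heading
        self.depth = 0   # number of currently open (non-void) elements
        self.nested = 0  # headings found at depth > 0

    def handle_starttag(self, tag, attrs):
        if tag == self.heading and self.depth > 0:
            self.nested += 1
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags like <br/> do not change the depth

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS:
            self.depth = max(0, self.depth - 1)


def pages_that_would_break(pages: dict[str, str], heading: str = "h2") -> list[str]:
    """Return titles of pages with at least one nested heading of the given level."""
    broken = []
    for title, html in pages.items():
        counter = NestedHeadingCounter(heading)
        counter.feed(html)
        counter.close()
        if counter.nested:
            broken.append(title)
    return broken
```

Run over a dump of rendered pages, the length of the returned list divided by the number of pages would give a rough share of pages where an H2-only wrapper could not be applied naively.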

Isarra added a comment. Via Conduit, Sep 3 2014, 3:17 PM

Have it activate for h1s and h2s unless they're embedded in something (another div that doesn't span the entire page, a table, etc.), perhaps?

So each h1 and following content would get its own div, which would include the divs for h2s and their following content.

If you have an in-wikitext <div> around two h2s and content, this could either just ignore those, or put the first h2 div around both and just ignore the second h2... or perhaps put both h2+content divs inside the parent div.

Whatever the solution, this would be very useful or even needed on several projects. wikiHow comes to mind, considering how all the content on a howto is broken up into sections in just such a way.

Hippietrail added a comment. Via Conduit, Sep 4 2014, 2:27 AM

Eight years ahead of my time, apparently (-;
Glad to see others finally noticing some need for this!

He7d3r awarded a token. Via Web, Dec 1 2014, 8:19 PM
Ricordisamoa added a subscriber: Ricordisamoa. Via Web, Apr 13 2015, 12:14 PM
