Page MenuHomePhabricator

Null edits remove closing </div> tags from page footer leaving HTML unclosed <div>s in page namespace
Closed, ResolvedPublic

Description

Just noticed while nulediting by a bot:
https://pl.wikisource.org/w/index.php?title=Strona%3AAleksander_Berka-S%C5%82ownik_kaszubski_por%C3%B3wnawczy.djvu%2F34&type=revision&diff=1378587&oldid=1117047

I doubt that removing closing HTML-tag and leaving unclosed <div>s in page namespace (this one is opened in the page header) is intended behaviour.

Or, maybe, we should rely that MediaWiki will close all open tags itself and we should not care?

Re-adding the removed </div> manually does not work unless some other content is also added. The NOEDITSECTION and <references/> from standard footer were intentionally removed as unneeded in this particular page.

Event Timeline

The cheat way would to use one of the closing templates. At enWS we normally template such formatting so we would not notice it particularly ... the {{nnnn block/s}} and {{nnnn block/e}} templates where the "/e" version is usually a </div> template, or an open/close {{nnnn block|}} where the }} is all that sits in the footer.

So create {{div end}} with a hard return and </div> should sort out the code.

So create {{div end}} with a hard return and </div> should sort out the code.

This is a bit different case (the formatting being removed is intended to be used in Page namespace only, unlike your example), but the solution may be indeed similar (creation of wrapping templates).
I am just running pywikibot touch over Page ns in plwikisource, so I expect to locate all uses of </div> found to be controversial.

However the main problem I see here is that the potentially breaking changes and a solution are not loudly announced over wikisources. Is Tech News wrong way for it?

Another problem I noticed if the presence of (illy formatted) <div class="pagetext"> in the header of this page:
https://pl.wikisource.org/wiki/Strona:PL_Miko%C5%82aja_Kopernika_Toru%C5%84czyka_O_obrotach_cia%C5%82_niebieskich_ksi%C4%85g_sze%C5%9B%C4%87.djvu/014
even after cleaning (which I expected to remove this). I doubt anybody did interfere into header/footer content in this case: anything there was likely automatically created. @Tpt : should I open another bug for this case?

For ease of testing, here's a currently affected page on English Wikisource: https://en.wikisource.org/w/index.php?title=Page:Our_Neighbor-Mexico.djvu/12&oldid=5679841

Opening it for editing and clicking "Show changes" shows a </div> in the footer being removed:

image.png (522×1 px, 143 KB)

When viewing raw wikitext: https://en.wikisource.org/w/index.php?title=Page:Our_Neighbor-Mexico.djvu/12&oldid=5679841&action=raw, there is a <noinclude></div></noinclude> at the end, but this tag is not shown when opening the edit form. It matches the opening tag <noinclude><div {{ts|width:500px;|ma|aj|lh13}}></noinclude>.

(I found this page with: https://en.wikisource.org/w/index.php?search=insource%3A%2F%5C<noinclude%5C>%5C<%5C%2Fdiv%5C>%5C<%5C%2Fnoinclude%5C>%2F+-insource%3Apagetext&profile=all)

Presumably, a closing </div> should only be removed when we also removed an opening <div class="pagetext"> (which is required for conversion from the old format, and invisible in diffs).

Change 439652 had a related patch set uploaded (by Bartosz Dziewoński; owner: Bartosz Dziewoński):
[mediawiki/extensions/ProofreadPage@master] PageContentHandler: Do not remove unmatched </div> when unserializing wikitext content

https://gerrit.wikimedia.org/r/439652

Change 439652 merged by jenkins-bot:
[mediawiki/extensions/ProofreadPage@master] PageContentHandler: Do not remove unmatched </div> when unserializing wikitext content

https://gerrit.wikimedia.org/r/439652

matmarex claimed this task.
matmarex removed a project: Patch-For-Review.

For future reference, the "cleanup" function will now work as follows:

  • If the page has a header (begins with <noinclude>):
    • If the header contains <div class="pagetext">, remove it
    • Otherwise, if the header contains <div>, remove it
  • If we removed either of those syntaxes:
    • If the page has a footer (ends with </noinclude>):
      • If the footer ends with </div>, remove it
    • Otherwise:
      • If the page text ends with </div>, remove it

Before my changes, the last step was done regardless of whether the page had a header or whether it was cleaned up.

I think this is still a bit awkward (note that the opening tag is removed even if there is no matching closing tag), but I assume there are historical reasons why it's done this way, and it should not interfere with normal content (there isn't really ever a reason to use a <div> with no attributes).