
Whitespace should be fixed automatically
Closed, Declined (Public)

Description

Author: ui2t5v002

Description:
When a user saves a page, the whitespace of the page should be reformatted to fit a template, so that the markup is kept readable and consistent from article to article.

We've had a few discussions on-wiki and there was really only disagreement over the use of a newline after the heading tags. I don't care either way, and was only adding the space because it was added by the "new section" tab:

[[Wikipedia_talk:Manual_of_Style/Archive_39#Improving_the_source_text]]

[[Wikipedia_talk:Manual_of_Style/Archive_43#standard_and_consistent_internal_formatting]]

[[Wikipedia_talk:Manual_of_Style#On_the_policy_regarding_the_use_of_whitespace]]

I doubt anyone disagrees that this is important in other computer programming languages (see [[Programming_style#Spacing]] and [[Indent_style]], for instance), so it should be clear that it's important for a language like wiki markup, too.


Version: unspecified
Severity: enhancement

Details

Reference
bz11498

Event Timeline

bzimport raised the priority of this task to Lowest. Nov 21 2014, 9:56 PM
bzimport set Reference to bz11498.
bzimport added a subscriber: Unknown Object (MLST).

I consider a Style Fixer a good idea. It has several advantages:

  • A Style Fixer does not change the wikicode parser. Therefore it is completely compatible between wikis, previous and future versions of the software and the article database, and plugins and extensions.
  • It does not force any user to use a particular style while writing.
  • Consistent use of whitespace greatly contributes to the readability of article texts.
  • This solution will be less controversial than strict rules and guidelines which force users to write in a certain style and provoke others to manually correct them.

Some things to consider when coding this:

  • The important style and whitespace issues are probably:
    • Spaces around the heading title in headings
    • Empty lines before and after block elements like headings, non-inline images, and tables
    • Multiple empty lines to one single empty line
    • Multiple spaces to one single space
  • Templates after headings often display too many line breaks if an empty line is added after the heading, so better not change these
  • Tabs in plain text (not tables, lists, or templates) should be converted to blanks
  • Blanks and tabs at the end of lines (or in empty lines) should be removed
  • The Style Fixer should only be run before saving the article, not for a preview
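A few of the rules listed above can be sketched in Python. This is only an illustration under stated assumptions, not a proposal for the actual implementation: the function name `fix_whitespace` is hypothetical, it covers only trailing whitespace, runs of spaces, and runs of empty lines, and it makes no attempt to skip preformatted or nowiki regions, tables, or templates.

```python
import re

def fix_whitespace(text: str) -> str:
    """Hypothetical save-time fixer covering three of the listed rules."""
    lines = []
    for line in text.split("\n"):
        # Remove blanks and tabs at the end of lines (and in empty lines).
        line = line.rstrip(" \t")
        # Keep leading spaces untouched, since a leading space is
        # significant in wiki markup; collapse runs of spaces elsewhere.
        indent = line[: len(line) - len(line.lstrip(" "))]
        body = re.sub(r" {2,}", " ", line[len(indent):])
        lines.append(indent + body)
    # Collapse multiple empty lines into one single empty line.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
```

Running it only on save, as suggested in the last point, would mean calling something like this once on the submitted text and never on preview text.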

IMHO this is far too likely to just break things where there are corner cases in the markup, so my inclination is to keep this as a WONTFIX.

lfw wrote:

I'm all for any particular article having consistent internal formatting, but I really don't care what style that is, as long as it is consistent. So, if the software was to fix up an article's internal consistency, then it would first have to determine what the dominant style was already in place for that article, and just fix-up the inconsistent parts. That sounds like it would be hard to explain to a programmer in such a way that it could be automated. For example, if I can't find a dominant existing style in an article, I just leave it alone until one develops.
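To make the "dominant style" idea concrete for a single feature (spaces inside heading markers), here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name `dominant_heading_style`, the labels "spaced" and "tight", and the rule that a tie means "no dominant style, leave the article alone".

```python
import re
from collections import Counter

# A heading line: the same run of "=" on both sides of a non-empty title.
HEADING = re.compile(r"^(={1,6})(.+?)\1\s*$")

def dominant_heading_style(text):
    """Return "spaced", "tight", or None when no style clearly dominates."""
    counts = Counter()
    for line in text.splitlines():
        m = HEADING.match(line)
        if not m:
            continue
        inner = m.group(2)
        spaced = inner.startswith(" ") and inner.endswith(" ")
        counts["spaced" if spaced else "tight"] += 1
    if not counts:
        return None  # no headings at all
    (top, n), *rest = counts.most_common()
    if rest and rest[0][1] == n:
        return None  # tie: no dominant style
    return top
```

Even this one-feature version needs a tie rule and a definition of "dominant", which gives a sense of how much would have to be specified for a whole article.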

ui2t5v002 wrote:

(In reply to comment #3)

I'm all for any particular article having consistent internal formatting, but I
really don't care what style that is, as long as it is consistent. So, if the
software was to fix up an article's internal consistency, then it would first
have to determine what the dominant style was already in place for that
article, and just fix-up the inconsistent parts.

No no. This would be a universal style. It should be consistent from article to article. No need for artificial intelligence or anything.

(In reply to comment #2)

IMHO this is far too likely to just break things where there are corner cases
in the markup, so my inclination is to keep this as a WONTFIX.

Examples?

ui2t5v002 wrote:

(In reply to comment #1)

I consider a Style Fixer a good idea. It has several advantages

Agreed on all points. This is a response to people trying to make "strict rules and guidelines" about manual correction. Things like this should never be done by hand, especially not on an "easy to edit" wiki (or has that concept been abandoned?)

The ones I was concerned about in past discussions:

  • Spaces around the heading title in headings
  • A single newline between all block-level elements, like headings, tables, images, lists, paragraphs of text, and so on
  • Spaces after list identifiers, like I am doing right now
  • Spaces around the heading title in headings

Yep.

  • Empty lines before and after block elements like headings, non-inline images, and tables

This is the only one people seemed to disagree on. I think more whitespace is generally better, but others said it helps tie the headings to the sections they go with. I don't really care either way. The "new section" button leaves the extra line, which is what I was following. It's also consistent with "all block level elements separated by a newline".

  • Multiple empty lines to one single empty line

When I first started editing, multiple empty lines rendered as a single line. Now they don't, for some reason. If there was a good reason behind that change, for example if big spaces in the articles are needed for some reason, maybe multiple lines should be kept.

  • Multiple spaces to one single space

Hmm. Lots of discussion about this one.

http://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style_archive_(spaces_after_a_full_stop/period)

They are rendered identically, so we should be concerned about readability in the source.

  • Templates after headings often display too many line breaks if an empty line is added after the heading, so better not change these

There should be no special cases like this. If templates are different with or without an empty space, that's a separate bug that should be fixed first. Is this another <p><br></p> bug?

  • Tabs in plain text (not tables, lists, or templates) should be converted to blanks

  • Blanks and tabs at the end of lines (or in empty lines) should be removed

It's important to leave trailing spaces in template documentation, for one. ("Copy and paste this code and then add stuff after the spaces")

  • The Style Fixer should only be run before saving the article, not for a preview

What's your reasoning on this? I hadn't thought of it.

It would be best to start small, implementing one or two uncontroversial things. (Like the spaces in the headers, for instance; there are no special cases for this, and no disagreement about whether it should be done this way.) After that is implemented and works, add more functionality piece by piece to make sure it works in all cases and that it is desired by the editors.
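As a sketch of what such a first small step might look like, the following Python function only normalizes the spaces inside heading lines. The name `space_headings` and the choice of exactly one space on each side are assumptions for illustration; it does not look inside nowiki or pre blocks.

```python
import re

# A heading line: 1 to 6 "=" characters, a title, and the same run of "=" again.
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)

def space_headings(text: str) -> str:
    """Rewrite every heading line as '== Title ==' with single spaces."""
    return HEADING.sub(
        lambda m: f"{m.group(1)} {m.group(2)} {m.group(1)}", text
    )
```

Lines that do not start with "=" are never touched, which keeps the change easy to review before more rules are added.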

  • Multiple spaces to one single space

I was thinking about runs of more than two blanks as well as uncommon (Unicode) whitespace characters and tabs.
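A minimal Python sketch of that idea follows. Which characters count as "uncommon" is my assumption, not something settled in this discussion: the set below covers tabs and a handful of Unicode space characters, and deliberately leaves the no-break space (U+00A0) alone because it may be intentional. The function name `normalize_inline_whitespace` is hypothetical.

```python
import re

# Assumed set: tab, ordinary space, and several Unicode space characters.
# Only runs that follow a non-whitespace character are matched, so
# leading indentation on a line is left as it is.
INLINE_WS = re.compile(
    "(?<=\\S)[ \t\u1680\u2000-\u200a\u202f\u205f\u3000]+"
)

def normalize_inline_whitespace(line: str) -> str:
    """Replace each inline run of blanks, tabs, or uncommon spaces with one space."""
    return INLINE_WS.sub(" ", line)
```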

  • The Style Fixer should only be run before saving the article, not for a preview

What's your reasoning on this? I hadn't thought of it.

You do not want to disrupt any scaffolding constructs while somebody is still editing.