
Support correction of spelling, grammar and style errors in Content Translation
Open, Needs Triage, Public

Assigned To: None
Authored By: Pginer-WMF, Feb 3 2018, 11:59 AM

Description

When translating articles with Content Translation, editors can benefit from catching spelling, grammar and style errors early. This helps them produce higher-quality, more consistent content.

LanguageTool provides an open-source platform that could be integrated to support these checks.

The solution supports two operation modes:

  • Correct an error. Errors are highlighted, and selecting one shows alternatives the user can pick to replace it.
  • Review all errors. Editors can step through all the errors in sequence to review the whole document.
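
The two modes can be sketched as follows. This is a minimal, hypothetical sketch: it assumes each detected error is described by a character offset, a length, a kind and a list of suggestions (the `Issue` type and function names are illustrative, not part of Content Translation or LanguageTool). Python is used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    offset: int              # start index of the flagged text
    length: int              # number of flagged characters
    kind: str                # hypothetical labels: "spelling", "grammar", "style"
    suggestions: tuple = ()  # alternatives offered to the user


def apply_correction(text, issue, replacement):
    """'Correct an error': swap the flagged span for the alternative the user picked."""
    return text[:issue.offset] + replacement + text[issue.offset + issue.length:]


def apply_corrections(text, chosen):
    """Apply several (issue, replacement) pairs right to left, so replacing one
    span never shifts the offsets of the spans still waiting to be processed."""
    for issue, replacement in sorted(chosen, key=lambda pair: pair[0].offset, reverse=True):
        text = apply_correction(text, issue, replacement)
    return text


def review_all(issues):
    """'Review all errors': yield issues in document order."""
    yield from sorted(issues, key=lambda issue: issue.offset)
```

For example, with `text = "Teh cat sat on teh mat."` and issues at offsets 0 and 15 (length 3 each), choosing "The" and "the" yields "The cat sat on the mat.". Working right to left is one simple way to keep offsets valid; a real integration might instead re-run the checker after each edit.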

Other considerations:

  • A reporting mechanism lets users flag false positives, so that the same issue is not flagged for them again.
  • Communities should be able to customise their style rules. This lets them indicate which form is preferred among several valid ones, so translators know which synonym to use to stay consistent with the rest of the content.
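
A community-maintained style rule can be as simple as a table mapping a non-preferred form to the preferred one plus an explanation. The sketch below assumes that representation; the `STYLE_RULES` entries and `find_style_issues` are hypothetical examples, not actual community rules or LanguageTool's rule format.

```python
import re

# Hypothetical community-maintained table:
# non-preferred form -> (preferred form, explanation shown to the translator)
STYLE_RULES = {
    "utilize": ("use", "Prefer the shorter, plainer verb."),
    "in order to": ("to", "The extra words add nothing."),
}


def find_style_issues(text, rules=STYLE_RULES):
    """Return (offset, length, found, preferred, explanation) for each
    non-preferred form, matched as whole words, case-insensitively."""
    issues = []
    for variant, (preferred, why) in rules.items():
        pattern = r"\b" + re.escape(variant) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            issues.append((m.start(), len(m.group(0)), m.group(0), preferred, why))
    return sorted(issues)  # document order
```

Keeping the rules as data rather than code is what would let each community edit them without touching the checker itself.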

The example below illustrates the idea:

Overview

[Figure: Checkers-overview.png]

  • A "Review" card shows the total number of errors of each kind.
  • Errors are highlighted with a dotted underline: red for spelling mistakes and blue for all other errors.
  • Previous and next icons let users move through the errors.
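
The overview behaviour could be driven by the checker's results roughly as follows. This sketch assumes matches shaped like entries of LanguageTool's JSON `matches` array, where `rule.issueType` is `"misspelling"` for spelling errors; treat that field mapping as an assumption to verify against the LanguageTool API. The wrap-around navigation is a design choice made here, not something the mockups specify.

```python
from collections import Counter


def review_summary(matches):
    """Totals per kind of error, for the "Review" card."""
    return Counter(m["rule"]["issueType"] for m in matches)


def underline_color(match):
    """Red dotted underline for spelling mistakes, blue for everything else."""
    return "red" if match["rule"]["issueType"] == "misspelling" else "blue"


class ReviewNavigator:
    """Previous/next through errors in document order, wrapping at both ends."""

    def __init__(self, matches):
        self.matches = sorted(matches, key=lambda m: m["offset"])
        self.index = None  # nothing selected yet

    def next(self):
        if not self.matches:
            return None
        self.index = 0 if self.index is None else (self.index + 1) % len(self.matches)
        return self.matches[self.index]

    def previous(self):
        if not self.matches:
            return None
        n = len(self.matches)
        self.index = n - 1 if self.index is None else (self.index - 1) % n
        return self.matches[self.index]
```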

Spellchecking

[Figure: Checkers-spelling.png]

  • Selecting a word with a spelling mistake shows the "spelling" card, which offers options for correcting the word.
  • Users can click a suggested correction to replace the word.
  • Reporting a word as correct prevents it from being flagged again later.
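
"Report as correct" can be modelled as a per-user set of reported words that is consulted before errors are shown. A minimal sketch, assuming matches carry `offset` and `length` into the checked text; `ReportedWords` is a hypothetical name, and where such reports would be stored (per user or shared with the community) is left open here.

```python
class ReportedWords:
    """Hypothetical per-user store of words reported as correct (false positives)."""

    def __init__(self):
        self._words = set()

    def report_correct(self, word):
        # Case-insensitive, so "Phabricator" and "phabricator" count as one report.
        self._words.add(word.casefold())

    def filter(self, text, matches):
        """Drop matches whose flagged text the user has already reported as correct."""
        return [
            m for m in matches
            if text[m["offset"]:m["offset"] + m["length"]].casefold() not in self._words
        ]
```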

Style errors

[Figure: Checkers-style.png]

  • Clicking a style error shows the "style" card. It may include a description explaining why a certain form is preferred (with a link to external material if needed) and the preferred form to use.
  • Users can click the preferred form to replace the word.
  • Reporting a word as correct prevents it from being flagged again later.

Event Timeline

In a conversation about this, a good point was made that many browsers already provide native spellchecking. This has some implications:

  • We need to consider what value this adds compared to current (and potentially future) browser support. Here, browser support seems limited to spellchecking, while LanguageTool also provides grammar and style checks.
  • If custom spellchecking is provided, we need to consider how it may interfere with the native one. Browsers seem to allow disabling native spellchecking in HTML (with spellcheck="false"). This would let users choose between the custom checker and the browser default.
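
Choosing between the custom checker and the browser's native one could come down to the `spellcheck` attribute on the editable surface. A minimal sketch that renders the markup server-side; `editable_surface` and the `use_custom_checker` preference are hypothetical. In the browser, the equivalent is setting the element's `spellcheck` property (for example, `element.spellcheck = false`).

```python
def editable_surface(content_html, use_custom_checker):
    """Render the editable area, turning off the browser's native spellchecker
    when the custom checker is active so the two don't underline the same words."""
    spellcheck = "false" if use_custom_checker else "true"
    return f'<div contenteditable="true" spellcheck="{spellcheck}">{content_html}</div>'
```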

Browsers do indeed provide spellchecking. Safari on desktop even provides autocompletion, similar to mobile phone keyboards.

However, grammar and style checking goes beyond spellchecking. Spellchecking works at the level of the word; grammar and style checking works on phrases, correct use of prepositions, etc.
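
The difference can be shown with a toy example: every word in "He went too the store." is spelled correctly, so a word-level check passes, but a phrase-level rule catches the error. The dictionary and the single rule below are hypothetical toys, far simpler than LanguageTool's rules.

```python
import re

DICTIONARY = {"he", "went", "too", "to", "the", "store"}  # toy word list

# Toy phrase-level rule: "too" directly before an article is usually a typo for "to".
TOO_BEFORE_ARTICLE = re.compile(r"\btoo (the|a|an)\b", re.IGNORECASE)


def spelling_errors(sentence):
    """Word-level check: each word is looked up on its own."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [w for w in words if w.lower() not in DICTIONARY]


def phrase_errors(sentence):
    """Phrase-level check: looks at words in context."""
    return [m.group(0) for m in TOO_BEFORE_ARTICLE.finditer(sentence)]
```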

Ideally, this should be provided uniformly on all websites by browsers or operating systems, but that is not the case at the moment, and it is hard to say whether it will be any time soon. So I generally support something like this on Wikimedia sites, which involve a lot of writing and where the quality of writing style matters for objective reasons.

Google Docs provides something like this, but there are several problems with it:

  1. It's proprietary and unique to a particular website. It uses Google's own capabilities with statistics and looking up unlikely word combinations. So Wikimedia certainly wouldn't be worse off doing this.
  2. Google's approach doesn't really involve people. Involving real, experienced human editors in improving how style checking works is an excellent opportunity to engage caring people, and integrating something like LanguageTool is realistic and open to many languages.
  3. It (probably) relies only on statistics. A famous example is correcting "king" to "monarch" in English style checking (available in Microsoft Office since the mid-1990s). Google Docs doesn't provide this, and it's unlikely that such subtle style awareness could be deduced from statistics alone.