Revise Tone: Exclude direct quotes from Tone Recommendations
Open, Needs Triage, Public

Description

Tone Recommendations should not surface issues within direct quotes. Revising the tone of quoted material can change meaning, introduce inaccuracies, or conflict with Wikipedia policies that require quotes to reflect source material precisely. The current extraction pipeline may flag problematic language inside quotes, which results in inappropriate or unhelpful suggestions for editors.
This task requests an update to the model and preprocessing steps so that quoted content is excluded from Tone Suggestion candidates.

Requirements

Exclude text that appears within common quote structures on Wikipedia.
At minimum, handle widely used formats such as:

  • <blockquote> and nested variants
  • Indented block quotes produced by wiki markup
  • Quotation marks that consistently signal direct quotations in target languages
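A minimal sketch of the quotation-mark heuristic in Python. It assumes plain text has already been extracted from the rendered HTML. The mark pairs, the `min_words` threshold of 4, and the names `QUOTE_PAIRS` and `mask_quotations` are illustrative assumptions, not an agreed design. Short quoted spans stay visible because they are often scare quotes or titles, which can themselves be legitimate tone issues. Single quotes are skipped because they collide with apostrophes. Scripts written without spaces (such as Japanese) would need a character-based length check instead of a word count.

```python
import re

# Opening -> closing quotation marks. Illustrative assumption only,
# not a vetted inventory for every Tone Suggestions language.
QUOTE_PAIRS = {
    '"': '"',   # straight double quotes
    "“": "”",   # English and many others
    "„": "“",   # German, Czech and others
    "«": "»",   # French, Russian and others
}

# One alternative per pair; the quoted body is non-greedy and stays on one line.
QUOTE_RE = re.compile("|".join(
    re.escape(open_) + r"[^\n]*?" + re.escape(close)
    for open_, close in QUOTE_PAIRS.items()
))


def mask_quotations(text, min_words=4, placeholder="[QUOTE]"):
    """Replace quoted spans of at least `min_words` words with `placeholder`.

    Shorter spans are kept because they are often scare quotes or titles
    rather than direct quotations (an assumed threshold; tune on real data).
    """
    def replace(match):
        inner = match.group(0)[1:-1]  # every mark above is a single character
        return placeholder if len(inner.split()) >= min_words else match.group(0)

    return QUOTE_RE.sub(replace, text)
```

For example, `mask_quotations('A so-called "expert" weighed in.')` returns its input unchanged. A quotation of four or more words is replaced by `[QUOTE]`, so the model never sees its wording.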

A perfect solution is not required. The goal is to remove the most common false positives without overengineering.

Ensure that filtering occurs early enough in the pipeline so the model does not evaluate quoted text as potential tone issues.
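One possible reading of this requirement, sketched in Python. Quoted blocks are dropped and inline quotations are masked before the classifier is called, so quoted text is never scored at all. The alternative would be to discard model outputs afterwards. `score_tone`, the `(text, inside_quote_structure)` pair format, and the `mask` and `model` callables are hypothetical placeholders for whatever the real pipeline uses.

```python
from typing import Callable, List, Tuple


def score_tone(
    paragraphs: List[Tuple[str, bool]],
    mask: Callable[[str], str],
    model: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Run the tone model only on text that is not a direct quote.

    `paragraphs` holds (text, inside_quote_structure) pairs from an earlier
    HTML pass; `mask` hides inline quotations; `model` returns a tone score.
    All three are hypothetical stand-ins for the real pipeline components.
    """
    results = []
    for text, inside_quote in paragraphs:
        if inside_quote:
            continue  # block-level quotes are never sent to the model
        masked = mask(text)  # inline quotations are hidden before scoring
        results.append((masked, model(masked)))
    return results
```

Filtering upstream like this also avoids spending inference calls on text whose suggestions would be thrown away anyway.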

Acceptance Criteria

• Tone Suggestions are not generated for text inside identified quote structures.
• The exclusion process works across a representative set of languages that use Tone Suggestions.
• Automated tests or evaluation samples confirm a meaningful reduction in false positives within quotes.
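The following is a small sketch of how an evaluation sample could quantify the third criterion. It assumes each suggestion has been hand-labeled with whether it falls inside a quote. It reports the relative drop in quote false positives and the share of non-quote suggestions that survive the filter. The second number matters because a filter that removes too much would also look good on the first number alone. The record format and function name are assumptions, and the comparison uses counts only, not suggestion identity.

```python
from typing import Dict, List, Tuple


def _count(suggestions: List[Dict[str, bool]], in_quote: bool) -> int:
    return sum(1 for s in suggestions if s["in_quote"] == in_quote)


def evaluate_quote_filter(
    before: List[Dict[str, bool]], after: List[Dict[str, bool]]
) -> Tuple[float, float]:
    """Compare suggestions produced without and with quote filtering.

    Each suggestion is a dict with a hand-assigned boolean "in_quote" label
    (a hypothetical record format). Returns:
      reduction: relative drop in suggestions that fall inside quotes
      retention: share of non-quote suggestions still produced
    """
    q_before, q_after = _count(before, True), _count(after, True)
    n_before, n_after = _count(before, False), _count(after, False)
    reduction = (q_before - q_after) / q_before if q_before else 0.0
    retention = n_after / n_before if n_before else 1.0
    return reduction, retention
```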

Notes

Examples of suggestions involving quotations are collected in the spreadsheet Revise Tone: Articles to feed the model.
The Growth team can provide more example articles/suggestions if needed.
The solution should focus on practical heuristics rather than complete quote detection across all languages.

Event Timeline

KStoller-WMF renamed this task from Revise Tone: Exclude direct quotes from Tone Suggestions to Revise Tone: Exclude direct quotes from Tone Recommendations. (Dec 5 2025, 7:17 PM)
KStoller-WMF updated the task description.

I parsed all the examples labeled "Tone issue in direct quote" from the spreadsheet Revise Tone: Articles to feed the model using an HTML parser. Overall, the results look very good.

We can exclude text within:

  • <blockquote>
    • used in articles: Richard_Himber, Non-simultaneity, Jerry_Vlasak, Francesco_Fontanesi, Blood_Bank_(EP), Leon_Kroll, History_of_the_Industrial_Workers_of_the_World
  • Template:Quote_box
    • used in article: John_Milne_(judge)
  • Template:Cquote
    • used in article: André_Marty
  • Indented direct quotes
    • used in article: South_Morang
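The task does not say which parser or selectors were used for the analysis above. As a rough sketch, the same kind of exclusion can be done with Python's standard-library `html.parser`, assuming well-formed, Parsoid-style article HTML. The CSS class names `quotebox` and `cquote` are assumptions about what Template:Quote_box and Template:Cquote render to and should be checked against real output. Treating every `<dd>` as an indented quote is also a coarse assumption, since `:` indentation is used for other purposes too.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML, so they must not
# change the nesting depth tracked below.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

# Assumed CSS classes emitted by Template:Quote_box and Template:Cquote.
QUOTE_CLASSES = {"quotebox", "cquote"}


def is_quote_element(tag, attrs):
    """Return True if this start tag opens a structure treated as a quote."""
    if tag in ("blockquote", "dd"):  # <dd> is what ':' indentation renders to
        return True
    classes = (dict(attrs).get("class") or "").split()
    return any(c in QUOTE_CLASSES for c in classes)


class QuoteStrippingParser(HTMLParser):
    """Collects text that is NOT inside a quote structure."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.skip_depth:
            self.skip_depth += 1  # nested element inside a quote
        elif is_quote_element(tag, attrs):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def text_without_quotes(html):
    """Return the page text with quote structures removed, whitespace collapsed."""
    parser = QuoteStrippingParser()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

The depth counter relies on every non-void element being closed. That holds for Parsoid output but may not hold for arbitrary hand-written HTML.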

There are some cases in the spreadsheet that we cannot handle. Most of these involve additional issues, such as a failed attempt to use quotation templates (Hector_Daniel), cases where the choice to include a quote is itself a tone issue (WikiStage), or entire sections that are direct translations (Peace_of_Constance).