Description
Tone Suggestions should not surface issues within direct quotes. Revising the tone of quoted material can change its meaning, introduce inaccuracies, or conflict with Wikipedia policies that require quotes to reflect the source material precisely. The current extraction pipeline may flag problematic language inside quotes, which results in inappropriate or unhelpful suggestions for editors.
This task requests an update to the model and preprocessing steps so that quoted content is excluded from Tone Suggestion candidates.
Requirements
Exclude text that appears within common quote structures on Wikipedia.
At minimum, handle widely used formats such as:
- <blockquote> and nested variants
- Indented block quotes produced by wiki markup
- Quotation marks that consistently signal direct quotations in target languages
A perfect solution is not required. The goal is to remove the most common false positives without overengineering.
Ensure that filtering occurs early enough in the pipeline so the model does not evaluate quoted text as potential tone issues.
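As a starting point for discussion, the requirements above could be approximated with a small preprocessing pass that runs before any text reaches the model. This is only a sketch under stated assumptions: the function names (`strip_blockquotes`, `strip_indented_lines`, `strip_inline_quotes`, `remove_quoted_text`), the list of quote-mark pairs, and the 500-character limit are all hypothetical and not part of the existing pipeline. It also assumes the input is raw wikitext in which a leading `:` marks an indented line.

```python
import re

# Matches opening and closing <blockquote> tags (hypothetical helper pattern).
BLOCKQUOTE_TAG_RE = re.compile(r"<(/?)blockquote\b[^>]*>", re.IGNORECASE)

# Assumed set of quote-mark pairs; the real list would depend on target languages.
# Ordered so that the German-style pair is handled before the English curly pair.
QUOTE_PAIRS = [
    ("\u201e", "\u201c"),  # „ … “
    ("\u201c", "\u201d"),  # “ … ”
    ("\u00ab", "\u00bb"),  # « … »
    ("\u300c", "\u300d"),  # 「 … 」
    ('"', '"'),            # straight double quotes
]

def strip_blockquotes(text: str) -> str:
    """Drop everything inside <blockquote> elements, tracking nesting depth."""
    kept, depth, pos = [], 0, 0
    for tag in BLOCKQUOTE_TAG_RE.finditer(text):
        if depth == 0:
            kept.append(text[pos:tag.start()])
        if tag.group(1):  # closing tag
            depth = max(depth - 1, 0)
        else:             # opening tag
            depth += 1
        pos = tag.end()
    # An unclosed <blockquote> causes the remaining text to be dropped.
    if depth == 0:
        kept.append(text[pos:])
    return "".join(kept)

def strip_indented_lines(text: str) -> str:
    """Drop lines that begin with ':' (wiki markup indentation)."""
    return "\n".join(line for line in text.split("\n") if not line.startswith(":"))

def strip_inline_quotes(text: str, max_len: int = 500) -> str:
    """Drop short spans enclosed by a known quote-mark pair on a single line."""
    for opener, closer in QUOTE_PAIRS:
        pattern = (re.escape(opener) + "[^" + re.escape(closer) + "\n]{1,"
                   + str(max_len) + "}" + re.escape(closer))
        text = re.sub(pattern, "", text)
    return text

def remove_quoted_text(wikitext: str) -> str:
    """Apply the three heuristics in order, before candidate extraction."""
    text = strip_blockquotes(wikitext)
    text = strip_indented_lines(text)
    return strip_inline_quotes(text)
```

Known limitations of this sketch: straight double quotes also appear in markup attributes (for example `<ref name="...">`), so a real implementation would probably strip or protect such markup first, and leading `:` is not used exclusively for quotations.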
Acceptance Criteria
• Tone Suggestions are not generated for text inside identified quote structures.
• The exclusion process works across a representative set of languages that use Tone Suggestions.
• Automated tests or evaluation samples confirm a meaningful reduction in false positives within quotes.
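One possible way to check the last criterion is to measure what share of generated suggestions fall entirely inside known quote spans, before and after the change. The sketch below is illustrative only: the span representation (character offsets) and the names `inside_any` and `quoted_share` are assumptions, and the sample offsets are toy values, not real evaluation data.

```python
from typing import List, Tuple

# (start, end) character offsets into the article text, end exclusive.
Span = Tuple[int, int]

def inside_any(span: Span, quote_spans: List[Span]) -> bool:
    """Return True if the span lies entirely within one of the quote spans."""
    start, end = span
    return any(q_start <= start and end <= q_end for q_start, q_end in quote_spans)

def quoted_share(suggestions: List[Span], quote_spans: List[Span]) -> float:
    """Fraction of suggestions that fall inside annotated quotes."""
    if not suggestions:
        return 0.0
    hits = sum(inside_any(s, quote_spans) for s in suggestions)
    return hits / len(suggestions)

# Toy example: one annotated quote and suggestion offsets before/after filtering.
quotes = [(10, 30)]
before = [(12, 20), (40, 50), (15, 29), (0, 5)]
after = [(40, 50), (0, 5)]
share_before = quoted_share(before, quotes)
share_after = quoted_share(after, quotes)
```

Comparing `share_before` and `share_after` on a hand-annotated sample per language would give a rough, language-by-language view of the reduction.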
Notes
Examples of suggestions that contain quotations can be found in: Revise Tone: Articles to feed the model
The Growth team can provide more example articles/suggestions if needed.
The solution should focus on practical heuristics rather than complete quote detection across all languages.