Revise Tone: Exclude certain sections from Tone Recommendations
Open, Needs Triage, Public

Description

User Story

As a newcomer completing Revise Tone suggestions, I want the tool to avoid highlighting tone issues in reference sections, external links, captions, and other non-prose areas so I can focus on improving the main text of the article.

Description

Suggestions should only appear in main article prose. The current extraction pipeline sometimes surfaces suggestions inside tables, reference lists, external links, and other structural elements that are not intended for tone evaluation. These suggestions are generally misleading and can confuse newcomers.

The Growth team can provide examples of current errors across multiple wikis.

We should use practical heuristics that target the most common markup patterns without requiring complete coverage of all variations.

Related previous work: T304150: Allow communities to configure which sections are excluded from link suggestion generation. Can we consider reusing the "List of excluded sections from the Add link task" from Special:CommunityConfiguration/GrowthSuggestedEdits?

Acceptance Criteria:

Exclude content found in commonly structured non-prose sections, including:

  • Reference lists generated by the <references /> tag or by templates such as {{Reflist}}
  • External links sections and lists formatted with standard headings

If possible, we should also exclude suggestions within:

  • Tables, including data tables
  • Infoboxes
  • Image captions

Filtering does not need to be exhaustive, but should remove the majority of false positives in these areas.
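
One way the heading-based part of these criteria could be approached is sketched below. It is only an illustration under stated assumptions: it works on raw wikitext, the function name `drop_excluded_sections` is hypothetical, and the list of excluded titles is a placeholder that would in practice come from something like the community-configured list mentioned above. It drops an excluded section together with its subsections and resumes at the next heading of the same or a higher level.

```python
import re

# Hypothetical exclusion list (lowercase); a real list would be configured per wiki.
EXCLUDED_SECTION_TITLES = {"references", "external links", "notes", "further reading"}

# Matches wikitext headings such as "== Title ==" or "=== Title ===".
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def drop_excluded_sections(wikitext, excluded=EXCLUDED_SECTION_TITLES):
    """Return wikitext without sections whose heading is in `excluded`."""
    kept = []
    skip_level = None  # heading level of the section currently being skipped
    for line in wikitext.splitlines():
        match = HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2).strip().lower()
            # A heading at the same or a higher level ends the skipped section.
            if skip_level is not None and level <= skip_level:
                skip_level = None
            if skip_level is None and title in excluded:
                skip_level = level
        if skip_level is None:
            kept.append(line)
    return "\n".join(kept)


sample = (
    "Intro prose.\n"
    "== History ==\n"
    "Old town.\n"
    "== References ==\n"
    "* Smith 2020\n"
    "=== Web ===\n"
    "* link\n"
    "== Legacy ==\n"
    "Still prose."
)
print(drop_excluded_sections(sample))
```

A heuristic like this will miss sections with non-standard headings, which is consistent with the goal of removing most false positives rather than all of them.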

Event Timeline

By using the HTML parser's plaintext functionality and specifying elements to exclude, we should be able to filter out reference lists, external links, tables, infoboxes, data tables, and image captions. When I parsed the direct quote examples from the spreadsheet, the results showed only text that appears in the main article prose.
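
To illustrate the general idea of element exclusion, here is a minimal sketch. It does not use the HTML parser referred to above; it is a stand-in built on Python's standard `html.parser`, and the tag and class names in the exclusion sets are assumptions about typical article HTML rather than a confirmed list. The names `ProseExtractor` and `extract_prose` are hypothetical.

```python
from html.parser import HTMLParser

# Assumed exclusion rules; adjust to the markup actually produced by each wiki.
EXCLUDED_TAGS = {"table", "figure", "figcaption", "style", "script"}
EXCLUDED_CLASSES = {"references", "reflist", "reference", "infobox", "thumbcaption"}
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "wbr"}
BLOCK_TAGS = {"p", "div", "li", "section", "h2", "h3"}


class ProseExtractor(HTMLParser):
    """Collects text that is not inside an excluded element."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.stack = []      # (tag, is_excluded) for each open element
        self.skip_depth = 0  # number of open excluded elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements never get an end tag
        classes = set((dict(attrs).get("class") or "").split())
        excluded = tag in EXCLUDED_TAGS or bool(classes & EXCLUDED_CLASSES)
        self.stack.append((tag, excluded))
        if excluded:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # Ignore void or stray end tags that have no matching open element.
        if tag in VOID_TAGS or not any(t == tag for t, _ in self.stack):
            return
        # Pop up to the matching element, closing unclosed children on the way.
        while self.stack:
            open_tag, excluded = self.stack.pop()
            if excluded:
                self.skip_depth -= 1
            if open_tag == tag:
                break
        if tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")  # keep block boundaries as whitespace

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def extract_prose(html):
    parser = ProseExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


sample_html = (
    '<p>The city is truly magnificent.<sup class="reference">[1]</sup></p>'
    '<table class="infobox"><tr><td>Population: 5</td></tr></table>'
    '<figure><img src="x.png"><figcaption>A stunning view</figcaption></figure>'
    '<ol class="references"><li>Smith 2020</li></ol>'
    '<p>It has a river.</p>'
)
print(extract_prose(sample_html))
```

External links that appear inline in prose are deliberately left alone in this sketch, since dropping them would remove words from otherwise valid sentences; external link sections are better handled at the section level.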

It would be great if the Growth team could provide some examples (the wiki code and article title/revision id are enough), so I can further test the solution’s effectiveness across multiple wikis.

I'll start working on this for enwiki, and check with Ambassadors for other pilot languages.