[Suggestion] Detect when a variety of English is being used that does not align with article's tag
Open, In Progress, High, Public

Description

This task involves the work of introducing a new edit suggestion within the VisualEditor Suggestion Mode that makes people aware when a variety of English is being used within an article that does not align with the tag/template associated with said article.

Deployment plan

In technical terms, this can be deployed by adding it to the existing configuration on enwiki.

  • enwiki already has a configuration for textmatch, so the british-english section of the configuration file below should be added to textMatch.matchItems.
  • A user with permission to edit the configuration must make the change.
  • To repeat: do not replace the existing configuration wholesale.
  • Ensure it's not enabled by default
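
The "add, do not replace" step above could be sketched roughly as follows. This is only an illustration: it assumes the on-wiki configuration is a JSON object in which textMatch.matchItems is a mapping keyed by item name, and the add_match_item helper and both sample entries are hypothetical placeholders, not the real enwiki configuration.

```python
import copy
import json


def add_match_item(existing_config, name, item):
    """Return a copy of existing_config with one extra textMatch.matchItems entry.

    Assumption: matchItems is a JSON object keyed by item name.
    Refuses to overwrite an entry that is already present.
    """
    merged = copy.deepcopy(existing_config)
    match_items = merged.setdefault("textMatch", {}).setdefault("matchItems", {})
    if name in match_items:
        raise ValueError(f"matchItems already contains {name!r}")
    match_items[name] = item
    return merged


# Placeholder data: neither entry reflects the real enwiki configuration.
existing = {"textMatch": {"matchItems": {"some-existing-item": {"enabled": True}}}}
new_item = {"enabled": False}  # not enabled by default, per the plan above

updated = add_match_item(existing, "british-english", new_item)
print(json.dumps(updated, indent=2))
```

Working on a deep copy keeps the existing entries untouched, and raising on a duplicate name guards against silently replacing something that is already deployed.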

Process-wise, the fact that this was implemented with TextMatchEditCheck as a pure configuration change has surfaced that we do not yet have a decided process around deployments of this type; see comment section for discussion of how we're going forward with this.

Meta

  • Relevant wish, policy, guideline, template, etc.: en:MOS:ENGVAR
  • Suggestion scope: en.wiki

Requirements

Meta

  • Configuration
    • account: false
      • Specify which account state the edit check should apply to. Valid values are "loggedin", "loggedout", false. The default false results in the edit check applying to all users.
    • maximumEditcount: 100000
      • Specify the edit-count threshold for showing the edit check. The default of 100 means that the edit check is shown only to users with 100 edits or fewer. If this value is not defined, the default value is used. The number of edits is the user's edit count, with edits from all namespaces taken into account.
    • ignoreSections: []
      • An array of section titles, which will be compared case-insensitively to headings. If a heading matches an item in this array, all content within that section will be ignored for checks.
    • ignoreLeadSection: false
      • If true, the content of the lead section will be ignored for checks. A lead section is defined as content in an article with at least one heading that precedes the first heading.
    • enabled: true
      • If true, the check will be enabled, assuming all other configuration allows it to be shown. If false, the check will not be shown.
    • Type: Suggestion
    • inCategory: #TODO
    • hasTemplate: #TODO
      • An array of templates whose presence should cause this check to be offered.
    • lacksTemplate: #TODO
      • An array of templates whose absence should cause this check to be offered.
  • Detection heuristic: #TODO
  • Edit Tag(s): See T413419
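
As a rough illustration of how the configuration fields above interact, the following sketch decides whether the check would be offered to a given user on a given page. It is not the VisualEditor implementation: the check_applies function and its parameters are hypothetical, the template names in the example are illustrative (hasTemplate is still a TODO above), template titles are compared exactly, lacksTemplate is assumed to require that none of the listed templates is present, and a missing "enabled" key is assumed to mean disabled.

```python
def check_applies(config, user_logged_in, user_edit_count, page_templates):
    """Decide whether the check would be offered, following the descriptions above."""
    # Assumption: a missing "enabled" key is treated as disabled.
    if not config.get("enabled", False):
        return False

    # "loggedin", "loggedout", or False (applies to all users).
    account = config.get("account", False)
    if account == "loggedin" and not user_logged_in:
        return False
    if account == "loggedout" and user_logged_in:
        return False

    # Shown only to users with maximumEditcount edits or fewer (default 100).
    if user_edit_count > config.get("maximumEditcount", 100):
        return False

    templates = set(page_templates)
    has_template = config.get("hasTemplate")
    if has_template and not templates.intersection(has_template):
        return False
    lacks_template = config.get("lacksTemplate")
    if lacks_template and templates.intersection(lacks_template):
        return False
    return True


# Illustrative values only.
config = {
    "account": False,
    "maximumEditcount": 100000,
    "enabled": True,
    "hasTemplate": ["Use British English", "EngvarB"],
}
print(check_applies(config, True, 2500, ["Use British English"]))
print(check_applies(config, True, 2500, ["Infobox person"]))
```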

User experience

  • Card design
    • Title: British/American English detected
    • Description (≤2 sentences): Consider adjusting this word to American English/British English so that it is consistent with how the rest of the article is written. Learn more.
    • Link to learn more: MOS:ENGVAR
    • Calls to action
      • Switch: When tapped, automatically adjust the word in question to use the British/American English equivalent
        • Note: the above assumes communities will have configured these mappings.
      • Dismiss: When tapped, leave the word in question unchanged
    • Success toast: Thank you for helping to make this article easier for people to read and follow.
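
The Switch action described above amounts to a lookup in a community-configured mapping followed by a replacement of the matched range. A minimal sketch, assuming a plain-text document and a hypothetical AMERICAN_TO_BRITISH mapping (the real word list is still to be agreed), might look like this:

```python
# Hypothetical mapping for an article tagged as British English.
AMERICAN_TO_BRITISH = {"color": "colour"}


def switch_word(text, start, end, mapping):
    """Replace text[start:end] with its mapped equivalent, if one is configured."""
    word = text[start:end]
    replacement = mapping.get(word)
    if replacement is None:
        # No configured mapping: leave the text unchanged, as Dismiss would.
        return text
    return text[:start] + replacement + text[end:]


sentence = "The color of the flag is red."
start = sentence.index("color")
print(switch_word(sentence, start, start + len("color"), AMERICAN_TO_BRITISH))
```

If no mapping exists for the matched word, the sketch returns the text unchanged, which is why the note above about communities configuring the mappings matters.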

Instrumentation
As with all Edit Checks and Suggestions, we will want to know...

  • Any time a Suggestion of this type is activated within an edit session
  • Any time someone views a Suggestion of this type within an edit session
  • Any time someone engages with a Suggestion and how they engage with it

See also

Artifacts

Checklists

Milestones / Review steps
  • Patchdemo
  • Deployed to enwiki for experimental mode
  • Done?
Head of queue
  • File ticket to add toast for TextMatchEditCheck
  • Identify if we can include internal links in TextMatchEditCheck for the "learn more" link; file ticket if not
    • This is somewhat cross-cutting with internationalisation. This is still somewhat in discussion but we're likely to add i18n strings as we move checks out of experimental, so that translators do not get extra unnecessary work before we're ready to go.
Any time
Before deploy
  • Agree a word list with community feedback
  • Agree when to deploy this config change
    • Engineering/PM are happy to go ahead, as long as it is not enabled by default
    • Checking in with community liaison first
  • Check TextMatchEditCheck has instrumentation (probably); file ticket if not
    • Yup, we've got the built-in ones for all checks
  • Update description text to match current description
  • Decide if excluding blockquotes etc is blocking (see T415181). Block or file follow up ticket as appropriate.
    • Yes, and
  • Block on ignoreQuotes being available and add ignoreQuotes to the config
Once ready
  • Deploy the config change
  • Add a note to the core template that if you're redirecting a template to it, also add to the edit check

Event Timeline

The exceptions listed in MOS:ENGVAR include quotations, titles of works, and proper names. For quotations, we probably want to apply the same logic to skip them as we're doing for Tone Check. For titles of works, maybe skip anything in italics. And for proper names, maybe skip anything capitalized.

This was doable with TextMatchEditCheck. The words list is by no means great (it didn't have color/colour on it when I got it...), but it works!

Patchdemo here
Configuration file here

zoe changed the task status from Open to In Progress. Jan 6 2026, 4:05 PM

Oh, and I forgot to do this earlier… screenshot!

[Screenshot: Screenshot 2026-01-06 at 15.42.58.png]

Configuration file as it stands at the moment (as the patchdemo will go away at some point):

For quotations, we probably want to apply the same logic to skip them as we're doing for Tone Check.

There's currently no edit-check logic for that. The model might cope with it, but that's a black box.

Update for case sensitivity:

To update in more depth:

  • Internally we tried out the check and found that proper nouns were often suggested for changes.
  • We've chosen to resolve this by turning on case sensitivity, since in English proper nouns tend to be capitalised.
  • We also considered whether links to other pages should be excluded from matching, on the theory that they would tend to be proper nouns; see T414191: Add link filter to TextMatchEditCheck?. Chemical elements were identified as a counterexample, as was the fact that wikis might use a mix of redirects and link titles to handle the matter. While a link filter might be a useful tool for other checks, we concluded that it's a less useful heuristic than capitalisation for this check.
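
To illustrate the effect of the case-sensitivity choice, here is a small sketch using a regular expression over plain text. The find_matches helper and the sample sentence are hypothetical and are not how TextMatchEditCheck is implemented; they only show why a capitalised proper noun stops matching once case sensitivity is on.

```python
import re


def find_matches(text, words, case_sensitive=True):
    """Return (offset, matched text) for each whole-word occurrence of words."""
    flags = 0 if case_sensitive else re.IGNORECASE
    alternatives = "|".join(re.escape(word) for word in words)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", flags)
    return [(m.start(), m.group(0)) for m in pattern.finditer(text)]


text = "The Color Purple uses color symbolism."
print(find_matches(text, ["color"], case_sensitive=False))
print(find_matches(text, ["color"], case_sensitive=True))
```

A side effect in this sketch is that a listed word capitalised at the start of a sentence is also skipped when matching is case-sensitive.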

We also identified that, since this check is a pure configuration change, there's not really a clear deployment process. A code change would go through Gerrit and the review process, whereas this would be deployed using manual changes to the configuration file on enwiki. We're therefore going to use this as a trial run for the process in T413257: [Meta] Define the deployment path for new Edit Checks/Suggestions.

We discussed that lead sections were likely to have text such as "Colour (or color in American English)" and that we should ignore the lead section as a rough heuristic to prevent this issue.
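
The lead-section heuristic can be sketched as follows, using the definition from the requirements (content that precedes the first heading, in an article with at least one heading). The sketch works on raw wikitext purely for illustration; the HEADING pattern and the body_after_lead helper are hypothetical, and the real check may well operate on the editor's document model instead.

```python
import re

# Matches wikitext headings such as "== History ==" (levels 2 to 6).
HEADING = re.compile(r"^(={2,6})[^=\n].*?\1[ \t]*$", re.MULTILINE)


def body_after_lead(wikitext):
    """Return the text that remains checkable once the lead section is ignored."""
    first_heading = HEADING.search(wikitext)
    if first_heading is None:
        # No headings: by the definition above there is no lead section.
        return wikitext
    return wikitext[first_heading.start():]


article = (
    "Colour (or color in American English) is a visual property.\n"
    "\n"
    "== History ==\n"
    "The color wheel was described later.\n"
)
print(body_after_lead(article))
```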

I've set ignoreLeadSection on the match config field on the patchdemo, but I found that the change was not reflected. I'm filing a ticket to explore why.

The patch to fix ignoreLeadSection has now landed.

Next action:

  • Create a new patchdemo and copy over config and test page
  • Make sure patchdemo link and current config are in the task description as well as in comments

In using another page for the patchdemo I discovered that there are other templates that redirect to "Use British English", in this case "EngvarB".

The following templates redirect to Template:Use British English:

Template:EngvarB
Template:EB
Template:Eb
Template:Use Scottish English
Template:En-GB
Template:Use European English
Template:Use International English
Template:Use british english
Template:Use British
Template:Use british
Template:Ube
Template:UBE
Template:Engvarb
Template:Use BrE
Template:Use Welsh English
Template:Engvar-B
Template:Use British English spelling
Template:International English
Template:Engvar B
Template:Uken
Template:UKEN
Template:Use British English with -ise spellings
Template:Use British english
Template:Use British spelling
  • Get these templates into the config

I should have written my script with easy editing of the config in mind.

cat redirects.json | jq '.query.pages | first | .redirects | map(.title[9:])'

[
  "EngvarB",
  "EB",
  "Eb",
  "Use Scottish English",
  "En-GB",
  "Use European English",
  "Use International English",
  "Use british english",
  "Use British",
  "Use british",
  "Ube",
  "UBE",
  "Engvarb",
  "Use BrE",
  "Use Welsh English",
  "Engvar-B",
  "Use British English spelling",
  "International English",
  "Engvar B",
  "Uken",
  "UKEN",
  "Use British English with -ise spellings",
  "Use British english",
  "Use British spelling"
]

Change #1228501 had a related patch set uploaded (by Zoe; author: Zoe):

[mediawiki/extensions/VisualEditor@master] TextMatchEditCheck: bugfix on ignoreLeadSection

https://gerrit.wikimedia.org/r/1228501

I've found a bug where ignoreLeadSection is still not covered. I think I've fixed it, regenerating a patchdemo...

Change #1228501 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] TextMatchEditCheck: bugfix on ignoreLeadSection

https://gerrit.wikimedia.org/r/1228501

@zoe, if the above list of redirects is hard-coded, does that mean that, if future redirects are created to the British English template, articles tagged with those redirects would not cause the suggestion to activate? That's not optimal, but if it's what we want for the MVP we should probably add a note to the template documentation that, if new redirects are created, they should also do something to ensure this stays in sync.

Good catch, that's right. We could also consider writing a script that checks it for us. It could probably be scripted easily in bash, assuming the user has jq installed, or in Python using only the standard library.
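
A sync check along those lines might look like the sketch below. It is only an outline: it assumes the API response has the same shape the jq filter above relies on (query.pages as a list, each redirect carrying a title with a "Template:" prefix), and the function names, the sample response and the configured list are all hypothetical.

```python
import json

PREFIX = "Template:"


def redirect_titles(api_response):
    """Mirror the jq filter above: first page, redirect titles, prefix stripped."""
    page = api_response["query"]["pages"][0]
    return [r["title"][len(PREFIX):] for r in page.get("redirects", [])]


def missing_from_config(api_response, configured_templates):
    """List redirects that exist on the wiki but are absent from the config."""
    configured = set(configured_templates)
    return [t for t in redirect_titles(api_response) if t not in configured]


# Sample data in the assumed response shape; not a real API response.
sample = json.loads("""
{"query": {"pages": [{"title": "Template:Use British English",
  "redirects": [{"title": "Template:EngvarB"}, {"title": "Template:Use BrE"}]}]}}
""")
configured = ["Use British English", "EngvarB"]
print(missing_from_config(sample, configured))
```

Run periodically against a fresh response, a non-empty result would indicate that a new redirect has been created and the configuration needs updating.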

PM & engineering are happy to deploy this in experimental mode (i.e. disabled by default), preferring velocity over perfection.

We'll discuss with the community liaison in case there are blockers, and to decide whether to go with a smaller word list.

The ignoreLeadSection fix landed on Monday, so it should now be fixed in production. This means we can go ahead with production configuration on a technical level; I'll check in on the word list.

Block on ignoreQuotes being available and add ignoreQuotes to the config

ignoreQuotedContent (via patch + patch) merged on the 21st, so it'll be on the deployment train next week, and available on enwiki on the 29th.

It specifically tests whether the start of the range you're considering is within quotes (or is an opening-quote, which isn't technically within, but...), which we might need to expand for other cases, but for this textmatch rule in particular should be Just Fine.
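
The behaviour described above (only the start of the range is tested, and an opening quote counts as quoted) can be approximated on plain text as in the sketch below. This is an assumption-laden illustration, not the merged patch: the starts_in_quotes helper is hypothetical, it only understands straight and curly double quotes, and the real check may rely on document structure rather than raw characters.

```python
def starts_in_quotes(text, start):
    """True if text[start] is an opening quote or sits inside an open quotation."""
    inside = False
    for ch in text[:start]:
        if ch == '"':
            inside = not inside  # straight quotes toggle
        elif ch == "\u201c":
            inside = True  # left curly quote opens
        elif ch == "\u201d":
            inside = False  # right curly quote closes
    opening = text[start:start + 1] in ('"', "\u201c")
    return inside or opening


text = 'He wrote "the color red" in a colorful note.'
print(starts_in_quotes(text, text.index("color")))
print(starts_in_quotes(text, text.index("colorful")))
print(starts_in_quotes(text, text.index('"')))
```

Because only the start offset is examined, a range that begins outside a quotation and runs into one would not be skipped, which is the kind of case that might need the expansion mentioned above.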

Monday morning update:

  • We're waiting on the train for ignoreQuotes and ignoreLeadSection, so this can't go anywhere until Thursday afternoon.
  • We're still deciding on the initial word list, as this will be in front of interested volunteers.
  • I think the current config is out of sync with the ticket in terms of text, so I'll be checking that

I updated the last version of the config to include ignoreQuotedContent

I've put this up on enwiki via: